Joint ICSU Press/UNESCO Expert Conference on ELECTRONIC PUBLISHING IN SCIENCE

UNESCO, Paris, 19-23 February 1996

A scientist's view of the issues and challenges

Sydney R Hall, Crystallography Centre,
University of Western Australia, AUSTRALIA


Summary

Perspectives
Data Validation
Information Growth
Electronic Data Standards
Existing Electronic Journals
A Learned Society Approach
IUCr Delivery Modes
Conclusions
References and Further Information

This meeting has been organised, primarily, because the rapidly increasing power of electronic communications and computer networks is challenging traditional methods of scientific publishing in a way similar to the encroachment of television on newspapers in the 1960s. We are here specifically to discuss, and to understand better, the changes and the effects that future electronic delivery systems will have on science and on scientific publishers.

We should not overlook, however, that scientific journals are only a small, albeit important, part of information dissemination in our society. Changes to electronic communications are going to profoundly affect almost every aspect of our lives, from entertainment to everyday household transactions. Networked desktop delivery systems are already pervasive and will soon be, and in some countries already are, as common as the telephone, and indeed are in competition with this very medium. We must recognise therefore that our attempts to harness these developments are only a pixel of a much larger picture. Of course, as an optimist, I want to believe that if any group can guide this technology for its own purposes, it should be the scientists who largely gave it birth!

Perspectives

My contribution to these discussions will be to put forward both a scientist's and an editor's view of desirable outcomes from electronic publishing. In doing this I would also like to relate some of the electronic publishing efforts made by one learned society. My view of this topic is from four somewhat different perspectives.

Let me first quash any expectations that I will attempt to put forward some general guidelines for electronic publishing in science! This is unlikely because, like most scientists, I view the role of journals very much in terms of the requirements of my discipline, which is crystallography. This is a field in which particular value is placed on the integrity and accessibility of measured information, and on the need for careful validation before data may be placed in the public domain. I state my bias now to emphasise the belief that the optimal approach for e-journals and delivery systems is determined by the nature of the published material. Some of the approaches I will describe may be applicable in a broader role; many will not. I don't know of, and have not really looked for, an optimal path through the wilderness of publication choices that will work as a general approach. In fact I am highly suspicious of "global solutions" because they run counter to one of the most powerful attributes of this technology, its flexibility. Indeed there are very strong scientific and economic reasons for customising electronic publications to each discipline. In other words, electronic publications tailored to the exchange of concepts and theory are unlikely to be appropriate to the dissemination of measured data, which has no intrinsic "reason"! Similarly, the publishing requirements for, say, the atmospheric or earth sciences are quite likely to be different from those of, say, psychology or sociology.

It is interesting to note that this flexibility is why most scientists seem to favour the concept of electronic journals. They understand that these changes pose significant challenges and uncertainties for existing scientific publishers but recognise that better communications usually means better science. Most expect that electronification will enhance existing information services, or, at least, be complementary to them. Few scientists I know of view e-journals as "the juggernaut" that will demolish hardcopy overnight. There is a strong feeling, which I hope is not too naive, that the evolution of e-journals will be controlled largely by the readers' capacity to cope with change.

Data Validation

Let me fill in the background to these opening remarks. I want to concentrate for the moment on one aspect of information gathering and distribution; namely, data bases. These have a particular relevance to the future role of electronification in some fields. For the past three hundred years or so journals and texts have been primary vehicles for scientific knowledge and information, with editors and publishers acting as arbiters of integrity, and libraries being the key holders to data access. The author-editor-journal-library-reader chain remained essentially unbroken over this long period because the roles of the component parts were symbiotic.

The first break in this sequence occurred for some disciplines about thirty years ago. It occurred at a time when the cost and capacity of computers reached a point where they were suitable as permanent depositories for large-scale data resources. What started out as local archives quickly grew into national and international data bases catering to a range of disciplines. These resources offered data access on a scale that individual publishers or libraries could not match. Did that mean that the author-journal-reader nexus had been broken? For many data bases, perhaps most, the answer at present is "No". The primary source of data remains the published material. The reason for this is simple. Data base organisations, such as those in the structural sciences, see publications as a convenient way of validating information, and data base users expect that this type of checking has taken place.

This suggests that the need for reviewing is currently entrenched in the information gathering processes, but it also poses some important questions for the future. Will the need for peer-review remain if e-journals become the norm? There are related questions: what if a data base organisation is able to check its speciality data as carefully as a journal review can, or even better? Or, what if widely-available validation tools enable quality checks to be made in the laboratory? And what if these checks are good enough so that data can be deposited directly into the data base?

Of course, these activities already exist, and will increase with electronification provided appropriate acknowledgment can be given to the depositor, and this leads to the question: is this a good direction for science to take? If the answer is "No", is there any way to prevent it?

Information Growth

The driving force for these developments is an increasing capacity for data acquisition. The growth of the Protein Data Bank (PDB) at the Brookhaven National Laboratories in New York [1] is typical of what is happening to all data bases. I have selected the PDB because it belongs to a relatively new discipline (i.e. the study of 3D macromolecular structure). Its growth over the past five years has been exponential. The current deposition rate is about 100 macromolecular data sets per month and by the turn of the century this is expected to be 1000 per month!
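As a rough check on what these two figures imply, the short calculation below derives the doubling time of the deposition rate. It is only a sketch: it assumes smooth exponential growth between the two quoted rates, and it assumes that "the turn of the century" lies four years after this talk. Neither assumption comes from the PDB itself.

```python
import math

def doubling_time_years(rate_now, rate_later, years_between):
    """Doubling time implied by exponential growth between two rates."""
    growth_factor = rate_later / rate_now
    return years_between * math.log(2) / math.log(growth_factor)

# Figures quoted in the text: about 100 data sets per month now,
# 1000 per month expected by the turn of the century.
# ASSUMPTION: the interval between the two figures is taken as 4 years.
t_double = doubling_time_years(100, 1000, 4.0)
print(f"implied doubling time: {t_double:.2f} years")
```

Under these assumptions the deposition rate would have to double a little more often than every year and a half, which is the scale of growth that the following paragraphs are concerned with.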

This sort of incredible growth is fuelled by the same technology spawning electronic publication. The ability of scientists to generate more information, faster, is particularly true in the computationally-intensive sciences, and which discipline does not fall into this category these days? To some extent the earliest pressures of increased productivity have been borne by the data bases, but journals are already feeling the strain. For example, recent improvements to area-detectors have meant that X-ray diffraction data can be measured 10 times faster than several years ago! When this increased measurement capacity is translated into new crystal structure studies and papers, will structural science journals be able to cope using the existing editorial and publication processes? The probable answer is "No".

Of real concern is that this growth is only the warning tremor of an information "quake" that will happen with the advent of "gigaflop" desktop computers, and the installation of global networks with "gigabit" bandwidths. Future network facilities will provide scientists with instant video-quality communication to anywhere in the world, and the capacity to measure and process data, and to prepare publication material, at a rate which is incomprehensible to us at the moment. Although the pre-shocks of the information big-bang have been evident for some time, few learned societies appear to have made a serious effort to anticipate the changes needed to upgrade, or to at least minimise damage to, existing dissemination services.

As a scientist working on data handling processes I am excited and stimulated by these developments; as an editor I admit to being somewhat intimidated by them. I would like to relate how one learned society publisher, the International Union of Crystallography (IUCr) [2], is coping with these challenges. I present this overview not so much as an editor of one of the IUCr journals, but as a scientist who was closely involved in the expert group set up by the IUCr to recommend publication directions that would benefit the discipline as a whole.

Electronic Data Standards

Crystallography is a generic term used for studies involving crystalline material - it includes disciplines ranging from solid-state physics to drug design; all with a primary interest in 3D structure at atomic resolution. The IUCr has for almost a decade planned to direct its six major journals, reference tables and monographs into electronic publishing. Because crystallographic studies are computer-intensive and usually involve large volumes of data, it recognised that a standard for the global interchange of electronic data was needed as a prerequisite to electronic data handling and publication processes. The crystallographic information file (CIF) [3] format based on the STAR File [4] was adopted for this purpose. This is a flat ASCII file which is human-readable, self-identifiable, extensible and has simple syntax rules. The CIF format is now widely accepted as an interchange standard in structural chemistry, and is the preferred entry approach for a number of international data bases and publishers.
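To give a feel for how simple the syntax is, here is a minimal sketch of a CIF fragment and a reader for it. The numerical values are invented for illustration, and the function parse_simple_cif is a hypothetical helper rather than any IUCr software. It deliberately handles only data_ block headers, single name-value pairs and loop_ tables, and ignores multi-line text fields and the finer quoting rules of the full STAR syntax.

```python
import shlex

EXAMPLE_CIF = """\
# Illustrative values only, not taken from a real structure
data_example
_cell_length_a                  5.959
_cell_length_b                  14.956
_symmetry_space_group_name_H-M  'P 21 21 21'
loop_
_atom_site_label
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
O1  0.4154  0.5699  0.3026
C2  0.5630  0.5087  0.3246
"""

def _starts_new_item(token):
    """True if the token is a data name, a loop_ keyword or a block header."""
    low = token.lower()
    return token.startswith("_") or low == "loop_" or low.startswith("data_")

def parse_simple_cif(text):
    """Parse a simplified CIF into {block: {"items": {...}, "loops": [...]}}."""
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):   # skip blanks and comments
            tokens.extend(shlex.split(line))    # keeps 'quoted strings' whole
    blocks, block, i = {}, None, 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.lower().startswith("data_"):     # start of a new data block
            block = {"items": {}, "loops": []}
            blocks[tok[5:]] = block
            i += 1
        elif block is None:
            raise ValueError(f"{tok!r} appears before any data_ block")
        elif tok.lower() == "loop_":            # a table: names, then values
            i += 1
            names = []
            while i < len(tokens) and tokens[i].startswith("_"):
                names.append(tokens[i])
                i += 1
            values = []
            while i < len(tokens) and not _starts_new_item(tokens[i]):
                values.append(tokens[i])
                i += 1
            if not names or len(values) % len(names) != 0:
                raise ValueError("loop_ values do not fill whole rows")
            rows = [dict(zip(names, values[j:j + len(names)]))
                    for j in range(0, len(values), len(names))]
            block["loops"].append(rows)
        elif tok.startswith("_"):               # a single name-value pair
            if i + 1 >= len(tokens):
                raise ValueError(f"no value given for {tok}")
            block["items"][tok] = tokens[i + 1]
            i += 2
        else:
            raise ValueError(f"unexpected token {tok!r}")
    return blocks

blocks = parse_simple_cif(EXAMPLE_CIF)
print(blocks["example"]["items"])
print(blocks["example"]["loops"][0])
```

Because every value is tagged with its own data name, the file identifies its contents without reference to column positions or record order, which is what makes it suitable for both archiving and automatic checking.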

The IUCr is by no means alone in recognising the importance of data standards. A number of other scientific disciplines have put forward standard interchange formats. The Chemical Abstracts Service (CAS) has developed CXF based on the ASN.1 protocols [5], and the spectroscopic community has been using JCAMP-DX [6]. An exchange approach based on the portable access software NetCDF [7] is used in the atmospheric sciences. While data interchange standards are important in their own right, a primary motivation of bodies such as the IUCr and ACS is that these standards can also be applied to publication submission, and even to data delivery.

Existing Electronic Journals

Other publishing organisations not involved in data interchange approaches are, nevertheless, critically aware of the importance of portable electronic documents [8]. Indeed many publishers have advanced document delivery systems. The Association of Research Libraries [9] currently (July 1995) lists 484 electronic journals, of which about 25% are peer-reviewed [10]. Only a small proportion of the latter are, however, science publications and fewer are in mainstream science disciplines.

From the publisher's point of view, the reasons for this are obvious. Electronic publishing is dependent on fast-evolving technologies, uncertain readership response and access, and involves untried cost-recovery processes [11]. It follows that journals pioneering electronic delivery systems may have much to gain, but they also have a lot to lose. Some publishing efforts will fail because of wrong delivery choices; other journals will fail because of tardiness in providing these services. It is also quite likely that journal allegiance will become a concept of the past. Reader and author mobility is more likely to be influenced by the range of services provided (abstracting, publication and archival), by more competitive charging structures, and by the nature of the delivery service and the multimedia tools supplied [12]. It is entirely possible that existing archival services, such as value-added abstracting and data bases, will be overtaken by the new dissemination methods.

The awareness of these possibilities and scenarios is important for any organisation charting a course into electronic publishing waters. It is entirely possible that ignoring any one of them could sink the sturdiest publishing ship! In the shadow of these dire warnings, let me outline the course that the IUCr has taken.

A Learned Society Approach

The IUCr, as a learned society publisher, recognised at the outset that electronic networked communications would impact on every aspect of the publication process; technical, editorial, peer-review and copyright. In 1987 it set up a special Electronic Publishing Committee (EPC) which was independent of the normal publishing activities coordinated by the Commission on Journals. The EPC has the responsibility of recommending and overseeing developments that will guide IUCr publications towards appropriate electronification.

Its first step was to provide for data exchange standards, and in 1988 the EPC assigned a working party to develop the CIF format and an associated data dictionary. This was completed in late 1989 and tabled before the IUCr General Assembly in 1990 as the blueprint for future data interchange. In the next two years a major effort was made to encourage developers of crystallographic software packages to generate data in CIF format. Coordination of this was the responsibility of the Committee for the Maintenance of the CIF Standards, COMCIFS. The primary role of COMCIFS is to ensure that software and data definitions conform to the CIF and STAR syntax. An interesting spin-off from the electronic CIF data dictionaries used for automatic data validation is the dictionary definition language, DDL [15]. This has become an informatics research topic in its own right. Since 1991 CIF dictionaries have been developed by special working groups in the fields of powder diffraction, macromolecular structure, incommensurate structure, symmetry notation and NMR data. CIF dictionaries are planned so that eventually all aspects of electronic crystallographic data can be validated automatically.
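The idea of dictionary-driven validation can be sketched in a few lines. The example below is an illustration only: real CIF dictionaries are written in DDL and carry far more information, whereas MINI_DICTIONARY here is a hypothetical Python stand-in holding just a type, a permitted range or a list of permitted values for three data names. The function validate_items is likewise a made-up name, not part of any IUCr software.

```python
import re

# HYPOTHETICAL stand-in for a machine-readable data dictionary.
# Each entry says what kind of value a data name may take.
MINI_DICTIONARY = {
    "_cell_length_a": {"type": "numb", "min": 0.0, "max": None},
    "_cell_angle_alpha": {"type": "numb", "min": 0.0, "max": 180.0},
    "_symmetry_cell_setting": {
        "type": "char",
        "enum": {"triclinic", "monoclinic", "orthorhombic", "tetragonal",
                 "rhombohedral", "trigonal", "hexagonal", "cubic"},
    },
}

# A number, optionally followed by a standard uncertainty such as 5.959(1)
_NUMBER = re.compile(
    r"([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)(?:\(\d+\))?"
)

def validate_items(items, dictionary=MINI_DICTIONARY):
    """Return a list of error messages; an empty list means no errors found."""
    errors = []
    for name, value in items.items():
        definition = dictionary.get(name)
        if definition is None:
            errors.append(f"{name}: not defined in the dictionary")
            continue
        if definition["type"] == "numb":
            match = _NUMBER.fullmatch(value)
            if match is None:
                errors.append(f"{name}: {value!r} is not a number")
                continue
            number = float(match.group(1))
            low, high = definition["min"], definition["max"]
            if low is not None and number < low:
                errors.append(f"{name}: {number} is below {low}")
            if high is not None and number > high:
                errors.append(f"{name}: {number} is above {high}")
        elif "enum" in definition and value not in definition["enum"]:
            errors.append(f"{name}: {value!r} is not a permitted value")
    return errors

report = validate_items({
    "_cell_length_a": "5.959(1)",
    "_cell_angle_alpha": "190.0",        # outside the permitted range
    "_symmetry_cell_setting": "cubic",
    "_cell_lenght_b": "7.1",             # misspelt data name
})
for message in report:
    print(message)
```

The point of the design is that the checking code knows nothing about crystallography. All of the subject knowledge sits in the dictionary, so extending validation to a new field means writing new definitions rather than new software.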

It was decided that Acta Crystallographica Section C would be the IUCr's initial "e-journal" because of its well-defined modular format. Acta C started accepting electronic manuscripts in CIF format in 1991. This has been so successful that from the beginning of 1996 Acta C will accept only CIF submissions! Each CIF usually contains the complete set of material needed for publication: the text, the tabular data, and the diagrams in PostScript or HPGL. Authors submit these CIFs directly to the Acta office, usually via e-mail, where they are processed automatically. The file contents are validated, special software is used to check the integrity and self-consistency of the contained data, and the text items are converted into printed proof of the paper. The proof includes formatted tables of data which are extracted directly from the CIF entries. Manual intervention only occurs if an error is detected. If too many problems are encountered the paper is returned immediately to the author. All of this happens before the paper is forwarded to the Co-editor for scientific review.
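The decision logic of this pre-review step can be sketched as follows. Everything named here is an assumption made for illustration: the threshold MAX_PROBLEMS, the Outcome names and the single self-consistency check (that the reported cell volume agrees with the volume computed from the cell lengths and angles) are not taken from the Acta office software, which applies many more checks.

```python
import math
from enum import Enum

class Outcome(Enum):
    FORWARD_TO_COEDITOR = "forward to Co-editor"
    MANUAL_INTERVENTION = "manual intervention"
    RETURN_TO_AUTHOR = "return to author"

MAX_PROBLEMS = 3   # HYPOTHETICAL threshold for "too many problems"

REQUIRED = (
    "_cell_length_a", "_cell_length_b", "_cell_length_c",
    "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma",
    "_cell_volume",
)

def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume from cell lengths and angles (angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg
                                 + 2 * ca * cb * cg)

def check_submission(items):
    """Return a list of problems found in a dict of numeric data items."""
    problems = [f"missing {name}" for name in REQUIRED if name not in items]
    if problems:
        return problems
    computed = cell_volume(*(items[name] for name in REQUIRED[:6]))
    # Illustrative tolerance: flag a disagreement of more than 1 per cent.
    if abs(computed - items["_cell_volume"]) > 0.01 * computed:
        problems.append(
            f"_cell_volume {items['_cell_volume']} disagrees with "
            f"computed value {computed:.1f}"
        )
    return problems

def triage(items):
    """Decide what happens to a submission before scientific review."""
    problems = check_submission(items)
    if not problems:
        return Outcome.FORWARD_TO_COEDITOR, problems
    if len(problems) > MAX_PROBLEMS:
        return Outcome.RETURN_TO_AUTHOR, problems
    return Outcome.MANUAL_INTERVENTION, problems

submission = {
    "_cell_length_a": 5.0, "_cell_length_b": 6.0, "_cell_length_c": 7.0,
    "_cell_angle_alpha": 90.0, "_cell_angle_beta": 90.0,
    "_cell_angle_gamma": 90.0, "_cell_volume": 210.0,
}
outcome, problems = triage(submission)
print(outcome.value, problems)
```

A submission that passes every check never needs a human until it reaches the Co-editor, which is where the efficiency of the process comes from.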

The previewing of electronic submissions has become so important to the overall efficiency of the Section C editorial process that automatic e-mail servers [13] are now provided by the Acta office so that authors can check their CIF prior to submission. The servers automatically return error reports on the data, and a PostScript image of the text. This has greatly reduced the time wasted on trivial errors. The availability of CIF generating and checking software has led to other non-IUCr journals accepting CIF submissions as well. The advantages of involving authors closely with the initial checking process are obvious: it makes electronic submissions faster, less error-prone and much more cost-effective.

Co-editors administer the scientific review of Section C papers. They receive from the Acta office the printed text in proof format, the check report and FTP access to the submitted CIF. The CIF is important to the review process because software exists by which the referee can, in a matter of seconds, view, manipulate and check the structural data on a computer screen. Correspondence with the referees and authors is normally by e-mail and often involves the exchange of corrected CIF data. If no significant problems are encountered in the review the Co-editor can "fast track" a manuscript so that the CIF-generated proof becomes the final proof. This means that the better papers are published faster, and this is a further enticement for authors to pre-check their submissions. The final step for an accepted paper is that the CIF is updated and archived at the Acta office.

The CIF archive is a crucial part of the delivery strategy of Acta Crystallographica Section C. Because a CIF contains much more information than the published paper, and it is electronic, it represents a data resource for use in publication delivery modes which require primary data. I'll refer to this again later. And because CIFs also contain data which are currently not stored or distributed by data bases, they will provide an irreplaceable depository for future data mining.

IUCr Delivery Modes

Let me now quickly review the IUCr's efforts with electronic publication delivery. It has moved somewhat cautiously in offering these facilities for several reasons. First, it is uncertain about the most cost-effective approach at this stage, and this is important for keeping subscriptions to a minimum and maintaining continuity of service to the community. Second, IUCr periodicals are peer-reviewed and there is no intention to change this in the foreseeable future. Experience with electronic submissions over the past 5 years has shown that even for Section C papers, in which the CIF data go literally from the laboratory computer to the printed page, scientific refereeing plays an essential role in data validation and achieving publication standards. It follows that papers with less structured results and formats, such as those in other IUCr journals, will continue to need close scrutiny even if they are also submitted electronically.

The "peer-review" step is pivotal in deciding on delivery options for a journal. In the structural sciences, enormous value is placed on the precision and reliability of data and the lack of review or validation procedures would almost certainly reduce the intrinsic value of papers in this field to the point where the only electronic dissemination option might be a bulletin board. Probable consequences of this are the lack of funds for supporting long-term publication archives, or for their free access to readers. The way in which manuscripts are submitted also influences electronic delivery. For example, if the primary data of a paper is submitted only as hardcopy, this limits subsequent electronic archive modes, and can severely curtail a detailed review of the data.

As a relatively small learned society, the IUCr is able to map its publishing and organisational services closely to those expected by scientists in the discipline. Accordingly, its initial electronic publication delivery modes are intentionally simple and involve no charges. For example, the Contents pages of each journal are currently placed on the World Wide Web [14] on the day of issue. These presentations contain the title, authors and a synopsis. For Section C, there are additional URLs which permit the chemical diagram and CIF to be downloaded and displayed. As a future option, full papers will be available over the Internet, probably as a charged service. Trials are also underway to produce a CD-ROM of the annual set of issues for the journals, and it is planned that existing reference volumes will also be available as CD-ROM versions.

For the present, all of these developments complement rather than replace existing hardcopy publications. There are reasons why this will probably remain the case for some time.

For all these reasons, it appears to be a consensus view of crystallographers that most hardcopy journals will survive this generation of scientists, and perhaps even the next! This is despite decreasing library budgets, increasing printing costs and the claims of cyberspace experts that hardcopy is an anachronism! For most bench-scientists the paper journal is still the most compact, convenient and portable information container, and they still prefer scanning a page to scanning a screen.

While this is certainly the common view of my colleagues, those of us involved in these developments realise, along with most participants at this meeting, that electrons will eventually win over printer's ink. The opportunities for cost savings and value-adding will make sure of that! However, I am certain that the transition will be much longer than many in this audience think it will be.

Conclusions

In closing I would like to briefly summarise the main points of this talk.

References and Further Information

1 PDB Web site

2 IUCr Web site

3 Hall, S.R., Allen, F.H. & Brown, I.D. (1991) Acta Cryst. A47, 655-685.
IUCr CIF info
UWA CIF info

4 Hall, S.R. (1991) J. Chem. Inform. Comp. Sci., 31, 326-333.
Hall, S.R. & Spadaccini, N. (1994) J. Chem. Inform. Comp. Sci., 34, 505-508.
UWA STAR File info

5 Abstract Syntax Notation.1 - ISO 8824: presentation syntax, and ISO 8825: basic encoding rules.

6 McDonald, R.S. & Wilks, P.A. (1988) Applied Spectroscopy, 42(1), 151-162.

7 Rew, R.K. & Davis, G.P. (1990) Comp. Graph. Appl., IEEE, July 1990, 76-82.

8 Borman, S. (1993) C&EN, June 10-23., 1993

9 Electronic journal list

10 Stedman, G. & Wybourne, B. (1993) Physics World, Feb 1993, 19-20.

11 Pearce, A. (1993) Learned Publishing, 6(4), 13-16.

12 Stix, G. (1994) Sci. Amer., Dec 1994, 72-77.

13 checkcif@iucr.ac.uk and printcif@iucr.ac.uk

14 IUCr journals Table of Contents

15 Hall, S.R. & Cook, A.P.F. (1995) J. Chem. Inform. Comp. Sci., 35, 819-825.
IUCr DDL info
Rutgers Univ. DDL2 info


Last updated : April 03 1996 Copyright © 1995-1996
ICSU Press and individual authors. All rights reserved
