Taxonomy on the Web - from Nature, pt. 2

http://nature.com/cgi-taf/DynaPage.taf/… 
n6884/full/417017a_fs.html

Commentary
Nature 417, 17-19 (02 May 2002); doi:10.1038/417017a


  Challenges for taxonomy

H. CHARLES J. GODFRAY

H. Charles J. Godfray is at the NERC Centre for Population Biology, 
Department of Biological Sciences, Imperial College at Silwood Park, 
Ascot, Berkshire SL5 7PY, UK.


The discipline will have to reinvent itself if it is to survive and 
flourish.

Taxonomy, the classification of living things, has its origins in 
ancient Greece and in its modern form dates back nearly 250 years, to 
when Linnaeus introduced the binomial classification still used today. 
Linnaeus, of course, hugely underestimated the number of plants and 
animals on Earth. As subsequent workers began to describe more and more 
species, often in ignorance of each other's work, the resulting 
confusion and chaos threatened to destroy the whole enterprise while 
still in its infancy. In today's jargon, we might call this the first 
bioinformatics crisis. Using the tools then available, 
nineteenth-century taxonomists solved this crisis in a brilliant way 
that has served the subject well since then. They invented a complex 
set of rules that determine how a species should be named and 
associated with a type specimen; how generic and higher taxonomic 
categories should be handled; and how conflicts over the application of 
names should be resolved. All these rules revolved around publications 
in books and scientific journals, and their descendants form the 
current codes of zoological and botanical nomenclature.

But today much of taxonomy is perceived to be facing a new crisis — a 
lack of prestige and resources that is crippling the continuing 
cataloguing of biodiversity. In the United Kingdom, a Parliamentary 
Select Committee is currently conducting an enquiry into the health of 
the subject for the second time in 10 years, and similar concerns are 
being expressed around the world. In this article I shall first explore 
why descriptive taxonomy is in such straits (in contrast, its sister 
subject, phylogenetic taxonomy, is flourishing). Then, after this 
essentially negative exercise, I will argue that taxonomy can prosper 
again, but only if it reinvents itself as a twenty-first-century 
information science. It needs to adopt some of the solutions that 
molecular biologists have developed to cope with the second 
bioinformatics crisis: the huge explosion of sequence, genomic, 
proteomic and other molecular data.

The problem
  Why can't descriptive taxonomy attract large-scale funds in the same 
way as other big programmes like the Human Genome Project or the Sloan 
Digital Sky Survey? All three projects are enabling science: not in 
themselves generating new ideas or testing hypotheses, but allowing 
many new areas of research to be opened up.

One reason is that taxonomists lack clearly achievable goals that are 
both realistic and relevant. Of course it would be great to describe 
every species of organism on Earth, but we are still monumentally 
uncertain as to how many species there are (probably somewhere between 
4 million and 10 million); this goal is just not realistic at present. 
There are various projects aimed at listing, for example, all the valid 
described species of animal in Europe, or butterflies on Earth (see Box 
1). These aims are eminently achievable and very worthwhile, but the 
results are like raw, unannotated DNA sequences: unexciting and of 
relatively little value in themselves to non-specialists. Taxonomists 
need to agree on deliverable projects that will receive wide support 
across the biological and environmental sciences, and attract public 
interest.

A second problem is part of the legacy of more than 200 years of 
systematics. Many taxonomists spend most of their career trying to 
interpret the work of nineteenth-century systematists: deconstructing 
their often inadequate published descriptions, or scouring the world's 
museums for type material that is often in very poor condition. A 
depressing fraction of published systematic research concerns these 
issues. In some taxonomic groups the past acts as a dead weight on the 
subject, the complex synonymy and scattered type material deterring 
anyone from attempting a modern revision. As Frank-Thorsten Krell 
pointed out in Correspondence (Nature 415, 957; 2002), "original 
descriptions have to be referred to for ever, independent of the 
paper's quality".

The problems do not always lie in the past. Even today, many species 
are being described poorly in isolated publications, with no attempt to 
relate a new taxon to existing species and classifications. Many of 
these 'new' species will have been described before, so sorting out the 
mess will be the headache of the next generation of taxonomists. It is 
not surprising if funding bodies view much of what taxonomists do as 
poor value for money.

One of the astonishing things about being a scientist at this 
particular time in history is the vast amount of information that is 
available, essentially free, via one's desktop computer. I can download 
the sequences of millions of genes, the positions of countless stars. 
Yet, with a few wonderful exceptions, the quantity of taxonomic 
information available on the web is pitiful, and what is present 
(typically simple lists) is of little use to non-taxonomists. But 
surely taxonomy is made for the web: it is an information-rich subject, 
often requiring copious illustrations. At present, the output of much 
taxonomy is expensive printed monographs, or papers in low-circulation 
journals available only in specialized libraries. These are not 
attractive 'deliverables' for major research funders.

Two models of taxonomy
  The taxonomy of a group of organisms does not reside in a single 
publication or a single institution, but instead is an ill-defined 
integral of the accumulated literature on that group. The literature is 
bound together and cross-references itself using the venerable rules of 
taxonomy encapsulated in the codes. But this is not the only way to 
organize a taxonomy. The taxonomy of a particular group could reside in 
one place and be administered by a single organization. It could be 
self-contained and require reference to no other sources.

My main argument is that to address the problems outlined above, and 
for taxonomy to flourish now and in the future, it has to move from the 
first to the second model: from having a distributed to a unitary 
organization. Such a massive task could only be accomplished group by 
group, as resources became available. I believe a number of things 
would then follow. First, the only logical way to organize a unitary 
taxonomy and to make it widely available is on the web. The web is 
currently used, if used at all, as an adjunct to the distributed, 
printed taxonomy, but I think it should replace it. Second, the core of 
taxonomy is a description of each species and a means of distinguishing 
among them; to this core has been added the exercise of resolving their 
evolutionary relationships. I believe that taxonomy needs to expand to 
include other aspects of the species' biology, to become an information 
science that curates our accumulated knowledge of that species in the 
way a gene annotation in a genome database organizes our knowledge of a 
particular protein. Third, I think it is essential that the unitary 
taxonomy of different groups evolves from the present taxonomy. We must 
preserve the achievements of 250 years of distributed taxonomy, 
dispensing with the bad legacy of the past but retaining the good.

To illustrate how this could be done I shall sketch one possible way a 
unitary taxonomy might be achieved. I am not a professional taxonomist 
and am under no illusion that what follows will be the best or even a 
viable model, but I hope it will bring out the issues involved.

A unitary taxonomy
  Introduce as a formal taxonomic procedure the 'first web revision'. 
This would be a revision of a major group of organisms to a standard 
decided on by the International Commission on Zoological Nomenclature, 
or the International Botanical Congress, or equivalent body (let's just 
call it the international committee). The revision would include a 
traditional description of each taxon and the location of type 
material. It might also include material not currently required in a 
formal description, for example keys and, for many groups, photographs 
or other illustrations. For some organisms a gene sequence might be 
required. It would also include a treatment of existing known synonyms 
to preserve contact with the older literature. This draft first web 
revision would be placed on the web for comments from the community, 
then after changes have been made in response, it would become the 
unitary taxonomy of the group.

What would this mean? First, from this time onwards all future work on 
the group need refer only to the set of species in the first web 
revision and then later to those in the 'nth (that is, current) web 
revision'. The taxonomy of the group is thus at a stroke liberated from 
nineteenth-century descriptions and potentially undiscovered synonyms. 
If I think I have discovered a new species I need only check that it 
is not already in the web revision. So what happens if I describe a new 
species and then someone discovers that Linnaeus or someone else had 
already described it in an overlooked work? Well, that interesting 
nugget of historical information can be added to the species' web page, 
but the 
name doesn't change. What happens if I want to lump, split or add 
species, or revise their higher classification? Then I submit a 
revision that is mounted on the web for refereeing and comment. If, as 
a result, it is accepted, it becomes incorporated into the current 
(n+1th) web revision. At any one time there is just a single current 
web revision to which people refer, linked to all previous revisions 
(which are maintained on the web, so that in future I can easily see 
what was understood by species x in year y).
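
The revision cycle described above can be sketched as a small data structure. This is only an illustration under stated assumptions, not a design proposed in the article: the names Revision, UnitaryTaxonomy, accept, as_of and is_new are hypothetical, the species names are placeholders, and revisions are assumed to be accepted in chronological order.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Revision:
    """One accepted web revision: an immutable snapshot of the group's species."""
    number: int          # 1 for the first web revision, then 2, 3, ...
    year: int            # year in which this revision became current
    species: frozenset   # accepted species names in this revision


@dataclass
class UnitaryTaxonomy:
    """Hypothetical container holding every web revision of one group."""
    group: str
    revisions: list = field(default_factory=list)

    def current(self):
        """Return the single current (nth) web revision."""
        return self.revisions[-1]

    def accept(self, year, add=(), remove=()):
        """Incorporate a refereed proposal, producing the (n+1)th revision."""
        base = self.revisions[-1].species if self.revisions else frozenset()
        species = (base - frozenset(remove)) | frozenset(add)
        revision = Revision(len(self.revisions) + 1, year, species)
        self.revisions.append(revision)  # earlier revisions are never deleted
        return revision

    def as_of(self, year):
        """Return the revision that was current in a given year, or None."""
        candidates = [r for r in self.revisions if r.year <= year]
        return candidates[-1] if candidates else None

    def is_new(self, name):
        """A proposed name is checked only against the current revision."""
        return name not in self.current().species


# Placeholder data: a first web revision, then one accepted change.
taxonomy = UnitaryTaxonomy("Example group")
taxonomy.accept(2003, add={"Aus bus", "Aus cus"})
taxonomy.accept(2007, add={"Aus dus"}, remove={"Aus cus"})
```

Keeping every revision as an immutable snapshot is what allows a later reader to ask what the group contained in year y, while new work consults only the current revision.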

A major difference between this way of doing taxonomy and the status 
quo is that a unitary taxonomy needs administration: both the physical 
implementation on servers and networks, and the intellectual 
administration of the current web revision. One virtue of the present 
system is that if no one is interested in a group's taxonomy it can 
quietly slumber in the library. But the collections and type material 
that underpin distributed taxonomies do require administration, which 
is currently undertaken by our great museums and herbaria. Nearly all 
these organizations are enthusiastically embracing modern web 
technologies. Hosting web revisions is something I see as a logical 
extension of their moves towards becoming, in part, modern information 
storehouses. It is absolutely clear, however, that they need more money 
in order to do this. They might also undertake the intellectual 
administration of the web revision — the refereeing and editing — 
although they would probably devolve this to committees drawn from a 
wider constituency (the equivalent of a journal's editorial board).

However it worked, standards would need to be set and monitored by the 
international committee, who would also determine which institute 
houses which taxonomy, and would prevent duplication of effort.

Advantages
  I believe that what I have described is evolutionary rather than 
revolutionary in that it preserves the hard-won successes of current 
taxonomy while dispensing with the historical baggage. It is also 
evolutionary in that groups would move to the new unitary taxonomy as 
resources became available. It would set a series of achievable targets 
that could be used to spur major funding initiatives, for example the 
first web revision of mosquitoes, reptiles or plants (and I hope Nature 
or Science might celebrate these milestones as they do completed genome 
sequences).

I believe that major government and private research funders would 
consider construction and maintenance of a unitary taxonomy — 
universally accessible, and the foundation of all future work on the 
group — much more attractive to support than taxonomy as presently 
practised. It might also attract new sources of funding. It surely 
isn't impossible that a major company might sponsor the web revision 
of, say, the Lepidoptera (butterflies and moths); and if it wants to 
put its logo on the site, then why not?

The web revision would become an information hub, both through its 
contents and through its links to other sites. Links to molecular 
databases will facilitate the increasing usefulness of molecular 
techniques in species identification. There are already exciting 
web-based phylogenetic projects (see Box 1) that aim ultimately to 
build a phylogeny of all living organisms; clearly, one would build in 
reciprocal links to these sites. Today, a reference to a species in a 
scientific article usually gives just the scientific name and possibly 
the authority, but seldom refers (or gives credit) to the taxonomic 
revision upon which the identification is based. As increasing numbers 
of journals go electronic, the mention of a species can more and more 
easily be linked to its position in the current web revision. Were the 
status of the species to change, the link would take you to the 
contemporary web revision and then forward to the current conception of 
the taxon. These links could also be used to produce a much-needed, 
fair 'citation count' for taxonomists. Finally, as an increasing amount 
of the scientific literature becomes available online through projects 
such as JSTOR (http://www.jstor.org/), one can imagine links between a 
species description and important early papers on its taxonomy and 
biology, again maintaining links with the good legacy of distributed 
taxonomy.
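
The forward-linking idea (a citation made against one revision leading on to the current conception of the taxon) might look something like the sketch below. It assumes a hypothetical table, here called successors, that records for each revision number and name which names replace it in the next revision; nothing of the kind is specified in the article, and the species names are placeholders.

```python
def resolve(name, revision, successors, current_revision):
    """Follow a cited name forward from the revision it was cited in.

    successors maps (revision number, name) to the set of names that
    replace it in the next revision. Lumping maps several names to one;
    splitting maps one name to several.
    """
    names = {name}
    for n in range(revision, current_revision):
        step = set()
        for taxon in names:
            # A taxon with no recorded change keeps its name in revision n + 1.
            step.update(successors.get((n, taxon), {taxon}))
        names = step
    return names


# Placeholder history: "Aus cus" is lumped into "Aus bus" in revision 2,
# and "Aus bus" is then split into two species in revision 3.
successors = {
    (1, "Aus cus"): {"Aus bus"},
    (2, "Aus bus"): {"Aus bus", "Aus eus"},
}
current_names = resolve("Aus cus", 1, successors, 3)
```

Because the result is a set, a paper that cited a since-split species is linked to every current taxon that the old name might now cover, rather than to a single guess.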

Many taxonomic works are very hard for non-specialists to use, 
sometimes because of real difficulties in telling many species apart, 
but more often because of the telegraphic jargon and lack of 
illustration imposed on taxonomists by the expense of publication in 
print. The web has far fewer constraints, and provides the space needed 
for taxonomists to be understood. Taxonomy often pays insufficient 
attention to its 'end users', the ecologists, conservationists, pest 
managers and amateur naturalists who need or want to identify animals 
and plants. I hope that, overlaid on the current web revision, there 
would be higher-level information, the equivalent of the regional field 
guides and floras used by field workers. For many, this 'entry level' 
would be all that is required, but where needed the user could burrow 
deeper, right through to the primary taxonomic sources. Today, few 
people would seriously think about taking a computer into the field as 
a substitute for a field guide, but that will undoubtedly change and 
taxonomists should be ready.

Finally, the taxonomy should be available free (without access charges) 
to anyone who can log onto the Internet. This will raise the profile of 
taxonomy and increase the number of people who actually use the fruits 
of taxonomic research. Longer-term positive benefits will be for a new, 
young generation of naturalists, stalking their prey using digital 
cameras, downloading their captures into PCs, then identifying them 
over the web — exposing them to taxonomy as an active discipline, at 
the heart of modern biology.

Disadvantages
  One disadvantage of a unitary taxonomy is the requirement for more 
administration, with its attendant costs. My assertion is that the 
advantages of a unitary taxonomy will prime sufficient new funds to 
counterbalance this, but if I'm wrong the project fails. There are also 
considerable technological challenges in developing the web software to 
support the taxonomies.

A possible criticism is that the proposal is top-down, at variance with 
the individualistic tradition of taxonomy. Would one clique be able to 
impose its view of how a group is classified? The international 
committee would be empowered to set standards, but rejected 
contributions to a group's taxonomy should also be stored on the web. 
Even if they are not incorporated in the current web revision they can 
at least influence future scholarship and research.

An important issue is the degree to which a treatment should be 
'complete' before it is a candidate for a first web revision. Could a 
series of intractable species complexes requiring detailed research 
delay completion of a revision? The ideal solution would be to 
commission new taxonomic research to sort out these problems, but if 
this is not possible I would favour a category of 'provisional taxon', 
where the need for further study is clearly highlighted. After all, the 
heterochromatin-rich gaps in the human genome sequence did not delay 
the announcement of its 'completion'.

Is a web-based taxonomy as permanent as a paper-based one, and are 
people without computers disenfranchised, especially those in less 
wealthy countries? I believe the first is a non-issue; there is not (as 
far as I know) a paper back-up to the human genome database, and the 
international committee would set rigid standards for archiving and 
backup. Access is a much more important matter, but very many more 
people are at present disenfranchised by their inability to get to a 
specialist library, or to order a reprint, or even by being unaware 
that certain literature exists. The web-based taxonomy must be 
completely downloadable so that even continuous access to the Internet 
is not essential, and, if all else fails, a paper copy could be 
printed. It might spread the geographical distribution of taxonomic 
activity if some sites were hosted by developing countries with 
strengths in computing, such as India.

Conclusions
  I find that the commonest reaction of taxonomists to these ideas is 
the worry that they are an attempted technological fix that distracts 
attention from what they (and I) perceive to be the overwhelmingly 
critical issue — the lack of people and resources devoted to 
descriptive taxonomy. The counter-argument is that the technological 
fix is not an end in itself; it is the means of making grassroots 
taxonomy more accessible and useful, and thus attracting people and 
funds into the field. But is such a root-and-branch change in the 
culture of taxonomy really needed? Although there is near-universal 
agreement about the current depressed state of descriptive taxonomy, 
wouldn't more funding alone solve the problem?

I think not: indeed, descriptive taxonomy might disappear completely 
for 'difficult' groups such as many insects and nematodes. Just as 
Moore's law says that microprocessor power doubles every 18 months, 
there must be a parallel law that says DNA sequencing power increases 
geometrically. In 10 or 20 years' time it will be simpler to take an 
individual organism and get enough sequence data to assign it to a 
'sequence cluster' (equivalent to species) than to key it down using 
traditional methods, let alone describe it as new. Just as bacterial 
taxonomy is now nearly all sequence-based, a new way of classifying 
insects, nematodes and perhaps even many plants and fish might evolve 
that is totally divorced from current taxonomy — a point also made 
forcibly by Robert May, president of Britain's Royal Society.

Would the death of large swathes of present-day systematics matter? Yes 
it would, because we would be throwing away so much of what we have 
learned in the past 250 years about the planet's biota, a lot of which 
we would then have to relearn. But unless taxonomy is unitary, 
web-based and able to accommodate these radical new ways of doing 
biology, I fear it will be sidelined.

The rigidity built into the current rules and codes of taxonomy — which 
include prohibition of purely electronic description — is part of their 
success, and changes should not be made lightly. But I suspect these 
rules are now a brake on progress, imprisoning the subject in outdated 
methodologies, and rendering it difficult or impossible to attract the 
major funds needed to reverse its slow decline. Surely it is time to 
experiment — time for the international taxonomic community to come 
together and countenance a unitary web revision of one or a few major 
groups of organisms (and to work out exactly how a unitary taxonomy 
should operate). This venture must be sanctioned and supported by the 
existing international committees, or no serious taxonomist will waste 
his or her time on it; no institution will administer it; and no agency 
will fund it. If successful, it will change how taxonomy is done for 
ever; if it fails, it will not be difficult to revert to the status quo 
ante. There is everything to gain and little to lose.


Acknowledgements. I am grateful to the many taxonomists and other 
biologists who have debated these issues with me.
----------------------------------------------------------------
Box 1:

http://nature.com/nature/journal/…

Taxonomy on the web

The current codes of zoological and botanical nomenclature do not allow 
original descriptions to be made purely on the web, but nevertheless 
there is a substantial amount of taxonomy on the Internet. The Natural 
History Portal of the Natural History Museum in London 
(http://www.nhm.ac.uk/portal/index.html) provides an excellent entry 
into these resources, which include such sites as the International 
Plant Name Index (http://www.ipni.org/) that covers all higher plants; 
the ant database (http://www.antbase.org/) featured recently in Nature's 
News section (416, 115; 2002); and the Tree of Life project 
(http://tolweb.org/tree/), a database of phylogenies.

The most common data available are catalogues of species names and 
lists of museum specimens, although some identification keys and other 
information-rich sites are becoming available.

An ambitious project led by Species 2000 (http://www.sp2000.org/) and 
the Integrated Taxonomic Information System (http://www.itis.usda.gov/) 
aims to catalogue the world's biota, and these sites themselves also 
link to the Global Biodiversity Information Facility 
(http://www.gbif.org/), intended to be a general clearing house for 
biodiversity information.

Finally, the All Species Foundation (http://www.all-species.org/) has 
set itself the goal of making an inventory of all species on Earth in 
the next 25 years.
----------------------------------------------------------------

  © 2002 Nature Publishing Group