About the Catalogue of Life

Introduction

Species 2000 and the Integrated Taxonomic Information System (ITIS) are involving taxonomists throughout the world in the Species 2000 & ITIS Catalogue of Life (CoL) programme, collating a uniform and validated index to the world's known species, for use as a practical tool in inventorying and monitoring biodiversity worldwide. The index can be used to provide:

  • electronic baseline species lists for use in inventorying projects worldwide;
  • an index for an Internet gateway to species databases worldwide, as provided through GBIF;
  • a reference system for comparison between inventories;
  • a comprehensive worldwide catalogue for checking the status, classification and naming of species.

This is a work in progress, now covering nearly one-third of the world's species. The Catalogue is available on-line as a prototype Dynamic Checklist and also as a yearly edition (the Annual Checklist), which is available on CD-ROM as well as on the Internet. This comprehensive index of all known plants, animals, fungi and micro-organisms is being achieved by accessing a distributed array of taxonomic databases.

About 1.75 million species of plants, animals, fungi and micro-organisms are 'known' in the sense that they have been described and named by taxonomists. Existing taxonomic database projects cover about 40% of known species. Major resources are needed to establish indexes for the remaining groups. The CoL programme aims to stimulate completion of the array of taxonomic databases. It seeks resources both to support the completion of the existing databases, and to help establish new databases in different countries.

Member databases of Species 2000 and ITIS are working together to produce the CoL, and are working closely with the Global Biodiversity Information Facility (GBIF). The CoL plays a significant role as the species index in the GBIF portal (www.gbif.net).

 

More than half a million species!

This fifth release of the Annual Checklist contains a searchable database with information on the scientific names and synonyms of more than 527,000 of the world's species and 41,000 infraspecific taxa in the groups listed below. Common names and geographic distributions are given for many but not yet all of these species. The management classification for arranging all these data is provided by ITIS above the node of attachment of each database. Beneath such nodes, the classification is provided by that database.

Viruses
  • Virus species from ICTVdB (the Universal Virus Database)

Microorganisms and algae
  • Bacteria and Archaea from BIOS
  • Seaweeds and other algae incl. Cyanobacteria from AlgaeBase

Fungi
  • 11 orders in whole or in part from CABI Species Fungorum

Animals
  • Fishes from FishBase
  • Ants, birds, turtles, crocodiles, isopods, hydrozoan stony corals, and small groups of mammals, amphibians, molluscs, and crustaceans from ITIS
  • Scarabaeid beetles from the World Scarabaeidae Database
  • Longicorn beetles from TITAN
  • Scale insects from ScaleNet
  • Fleas from Parhost
  • Flies, craneflies, mosquitoes, bots, midges and gnats from BioSystematic Database of World Diptera, Catalogue of Craneflies of the World, CIPA and ITIS
  • Clothes-moths from Global Tineidae Moth Database
  • Crickets, grasshoppers, locusts, and katydids from the Orthoptera Species File Online
  • Spiders from World Spider Catalog
  • Krill from ETI's Euphausiids of the World Ocean database
  • Cephalopods (octopus, squid, cuttlefish and nautilus) from CephBase
  • Marine invertebrates (12 phyla, 5 classes and 4 orders) and chordates (4 classes) from the UNESCO-IOC Register of Marine Organisms
  • Sea anemones from Hexacorallians of the World
  • Snails and slugs (some groups) from Australian Faunal Directory and ITIS

Plants
  • Mosses from Moss TROPICOS
  • Cycads, eucalypts and 4 other angiosperm families from the IOPI Global Plant Checklist
  • Legumes from the ILDIS World Database of Legumes
  • Seagrasses from AlgaeBase

PLUS additional species from ITIS, CABI Species Fungorum and the Australian Faunal Directory.
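
The coverage figures quoted above can be cross-checked with simple arithmetic. The sketch below uses only the two approximate numbers given in the text (about 1.75 million known species; more than 527,000 species in this release). The variable names are illustrative.

```python
# Both figures are taken from the text and are approximate.
KNOWN_SPECIES = 1_750_000      # "about 1.75 million" described species
CHECKLIST_SPECIES = 527_000    # "more than 527,000" species in this release

# Fraction of the known species covered by this edition.
coverage = CHECKLIST_SPECIES / KNOWN_SPECIES
print(f"Coverage: {coverage:.1%}")  # roughly 30%, i.e. "nearly one-third"
```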

 

About ITIS

The Integrated Taxonomic Information System (ITIS) is a partnership of numerous organizations from the United States, Canada, and Mexico, and data stewards and experts from around the world (see http://www.itis.usda.gov, http://www.cbif.gc.ca/itis and http://siit.conabio.gob.mx). ITIS is part of the US National Biological Information Infrastructure (http://www.nbii.gov).

The ITIS database is an automated reference of scientific and common names of biota of interest to North America. ITIS places priority on North American species, but also includes global and New World treatments for many groups of biota. The database contains more than 400,000 scientific and common names for species in all kingdoms, using a standard classification (the higher levels of which are used as the management classification for the CoL database). Each scientific name is assigned a unique numerical identifier called a Taxonomic Serial Number (TSN). The ITIS database is accessible via the World Wide Web in English, French, Spanish, and Portuguese (http://itis.gbif.net).

 

About Species 2000

Species 2000 (http://www.sp2000.org) is an autonomous federation of taxonomic database custodians, involving taxonomists throughout the world in collating a uniform and validated index to the world's known species. Species 2000 is a not-for-profit company limited by guarantee (Registered in England No. 3479405) with five directors and with taxonomic database organisations from around the world as members.

Species 2000 europa, a project (http://www.sp2000europa.org) funded by the European Commission, aims to link databases in Europe as a regional resource and to provide additional information to the global Catalogue of Life.

Species 2000 Asia-Oceania (http://www.sp2000ao.nies.go.jp) works to promote databases and other taxonomic activities in that region and to provide additional information to the global CoL.

 

About the Catalogue of Life contributing databases

The Catalogue of Life is being formed by linking individual taxonomic databases to form a virtual on-line Dynamic Checklist (prototype currently online), as well as an Annual Checklist, published once a year and available on CD-ROM and online. In the Dynamic Checklist, information is retrieved from the individual Global Species Databases (GSDs) in real time, whilst in the Annual Checklist the standard data for each species have been extracted from the individual databases (both GSDs and regional datasets) and imported into a single database to provide a static annual version of the CoL.
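
The difference between the two checklists can be illustrated with a minimal sketch. It is not the CoL linking software: the source database is modelled as an in-memory function, and all names (`ExampleGSD`, `dynamic_lookup`, `build_annual_snapshot`, `snapshot_lookup`) are hypothetical.

```python
from typing import Callable, Dict, List

# A source database is modelled as a function from a search term to records.
# Real source databases are remote services; this stand-in is hypothetical.
Record = Dict[str, str]
Source = Callable[[str], List[Record]]


def dynamic_lookup(sources: Dict[str, Source], term: str) -> List[Record]:
    """Dynamic Checklist style: query every source at request time."""
    results: List[Record] = []
    for name, query in sources.items():
        for record in query(term):
            results.append({**record, "source": name})
    return results


def build_annual_snapshot(sources: Dict[str, Source], terms: List[str]) -> List[Record]:
    """Annual Checklist style: extract once and keep a static copy."""
    snapshot: List[Record] = []
    for term in terms:
        snapshot.extend(dynamic_lookup(sources, term))
    return snapshot


def snapshot_lookup(snapshot: List[Record], term: str) -> List[Record]:
    """Search the static copy; later changes in the sources are not seen."""
    return [r for r in snapshot if term in r["name"]]


# Illustrative data held by one hypothetical source database.
fish = [{"name": "Gadus morhua"}]
sources = {"ExampleGSD": lambda term: [dict(r) for r in fish if term in r["name"]]}

snapshot = build_annual_snapshot(sources, ["Gadus"])
fish.append({"name": "Gadus macrocephalus"})  # the source is updated afterwards

print(len(dynamic_lookup(sources, "Gadus")))    # 2: the live query sees the update
print(len(snapshot_lookup(snapshot, "Gadus")))  # 1: the static copy does not
```

The design trade-off is the one described above: the dynamic route is always current but depends on every source being reachable, whilst the annual route is stable and citable but fixed at the time of extraction.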

Global Species Databases (GSDs) contributing to the CoL aspire to the following properties:

  • Cover one taxon worldwide
  • Contain a taxonomic checklist of all species within that taxon
  • Deal with species as taxa, and contain synonymy and taxonomic opinion
  • Have an explicit mechanism for seeking at least one responsible/consensus taxonomy, and for applying it consistently
  • Cross-index significant alternative taxonomies in their synonymy.

Other taxonomic databases are providing regional coverage for groups not yet covered globally in the Annual Checklist, or regional information linked to the global taxonomic backbone. For the Dynamic Checklist, linking software and procedures for this are currently being developed.

The data in this Species 2000 & ITIS Catalogue of Life: 2005 Annual Checklist have been provided by a range of database organisations [list of source databases]. The datasets result from collaboration and editing by many expert taxonomists, whose names are found in the datasets themselves.

The Catalogue of Life partners are keen to contact the custodians of all other GSDs covering any group of organisms worldwide, as well as major regional databases.

 

The Catalogue of Life Standard Dataset

The Catalogue of Life (CoL) delivers a standard set of data for every known species. These data are drawn from an array of participating taxonomic databases. Currently, the majority of these sources are the appropriate Global Species Databases (GSDs) - that is, databases containing worldwide coverage of all the species within one taxon. GSDs are not available for all taxa, so some sources are Regional Species Databases (RSDs). The Australian Faunal Directory, the Integrated Taxonomic Information System (ITIS) and Species Fungorum are unusual in that they supply the Catalogue of Life with some taxonomic sectors of GSD status as well as regional datasets (RSDs) for other groups. Below, we use the name 'Source Database' for both GSD and RSD.

Species 2000 has defined ten field groups to be the standard set of data for each species (or infraspecific taxon if present).

1. Accepted Scientific Name linked to References (obligatory)
2. Synonym(s) linked to Reference(s) (obligatory, as appropriate)
3. Common Name(s) linked to Reference(s) (optional)
4. Latest taxonomic scrutiny (obligatory)
5. Source Database (obligatory)
6. Additional Data (optional)
7. Family name (obligatory)
8. Classification above family, and highest taxon (obligatory, as appropriate)
9. Distribution (optional)
10. Reference(s)

Some of the source databases additionally supply subspecies or varieties. The same dataset is used for each of these. Currently, Source Databases provide all of the obligatory field groups but not all of them provide the optional field groups.

Additional information will be available either within the appropriate Source Database, or through hyperlinks to other databases.
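
As an illustration only, the ten field groups might be modelled as a single record type in which obligatory groups must be filled and optional groups may be empty. The class, attribute and method names below are assumptions made for this sketch and are not taken from the Species 2000 standard; references (group 10) are simplified to plain strings attached to the accepted name.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StandardRecord:
    # Obligatory field groups (1, 4, 5, 7, 8).
    accepted_name: str
    accepted_name_references: List[str]   # at least one reference (group 10)
    latest_scrutiny: str                  # taxonomist and date
    source_database: str
    family: str                           # or a statement that it is unknown
    higher_classification: List[str]      # ranks above family
    # Obligatory "as appropriate" (2): a species may have no synonyms.
    synonyms: List[str] = field(default_factory=list)
    # Optional field groups (3, 6, 9).
    common_names: List[str] = field(default_factory=list)
    additional_data: Optional[str] = None
    distribution: List[str] = field(default_factory=list)

    def missing_obligatory(self) -> List[str]:
        """Return the names of obligatory groups that are empty."""
        required = {
            "accepted_name": self.accepted_name,
            "accepted_name_references": self.accepted_name_references,
            "latest_scrutiny": self.latest_scrutiny,
            "source_database": self.source_database,
            "family": self.family,
            "higher_classification": self.higher_classification,
        }
        return [name for name, value in required.items() if not value]


# Placeholder values only; none of this is real checklist data.
record = StandardRecord(
    accepted_name="Genus species Author",
    accepted_name_references=["Author (Year) Title"],
    latest_scrutiny="A. Taxonomist, 2005-01-01",
    source_database="ExampleGSD",
    family="Examplaceae",
    higher_classification=["Kingdom", "Phylum", "Class", "Order"],
)
print(record.missing_obligatory())  # [] when every obligatory group is present
```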


1. Accepted Scientific Name

The Accepted, Valid or Correct scientific name (terminology for this name varies between the Codes of Nomenclature; in the CoL we use the term 'Accepted') is that currently accepted for the species or infraspecific taxon (subspecies or variety). Two variants of NameStatus are possible in databases: 'Accepted name' or 'Provisionally accepted name'.
'Accepted name' is the name currently accepted for the species by the compiler or editor of the dataset as a quality taxonomic opinion.
'Provisionally accepted name' is the name currently accepted for the species by the dataset compiler, but with some element of taxonomic or nomenclatural doubt.

Style of author-string depends on nomenclatural traditions for different phyla.

In the case of Virus Names, the genus is placed in the Genus field, and the polynomial species name is placed in the SpecificEpithet field. Virus species names have no official author.

At least one reference is given. It may be the Nomenclatural Reference (defined below), i.e. the original (validating) publication of the taxon name or new name combination, or one or more Taxonomic Acceptance References, i.e. references that accept this species in the same taxonomic status and with the same name.


2. Synonym(s)

The list of Synonyms can include from 0 to many species or infraspecific names, which are given a Species 2000 synonymic status (NameStatus). The three possibilities below give the information sufficient for clear synonymic indexing, but do not give the full nomenclatural details, as these differ markedly in structure and context across different phyla. It is therefore necessary to 'translate' the very varied sorts of synonymic status in the source databases to create a uniform, accurate, but broad set of synonymic links for use in the Catalogue of Life.

Category A: List of "Synonyms" - names which point unambiguously at one species
Category B: List of "Ambiguous synonyms" - names which are ambiguous because they point at the current species and one or more others e.g. homonyms, pro-parte synonyms
Category C: List of "Misapplied names" - names that have been wrongly applied to the current species, and may also be correctly applied to another species.
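
The 'translation' step described above can be sketched as a lookup from source-database labels to the three Species 2000 categories. The source labels in the table are hypothetical examples; real source databases use their own vocabularies. Only the placement of homonyms and pro-parte synonyms in Category B comes from the text, and treating homotypic and heterotypic synonyms as Category A is an assumption of this sketch.

```python
from enum import Enum


class SynonymStatus(Enum):
    SYNONYM = "synonym"                      # Category A
    AMBIGUOUS_SYNONYM = "ambiguous synonym"  # Category B
    MISAPPLIED_NAME = "misapplied name"      # Category C


# Hypothetical source labels mapped to the broad CoL categories.
SOURCE_TO_COL = {
    "homotypic synonym": SynonymStatus.SYNONYM,
    "heterotypic synonym": SynonymStatus.SYNONYM,
    "homonym": SynonymStatus.AMBIGUOUS_SYNONYM,
    "pro parte synonym": SynonymStatus.AMBIGUOUS_SYNONYM,
    "misapplied": SynonymStatus.MISAPPLIED_NAME,
}


def translate_status(source_label: str) -> SynonymStatus:
    """Map a source-specific status label to a CoL synonymic status."""
    # Normalise case, surrounding whitespace and hyphenation before lookup.
    key = source_label.strip().lower().replace("-", " ")
    try:
        return SOURCE_TO_COL[key]
    except KeyError:
        raise ValueError(f"no mapping defined for source status {source_label!r}") from None


print(translate_status("Pro-parte synonym").value)  # ambiguous synonym
```

Raising an error for an unmapped label, rather than guessing a category, keeps the resulting synonymic links accurate at the cost of requiring the mapping table to be completed for each source database.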


3. Common Name(s)

There can be 0 to many Common Names, since some species have many common names while others have none. Some contributing GSDs contain no common names. Species 2000 is adding extra common names from widely used reference lists. The language of the common name is given, as is the country in which a common name is used if known.


4. Latest taxonomic scrutiny

This cites the latest taxonomic scrutiny (name of taxonomist and date) of this species record in the source database.


5. Source Database

This information is shown as part of every record in the CoL, and is visible under the heading 'Source databases'. It is provided by the source database and includes the database name (in full and abbreviated), version and/or date of release, the taxon covered by the database, and authors, custodians and editors as appropriate.


6. Additional Data

This optional field contains free text up to 255 characters. It can contain information from one or several data fields from the source database (for example, type specimen or strain, common name of family, habit/life form, ecology, uses) as decided by the custodian of the source database. Unlike all other field groups, there is no intention to make these data compatible across taxa. It is therefore distinctive or particular to the species supplied by one database.
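
A custodian combining several source fields into this one free-text field needs to respect the 255-character limit. The following is a minimal sketch; the function name, the "label: value" layout and the "; " separator are assumptions, and rejecting over-long text (rather than truncating it) is a design choice of the sketch, not a rule from the standard.

```python
MAX_ADDITIONAL_DATA = 255  # character limit stated in the text


def build_additional_data(parts: dict) -> str:
    """Join selected source fields into one free-text value."""
    # Skip empty values so that absent fields leave no trace in the text.
    text = "; ".join(f"{label}: {value}" for label, value in parts.items() if value)
    if len(text) > MAX_ADDITIONAL_DATA:
        raise ValueError(
            f"additional data is {len(text)} characters; limit is {MAX_ADDITIONAL_DATA}"
        )
    return text


print(build_additional_data({"habit": "tree", "ecology": "", "uses": "timber"}))
# habit: tree; uses: timber
```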

7. Family name

This field should contain one valid Latin name of the Family to which the Source Database believes this species belongs. If the Family is not known (e.g. genera labelled incertae sedis in taxonomic treatments) then this is stated.

8. Classification above family and highest taxon

The Catalogue of Life uses a single taxonomic classification (also called a hierarchy or tree) for management purposes - the management classification. This management classification includes taxa of five basic ranks only: Kingdom - Division (Phylum) - Class - Order - Family. Superfamily is also used for some insect groups.

The present choice is the classification provided by ITIS (http://www.itis.usda.gov; http://www.cbif.gc.ca/itis; http://siit.conabio.gob.mx). Future technical developments should make it feasible to display alternative classifications for the same species checklists.

This classification is used above the node of attachment of each database. Beneath this node, the classification provided by the GSD is used. The taxonomic rank of the highest taxon at this attachment node varies from one GSD to another (e.g. sectors of AlgaeBase are attached as phyla; ILDIS World Database of Legumes is attached as one family).
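
The way the two classifications join at the attachment node can be sketched as follows. The function name is hypothetical, and the taxon names are illustrative values for a database attached at family rank; they are not extracted from the Catalogue.

```python
from typing import List, Tuple

Taxon = Tuple[str, str]  # (rank, name)


def combined_lineage(management_path: List[Taxon], source_path: List[Taxon]) -> List[Taxon]:
    """Management classification down to the attachment node, followed by
    the source database's own classification beneath that node."""
    if not management_path:
        raise ValueError("management path must end at an attachment node")
    return list(management_path) + list(source_path)


# Illustrative values: a source database attached as one family.
management = [
    ("Kingdom", "Plantae"),
    ("Division", "Magnoliophyta"),
    ("Class", "Magnoliopsida"),
    ("Order", "Fabales"),
    ("Family", "Fabaceae"),   # attachment node
]
below_node = [("Genus", "Pisum"), ("Species", "Pisum sativum")]

lineage = combined_lineage(management, below_node)
print(" > ".join(name for _, name in lineage))
```

For a database attached at a higher rank, such as a phylum, `management` would simply be shorter and `below_node` would carry the classes, orders and families supplied by that database.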

9. Distribution

This optional list of geographic records can contain from 0 to many areas. Distribution information is currently in various forms in source databases, and absent from some.

Level 4 (Basic Recording Units) of the TDWG World Geographical Scheme, Edition 2 (2001) is used by some source databases (and is recommended) for terrestrial and freshwater organisms (http://www.tdwg.org/TDWG_geo2.pdf). In future we aim to adopt a suitable standard for marine areas.

Occurrence status (native, naturalised, etc.) is not given in many source databases. If being used, the TDWG Plant Occurrence and Status Scheme (1998) is recommended as the standard for recording Occurrence Status (http://www.tdwg.org/poss_standard.html).


10. Reference(s)

References are linked to accepted scientific names, synonyms and common names. The reference type is defined as follows:

  • Nomenclatural Reference (just one reference, which contains the original (validating) publication of the taxon name or new name combination), or
  • Taxonomic Acceptance Reference(s) (one or more bibliographic references that accept this species in the same taxonomic status, and with the same name), or
  • Common Name Reference(s) (one or more bibliographic references that contain common names).

For more detailed information, refer to FA Bisby & YR Roskov, Species 2000 Baseline Documents: Standard Dataset, version 3.2 (December 2004), available at http://www.sp2000.org.