2016 showed progress in providing access to taxonomic data in near real time. But much more is needed.
The current way of publishing, cataloguing and providing access to taxonomic data is, to put it mildly, an almost complete failure.
After more than 14 years of modeling and extracting taxonomic content from publications, we at Plazi see no silver lining for dealing with the huge backlog of literature, and only a slight one for ongoing publishing.
With great effort we can now automatically extract content from scientific publications. Together with the articles that Pensoft publishes in TaxPub/XML, which are easily imported, this covers an estimated 25% of the species newly described in 2016, including the metadata, the taxonomic treatments, the illustrations and in many cases the type material with its collection code and specimen code. Today we have over 60,000 tagged images on BLR, complementing the ca. 30,000 images on Flickr through BHL.
The articles, treatments and illustrations now each carry their own metadata and persistent identifiers, and these identifiers are also included in the metadata whenever one of them cites another.
This workflow works mainly for born-digital articles. Handling scanned articles at the same level is an order of magnitude more complex, which makes it even less likely that this will be done in the near future.
For 2016, we at Plazi extracted 4 new families, 376 new genera, 4,684 new species, and 42,207 taxonomic treatments of 40,870 unique names from 60 different journals. The data is accessible at Plazi and the Biodiversity Literature Repository.
Plazi data is automatically imported into GBIF, where Plazi is one of the major name contributors and one of the few providing treatments. This allows a name usage to be linked to the respective treatment, and from there to the original article and illustrations. From a nomenclatural point of view, this makes it possible to check, besides the exact publishing date, everything needed to determine whether a name is available. It also makes it possible to begin to understand the scientific basis for new names, which is all too often very thin, e.g. descriptions based on a single specimen (see e.g. Miller et al., 2016, or consult the new taxa feature on Plazi).
Our opportunity as taxonomists is that we have available one of the most advanced publication systems in the entire scientific publishing world. In fact, it was developed for the taxonomic world thanks to a collaboration with Pensoft, who implemented it, and a collaboration with Plazi and the US National Institutes of Health, which is also why taxonomic articles could be included in PubMed Central.
The above approach has another advantage: everything is open access and thus available to anybody anywhere in the world. In fact, the implementation of the Open Biodiversity Knowledge Management System (OBKMS) will make all the data published by Pensoft and extracted by Plazi available in the Linked Open Data Cloud. In other words, our really important data will become the first-class citizen we want it to be.
But it takes a community effort to make this happen, that is, to provide not a fraction but all the data of our discoveries right at the moment we publish. Please join this effort!