Since its launch, Retraction Watch has done much to highlight the value of research integrity and publishing standards. Discussing Crossref's recent acquisition of Retraction Watch's database of retracted articles, Ivan Oransky and Rachael Lammey highlight the value of this data and the difficulties of making it openly and sustainably accessible.
By convention and, some would say, by necessity, publishers are considered the stewards of the scientific record. A key part of the value they say they provide is the production, curation, and dissemination of metadata so that the literature can be discovered, read, shared, and tracked.
But publishers are not carrying their weight when it comes to at least one key part of that record: Retractions.
Studies have demonstrated that central databases that should include information about all retracted papers covered by a particular set of criteria are missing many, and sometimes most, retractions. The differences between databases are stark: Some contain only a third or a quarter of the at least 50,000 known retractions in the literature. Whether intentionally or simply because it is not a priority, publishers are not transmitting the metadata they should be.
That means researchers who are trying to avoid citing or relying on retracted papers are stuck. While they could check each reference by hand on publishers’ sites, studies have shown that not even that will catch all retractions. It also means that the full impact of shoddy science – at least the work that is retracted, a fraction of what should be – will not be understood, nor available for study by other scholars.
Thankfully, there is a solution. Earlier this month, Crossref announced that it had acquired the Retraction Watch Database, launched in 2018 and curated by The Center For Scientific Integrity – the parent non-profit of Retraction Watch – and containing about 43,000 retractions. Crossref works with over 19,000 members from 151 countries including institutions, funders, publishers, preprint servers, libraries, and more, collecting metadata on research objects and making it openly available via an API that sees over 1.1 billion queries each month.
The more comprehensive and accurate the metadata, the more value it provides to the community as an open source of information on things like citations, research funding, related research data and preprints. Crossref has encouraged fuller reporting of this important information: in 2020 it removed the fee for registering information on retractions and other updates, and its Participation Reports flag retraction information (via the Crossmark service) as one of 12 key metadata elements that members should register.
But information on retractions in the metadata registered by Crossref members has not been comprehensive. Compared to the 43,000 retractions in the Retraction Watch database, Crossref could only see, and make available data on, around 14,000 retractions up until September 2023. This created problems for the community: without more complete data on retractions, its members couldn't use Crossref metadata as a source of this information to build downstream tools and services, to do research on retractions, or to know what to trust when doing their own research. Incomplete data meant that the community risked seeing false negatives: someone may have assumed a piece of work was not retracted when it had been, because the information hadn't been communicated downstream. This means that retracted research can inadvertently spread throughout the literature, proliferating errors or wasting valuable research time.
The acquisition accomplishes two critical goals: Making the data covering nearly 50,000 retractions freely available, and providing robust financial support for its work to continue. These two goals have been intertwined since Retraction Watch launched the database five years ago, and a bit of history is worth reviewing as we hope it is useful to others engaged in this kind of work.
Gathering and curating a comprehensive database of retractions, it turns out, takes resources. Those resources initially came from grants from three generous foundations. But as is often the case, whether because of shifting priorities, the design of certain funding, or philanthropy's understandable desire to see non-profits like The Center for Scientific Integrity stand on their own feet, sustainable funding required a different model.
That model relied heavily on licensing the database to organizations that could make use of it in products or for internal purposes. It meant, however, that the data were not freely available. The Crossref agreement makes the data fully open, while providing The Center for Scientific Integrity with more than USD 800,000 in funding over the initial term, which is the next five years.
Anyone – creators of citation software, publishers, universities, scholars – can now download the data and use it as they see fit. Unlike retraction data available through other sources, the Retraction Watch Database also includes a detailed taxonomy of reasons for every retraction. The only requirement is attribution. Given how much effort is wasted when research projects are built on what turn out to be houses of sand, we are confident that this will save costs immediately.
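As a rough illustration of the kind of downstream use this enables, the sketch below checks a reference list against a downloaded copy of the retraction data. It is a minimal example under stated assumptions, not a description of any official tool: the column names `OriginalPaperDOI` and `Reason` are assumptions about the layout of the export, the helper names are hypothetical, and the sample rows and DOIs are made up so the snippet runs on its own.

```python
import csv
import io

# Hypothetical sample rows standing in for a downloaded export. The column
# names ("OriginalPaperDOI", "Reason") are assumptions about the file layout,
# and the DOIs and reasons are made up for illustration.
SAMPLE_CSV = """OriginalPaperDOI,Reason
10.1234/example.001,Made-up reason A
10.1234/EXAMPLE.002,Made-up reason B
"""


def normalize_doi(doi):
    """Lower-case a DOI and strip common prefixes so variants compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi.strip()


def load_retractions(csv_file, doi_column="OriginalPaperDOI", reason_column="Reason"):
    """Map the normalized DOI of each retracted paper to its recorded reason."""
    retractions = {}
    for row in csv.DictReader(csv_file):
        doi = normalize_doi(row.get(doi_column) or "")
        if doi:
            retractions[doi] = row.get(reason_column) or ""
    return retractions


def find_retracted(reference_dois, retractions):
    """Return (doi, reason) pairs for references found in the retraction data."""
    hits = []
    for doi in reference_dois:
        key = normalize_doi(doi)
        if key in retractions:
            hits.append((doi, retractions[key]))
    return hits


# Usage: replace io.StringIO(SAMPLE_CSV) with an opened file for a real export.
retractions = load_retractions(io.StringIO(SAMPLE_CSV))
references = ["https://doi.org/10.1234/example.002", "10.1234/example.999"]
for doi, reason in find_retracted(references, retractions):
    print(f"{doi} is listed as retracted: {reason}")
```

Normalizing DOIs before comparing matters because the same identifier can appear with different capitalization or as a full URL. Anyone building on the real data would also need to credit the source, since attribution is the one requirement.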
There remains work to do. For example, both Crossref and Retraction Watch participate in the NISO CREC Working Group, which will soon publish a recommended practice on the Communication of Retractions, Removals, and Expressions of Concern. With this acquisition, we have identified a way to greatly increase the openly available information on retractions, supplementing Crossref member data and helping sustain another not-for-profit working hard to provide this. This in turn helps the community benefit from and rely upon more comprehensive information on important updates that have been applied to research after it was published or made available online. By collaborating, we hope to support more building blocks and fewer houses of sand for the research community to use.