What percent of scholarly publication DOIs are registered with Crossref? #3

Closed
dhimmel opened this issue Jun 16, 2017 · 4 comments

@dhimmel
Collaborator

dhimmel commented Jun 16, 2017

Let's use this issue to jot down notes related to what percent of scholarly publication DOIs are registered with Crossref.

As a refresher, there are many DOI Registration Agencies (RA). For example, EIDR is a DOI RA for entertainment, so you can actually get DOI metadata for some porns! There is even discussion regarding DOIs for construction products. Shoutout to @jenniferlin15 who helped me understand these intricacies of the DOI system.

For our analyses, we're most interested in DOIs for scholarly content. Crossref is not the only RA that handles scholarly content. Some other examples are:

  1. mEDRA which "provides DOI registration services to publishers, academic institutions, research centres and intermediaries in Italy, in the EU market and internationally."
  2. DataCite which "provides persistent identifiers (DOIs) for research data."

We're mostly interested in cataloging all DOIs for scholarly publications in relation to our Sci-Hub coverage project.

@dhimmel
Collaborator Author

dhimmel commented Jun 16, 2017

DOI Links on Wikipedia

The 2016 study, "DOI Links on Wikipedia", provides the following discussion of the DOI RA breakdown:

A prefix is assigned to a particular DOI registrant, such as publishing companies or academic societies. DOI registrants assign suffixes to their contents and register DOIs through DOI Registration Agencies (RAs). There are 10 RAs.

Some RAs that handle scholarly resources (such as journal articles, books, and datasets) are CrossRef, JaLC, ISTIC, and DataCite. JaLC is the only RA in Japan, ISTIC is a Chinese RA, and DataCite is an RA for research data. As of April 2016, there are 76,944,396 DOIs registered by CrossRef (CrossRef DOIs); 23,422,068 DOIs by ISTIC (ISTIC DOIs); 6,614,478 DOIs by DataCite (DataCite DOIs); and 1,401,144 DOIs by JaLC (JaLC DOIs).
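As a rough cross-check, the April 2016 counts quoted above imply a Crossref share among just these four scholarly RAs. The following is a minimal Python sketch; the variable names are illustrative, and the denominator ignores every other RA, so this is not a share of all DOIs.

```python
# DOI counts per RA as of April 2016, copied from the quoted passage.
dois_by_ra = {
    "CrossRef": 76_944_396,
    "ISTIC": 23_422_068,
    "DataCite": 6_614_478,
    "JaLC": 1_401_144,
}

# Denominator: only the four scholarly RAs named in the passage.
total = sum(dois_by_ra.values())
shares = {ra: count / total for ra, count in dois_by_ra.items()}

# Print the RAs from largest to smallest share.
for ra, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{ra}: {share:.2%}")
```

By this count, Crossref accounts for roughly 71% of the DOIs registered across these four RAs.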

Table 2 of this study shows the breakdown of RAs for DOI links on Wikipedia as of March 2015. This Table is reproduced below:

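| RA | enwiki | jawiki | zhwiki |
| --- | ---: | ---: | ---: |
| AIRITI | 2 | 0 | 0 |
| CrossRef | 1,463,052 | 27,900 | 36,202 |
| DataCite | 464 | 13 | 6 |
| ISTIC | 101 | 0 | 44 |
| JaLC | 9 | 549 | 0 |
| mEDRA | 647 | 5 | 9 |
| OPOCE | 176 | 2 | 3 |
| Public | 367 | 6 | 25 |
| Error | 9,412 | 324 | 380 |
| Total | 1,474,230 | 28,799 | 36,669 |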

enwiki stands for the English Wikipedia; jawiki stands for the Japanese Wikipedia; zhwiki stands for the Chinese Wikipedia. The authors summarize:

Table 2 shows the number of total DOI links for RAs. Most of DOI links in these Wikipedia are CrossRef DOIs. The second most-referenced DOI links in enwiki are mEDRA DOIs; those in jawiki are JaLC DOIs; those in zhwiki are ISTIC DOIs. Note that JaLC DOIs are not referenced in zhwiki, and ISTIC DOIs are not referenced in jawiki. In other words, the scholarly content in Japan tends to be referenced in jawiki, the content in China tends to be referenced in zhwiki.

From this table, we can compute the percent of all DOI links on Wikipedia that are for Crossref-registered DOIs, excluding the links counted as errors:

  • English Wikipedia: 99.87944% = 1463052 / (1474230 - 9412)
  • Japanese Wikipedia: 97.98068% = 27900 / (28799 - 324)
  • Chinese Wikipedia: 99.76026% = 36202 / (36669 - 380)

Therefore, it's clear that DOIs that are actually referenced were overwhelmingly registered with Crossref. There is a small language effect: links in the Japanese Wikipedia use registrars other than Crossref about 2% of the time.
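The percentages above can be reproduced with a short Python sketch. The counts come from Table 2; the dictionary layout and the function name `crossref_percent` are illustrative choices rather than anything from the study.

```python
# Link counts from Table 2 of "DOI Links on Wikipedia" (March 2015).
table2 = {
    "enwiki": {"CrossRef": 1_463_052, "Error": 9_412, "Total": 1_474_230},
    "jawiki": {"CrossRef": 27_900, "Error": 324, "Total": 28_799},
    "zhwiki": {"CrossRef": 36_202, "Error": 380, "Total": 36_669},
}


def crossref_percent(row):
    """Percent of valid (non-error) DOI links that are CrossRef DOIs."""
    valid_links = row["Total"] - row["Error"]
    return 100 * row["CrossRef"] / valid_links


percents = {wiki: crossref_percent(row) for wiki, row in table2.items()}
for wiki, pct in percents.items():
    print(f"{wiki}: {pct:.5f}%")
```

Subtracting the Error row from the total means the denominator only counts links that resolved to a known RA.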

@dhimmel
Collaborator Author

dhimmel commented Jun 16, 2017

From the 2014 article CrossRef developments and initiatives:

Not all of these DOIs will have been issued through CrossRef and other agencies like DataCite, mEDRA, Movie Labs and CNKI can assign DOIs to content as well. Geoffrey Bilder’s recent blog-post ‘DOIs unambiguously and persistently identify published, trustworthy, citable online scholarly literature. Right?’ is an interesting expansion on this point and is useful to bear in mind that not all DOIs are CrossRef DOIs.

In @gbilder's referenced blog post, there's a nice visualization defining scope overlap between the RAs.

The RAs in the top right corner are the relevant ones for scholarly content.

@dhimmel
Collaborator Author

dhimmel commented Jun 16, 2017

From the FAQ page at doi.org on June 16, 2017:

4. How many DOI names are there, and who uses them?
Approximately 133 million DOIs have been assigned through a federation of Registration Agencies world-wide with an annual growth rate of 16%. See the factsheet DOI Key Facts and the DOI Handbook, 2 Numbering.

At this time, the Crossref API reports 89 million DOIs. Specifically, https://api.crossref.org/works?rows=1 returns "total-results":89216081. Although these numbers are likely not from the exact same point in time, we can estimate that 66.9% = 89 / 133 of DOIs are registered with Crossref.
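The estimate above can be sketched in Python as follows. The helper names are hypothetical, and the code assumes the API response nests the count under `message` → `total-results`, which is my understanding of the Crossref REST API rather than something shown in this thread. `fetch_crossref_total` is defined but not called, since a live query would return a newer number than the 2017 value.

```python
import json
from urllib.request import urlopen

# URL taken from the comment above.
CROSSREF_URL = "https://api.crossref.org/works?rows=1"


def total_results(payload):
    """Pull the work count out of a decoded Crossref /works response.

    Assumes the count sits under payload["message"]["total-results"].
    """
    return payload["message"]["total-results"]


def fetch_crossref_total(url=CROSSREF_URL):
    """Query the live API; the result will differ from the 2017 figure."""
    with urlopen(url, timeout=30) as response:
        return total_results(json.load(response))


def crossref_share(crossref_dois, all_dois):
    """Fraction of all DOIs that are registered with Crossref."""
    return crossref_dois / all_dois


# Figures quoted above: the API count and the approximate doi.org total.
share_2017 = crossref_share(89_216_081, 133_000_000)
print(f"{share_2017:.1%}")
```

Using the unrounded API count gives about 67.1%, slightly above the 66.9% obtained from the rounded 89 / 133. Either way the 133 million figure is itself approximate, so the extra digits are not meaningful.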

@dhimmel dhimmel closed this as completed Feb 6, 2018
@dwhly

dwhly commented Aug 20, 2018

I found the above statistic handy.

I just polled the Crossref API again today, and got "total-results":99085806 DOIs, an increase of 9869725 over the earlier count. Approximately 13 months have passed since 89 million was returned, so solving 12/13 = x/9869725 we can surmise that Crossref is issuing approx. 9.1M DOIs per year (over the last year; a number which might be increasing?).
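The annualized rate can be checked with a small Python sketch. The function name `annualized` is illustrative. The second calculation assumes both polls happened on the dates the respective comments were posted (June 16, 2017 and August 20, 2018), which the thread does not confirm.

```python
from datetime import date

first_count, second_count = 89_216_081, 99_085_806
new_dois = second_count - first_count  # DOIs added between the two polls


def annualized(new_items, months):
    """Scale a count gathered over `months` to a 12-month rate."""
    return new_items * 12 / months


# Using the roughly 13 months stated in the comment.
rate_13_months = annualized(new_dois, 13)

# Assumption: both polls happened on the dates the comments were posted.
elapsed_days = (date(2018, 8, 20) - date(2017, 6, 16)).days
rate_by_dates = new_dois / elapsed_days * 365.25

print(f"{rate_13_months:,.0f} DOIs/year (13-month estimate)")
print(f"{rate_by_dates:,.0f} DOIs/year over {elapsed_days} days")
```

With the 13-month figure this reproduces the approx. 9.1M per year. If the polls were made exactly on the comment dates, the interval is 430 days (closer to 14 months) and the estimate drops to roughly 8.4M per year.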
