MakeDataCount only retrieves 25 citations #6138
Comments
@qqmyers good catch. I'm super curious to see what the top 10 cited datasets are per installation of Dataverse. Simply getting the first 1000 sounds like a good start. Having talked about the UI with @mheppler and @kcondon, I can tell you that if there truly are 1000 citations, we're going to have to rework the popup, which doesn't scroll. I guess having that many citations to a dataset would be a good problem to have. 😄
Here's a screenshot of how the popup looks from #5253 (comment) (we may have adjusted the text a bit since then):

@qqmyers something else I wanted to mention is that if you are taking any notes on setting up Make Data Count, you are very welcome to post them in this issue:
@qqmyers I've been meaning to tell you something that I mentioned to @djbrooke the other day. Back when we were working on Make Data Count support, I used this curl command to retrieve citations:

curl 'https://api.datacite.org/events?doi=10.7910/dvn/hqzoob&source=crossref'

Back then it found two citations to https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/HQZOOB which were:

I know this because those papers are in https://github.com/IQSS/dataverse/blob/v4.16/src/test/java/edu/harvard/iq/dataverse/makedatacount/citations-for-doi-10.7910-DVN-HQZOOB.json

Now when I run that curl command I get zero citations to that dataset (!). Do you think this is because of the bug you're reporting here? I'm concerned that Dataverse won't correctly retrieve all the citations.

When I met with @erikbuunk and @TaniaSchlatter the other day about dataset citations across all Dataverse installations (see also #6139 (comment)), I was going to show them the "live" curl output of the dataset above, but because there were no citations anymore (!) I showed them that citations-for-doi-10.7910-DVN-HQZOOB.json instead. If you have any insight into this, please let me know!
@pdurbin - Interesting! I don't think it's related.
seems to find one (goes to the test server). FWIW: I haven't looked for datasets where I know there is a paper citation - just the is-part-of relationships that QDR sends for dataset-file relations. When I run Removing the &source=crossref doesn't seem to change the result (&source-id=crossref as their docs suggest gives me an empty result).
@qqmyers ok, I appreciate you digging on this because I'm working on all kinds of other stuff right now. 😄
The default response to https://api.datacite.org/events?doi=&source=crossref, as called in MakeDataCountApi.updateCitationsForDataset, returns only 25 results because the API paginates.
(Try curl https://api.datacite.org/events?doi=10.33564/FK2LJNVAG&source=crossref to see this.)
It looks like you can add &page[size]=1000 (1000 is the max), which may be sufficient for most datasets, but to get the most they'll send (10K entries) you have to handle pagination. Info at https://support.datacite.org/docs/pagination.
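For illustration, here is a rough Python sketch of what handling that pagination could look like. It is not the Dataverse code (that lives in Java in MakeDataCountApi.updateCitationsForDataset), and the names first_page_url, fetch_json and fetch_all_events are made up for this example. It assumes each response carries its events in a "data" array and a "links.next" URL while more pages remain, which should be checked against the pagination docs linked above.

```python
import json
import urllib.parse
import urllib.request

EVENTS_URL = "https://api.datacite.org/events"


def fetch_json(url):
    """Fetch a URL and parse the body as JSON (makes a real network call)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def first_page_url(doi, page_size=1000):
    """Build the first-page URL; 1000 is the maximum page size per the issue."""
    query = urllib.parse.urlencode(
        {"doi": doi, "source": "crossref", "page[size]": page_size}
    )
    return f"{EVENTS_URL}?{query}"


def fetch_all_events(doi, fetch=fetch_json, page_size=1000, max_pages=10):
    """Collect events across pages instead of stopping at the default 25.

    Assumption: the response has a "data" list and a "links.next" URL
    whenever another page exists. max_pages=10 with page_size=1000
    matches the 10K-entry ceiling mentioned in the issue.
    """
    events = []
    url = first_page_url(doi, page_size)
    for _ in range(max_pages):
        body = fetch(url)
        events.extend(body.get("data", []))
        url = (body.get("links") or {}).get("next")
        if not url:
            break  # no further pages
    return events
```

Something like fetch_all_events("10.33564/FK2LJNVAG") would then walk every page for the example DOI above. The fetch parameter is only there so the loop can be exercised with canned responses instead of hitting the live API.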