metadata.add_missing_references_to_paper_collection() fails on many model papers #185

Open
ramcdougal opened this issue Jun 22, 2023 · 4 comments

@ramcdougal
Contributor

e.g. https://modeldb.science/citations/267746

>>> metadata.add_references_to_existing_paper(267746)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bitnami/modeldb/extract_data/metadata.py", line 684, in add_references_to_existing_paper
    reference_metadata = get_reference_metadata(pmid)
  File "/home/bitnami/modeldb/extract_data/metadata.py", line 623, in get_reference_metadata
    reference_pmids, reference_dois = get_reference_pmids(pmid)
  File "/home/bitnami/modeldb/extract_data/metadata.py", line 570, in get_reference_pmids
    reference_list = doc.getElementsByTagName("ReferenceList")[0]
IndexError: list index out of range
>>> 

It reports success on 267749, but no references are actually added.

(It does genuinely succeed on others, such as 267707.)
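A minimal sketch of how the lookup could avoid the IndexError and also tell "no reference list at all" apart from "references present but without IDs". The function name `extract_reference_ids` is hypothetical (it is not the project's `get_reference_pmids`), and the element layout (`Reference` entries containing `ArticleId` elements with an `IdType` attribute) is an assumption about the efetch XML rather than something shown in this thread.

```python
from xml.dom import minidom


def extract_reference_ids(xml_text):
    """Return (pmids, dois, references_without_ids) for one efetch record.

    Hypothetical helper: returns empty results instead of raising when the
    record contains no <ReferenceList> element.
    """
    doc = minidom.parseString(xml_text)
    reference_lists = doc.getElementsByTagName("ReferenceList")
    if not reference_lists:
        # Nothing to index into, so report "no references" explicitly.
        return [], [], 0

    pmids, dois, without_ids = [], [], 0
    for reference in reference_lists[0].getElementsByTagName("Reference"):
        found = False
        for article_id in reference.getElementsByTagName("ArticleId"):
            # Join the text nodes so an empty element yields "".
            value = "".join(
                node.data
                for node in article_id.childNodes
                if node.nodeType == node.TEXT_NODE
            ).strip()
            if not value:
                continue
            id_type = article_id.getAttribute("IdType")
            if id_type == "pubmed":
                pmids.append(value)
                found = True
            elif id_type == "doi":
                dois.append(value)
                found = True
        if not found:
            # A citation with no usable identifier; count it so a run that
            # "succeeds" with nothing added is still visible to the caller.
            without_ids += 1
    return pmids, dois, without_ids
```

With something like this, a caller could log a warning when the first two lists are empty instead of reporting success.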

@ramcdougal
Contributor Author

I think this is an inherent limitation, as https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=34807729 does not, in fact, return any references.

Is it possible to pull references from crossref?

@ramcdougal
Contributor Author

Crossref works for some papers: https://api.crossref.org/works/10.1037/0003-066X.59.1.29 returns a version of the reference information for that paper (look for all the article-title values). But it doesn't solve our specific problem with https://api.crossref.org/works/10.1089/omi.2021.0155

@goldiezhu
Collaborator

For 267746, there are no references listed in PubMed; they were in a docx attached as supplementary material. For 267749, there were 30+ references, but none included PubMed IDs, which is why they weren't pulled.

@goldiezhu
Collaborator

For pulling from Crossref, I don't know whether we discarded the idea because we wanted to finish accurate PubMed pulls first or because we ran into issues. It seems potentially possible if you loop over the entries under 'reference' (each one has a 'key') and pull the DOI for each one?
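A rough sketch of that loop, assuming the Crossref response has the shape `{"message": {"reference": [{"key": ..., "DOI": ...}, ...]}}` and that both the `reference` list and each entry's `DOI` may be missing. The function names are hypothetical and nothing here is taken from metadata.py.

```python
import json
import urllib.parse
import urllib.request


def dois_from_crossref_message(message):
    """Split a Crossref work's references into DOIs and keys lacking a DOI."""
    dois, keys_without_doi = [], []
    # Some works carry no 'reference' list at all, so default to empty.
    for reference in message.get("reference", []):
        doi = reference.get("DOI")
        if doi:
            dois.append(doi)
        else:
            # Keep the key so unresolved citations can be reviewed by hand.
            keys_without_doi.append(reference.get("key"))
    return dois, keys_without_doi


def fetch_crossref_reference_dois(doi):
    """Fetch one work from api.crossref.org and extract its reference DOIs."""
    # quote() leaves "/" intact, so the DOI stays a single path suffix.
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    return dois_from_crossref_message(payload["message"])
```

For example, `fetch_crossref_reference_dois("10.1037/0003-066X.59.1.29")` would make one network request; a paper whose record has no 'reference' list would come back as two empty lists rather than an error.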
