Support alternate versions #61

Open
wants to merge 10 commits into
base: master

Collaborator

@gaurav commented Apr 6, 2020

This PR will be used to add support for alternate versions of PubMed articles (#61). For now, it just emits URLs that include the version number (e.g. https://www.ncbi.nlm.nih.gov/pubmed/31431825.2 to indicate version 2 of PMID 31431825).
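As a rough illustration of the URL shape described above, here is a minimal Python sketch. The function name and the choice to leave the URL unsuffixed when no version is given are assumptions made for illustration; they are not taken from this PR's code.

```python
from typing import Optional

# Base URL as it appears in the example in the PR description.
PUBMED_BASE = "https://www.ncbi.nlm.nih.gov/pubmed/"


def versioned_pubmed_url(pmid: str, version: Optional[int] = None) -> str:
    """Build a PubMed URL, appending ".<version>" when a version is known.

    Hypothetical helper: leaving the suffix off when no version is given
    is an assumption, not necessarily what the PR does.
    """
    if version is None:
        return f"{PUBMED_BASE}{pmid}"
    return f"{PUBMED_BASE}{pmid}.{version}"


print(versioned_pubmed_url("31431825", 2))
```

Called with `"31431825"` and `2`, this yields the example URL from the description.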

@gaurav changed the base branch from master to fix-missing-date April 6, 2020 21:09
@gaurav marked this pull request as ready for review May 5, 2020 18:27
@gaurav changed the base branch from fix-missing-date to master May 5, 2020 18:27
@gaurav force-pushed the alternate-versions branch from 1936316 to 86e00a9 Compare May 5, 2020 18:50
@gaurav force-pushed the alternate-versions branch from f20ffb2 to 5db6c9f Compare May 19, 2020 15:44
@gaurav requested a review from balhoff May 19, 2020 19:09
Collaborator

@balhoff left a comment

Will this make it hard for someone to come to the SPARQL endpoint with a PMID, e.g. PMID:31431825 and make a query? Do they need to know to check for a version?

Collaborator Author

gaurav commented May 28, 2020

> Will this make it hard for someone to come to the SPARQL endpoint with a PMID, e.g. PMID:31431825 and make a query? Do they need to know to check for a version?

Hmm, good point. Right now, I include the original PMID with the triple: `PMID:31431825.2 dcterms:isVersionOf PMID:31431825`. In SPARQL, therefore, you can find all versions of a particular PMID by querying for `?article dcterms:isVersionOf PMID:31431825`. I think that's probably good enough? @cbizon Will this be a problem for Robokop?
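To make the query pattern above concrete, here is a small Python sketch that builds the SPARQL text for a given PMID. The `dcterms` namespace is the standard Dublin Core Terms IRI; the assumption that articles are identified by the `https://www.ncbi.nlm.nih.gov/pubmed/` URLs from the PR description, and the function name itself, are illustrative guesses rather than something confirmed by the code.

```python
# Standard Dublin Core Terms namespace.
DCTERMS = "http://purl.org/dc/terms/"
# Assumed article IRI prefix, taken from the URL example in the PR description.
PUBMED_BASE = "https://www.ncbi.nlm.nih.gov/pubmed/"


def versions_query(pmid: str) -> str:
    """Return a SPARQL query listing every article that is a version of `pmid`.

    Hypothetical helper for illustration only.
    """
    return (
        f"PREFIX dcterms: <{DCTERMS}>\n"
        "SELECT ?article WHERE {\n"
        f"  ?article dcterms:isVersionOf <{PUBMED_BASE}{pmid}> .\n"
        "}"
    )


print(versions_query("31431825"))
```

Someone arriving with only a bare PMID would still need to know to run a query like this, which is the concern raised in the review.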

The key problem here is that AFAICT there's no way to tell whether a particular article is the most recent version or not except by looking for the largest version ID, which we can't do while processing the input as a stream. I previously considered filtering out older versions of articles when generating triples, but I'm loath to lose any potential information that might be there. I think maybe including the version number in the identifiers will make it clear that downstream users need to take the different versions into account when working with this data. For instance, I'm thinking of filtering out previous versions from the final tab-delimited output rather than messing with the triples themselves. What do you think?
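A minimal Python sketch of that post-processing idea follows, assuming the tab-delimited rows carry identifiers such as `PMID:31431825.2` in one column. The function names, the column layout, and the rule that an identifier without a suffix counts as version 1 are all assumptions for illustration. Note that it needs the full set of rows before it can decide which ones to keep, which is why it cannot run during the streaming pass.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def split_version(article_id: str) -> Tuple[str, int]:
    """Split "PMID:31431825.2" into ("PMID:31431825", 2).

    Assumption: an identifier with no ".N" suffix is treated as version 1.
    """
    pmid, sep, version = article_id.partition(".")
    return pmid, (int(version) if sep else 1)


def latest_versions(article_ids: Iterable[str]) -> Dict[str, int]:
    """Map each base PMID to the largest version number seen."""
    latest: Dict[str, int] = {}
    for article_id in article_ids:
        pmid, version = split_version(article_id)
        if version > latest.get(pmid, 0):
            latest[pmid] = version
    return latest


def drop_older_versions(
    rows: Iterable[Sequence[str]], id_column: int = 0
) -> List[Sequence[str]]:
    """Keep only the rows whose identifier is the latest version of its PMID."""
    materialized = list(rows)  # two passes are needed, so buffer the rows
    latest = latest_versions(row[id_column] for row in materialized)
    kept = []
    for row in materialized:
        pmid, version = split_version(row[id_column])
        if version == latest[pmid]:
            kept.append(row)
    return kept


rows = [
    ["PMID:31431825.1", "first"],
    ["PMID:31431825.2", "second"],
    ["PMID:100", "unversioned"],
]
print(drop_older_versions(rows))
```

The triples themselves are left untouched; only the derived tabular output is filtered.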

@gaurav force-pushed the alternate-versions branch from 5db6c9f to 7a7fdc0 Compare April 1, 2021 18:34

2 participants