Add NCI Thesaurus #10

Merged 8 commits on Sep 22, 2021

jsstevenson
Contributor

Howdy! We're interested in potentially making use of this in a few of our projects. If you're receptive to PRs, I have a small handful of other sources that we draw from, in addition to NCIt (and let me know if I'm missing anything here).

@cthoyt
Member

cthoyt commented Sep 21, 2021

@jsstevenson absolutely, I would love to accept external contributions! I would also like to write a manuscript about the current state of versioning in the biomedical database and ontology world, and how bioversions could be useful for the community, so if you're thinking about this stuff too, I'd be keen to learn more and see if you'd want to help write that paper.

Member

@cthoyt cthoyt left a comment

Everything looks good! I suppose this was quite a simple one :) I added a request to extract the date of the current release as well.

Last thing - do you know if there is a specific page corresponding to each version? For example in BioGRID, there's a way to construct a URL for a given version, which is really nice. If that's possible it would be great, but not required because we all know NCIt is very difficult to figure out.

@jsstevenson
Contributor Author

Unfortunately, the NCIt FTP archives follow a folder structure that is a little hard to capture in a single f-string -- they place the current year's releases one level up from prior years (which are all housed in subdirectories for each year), e.g.:

2020/20.11e Release/
2020/20.12d Release/
21.08e Release/

I'd definitely be interested in getting in touch -- one of our group's broader projects focuses on knowledgebase integration in the cancer variant interpretation space (https://cancervariants.org/projects/integration/), so we have a vested interest in things like data provenance and reproducibility.

@cthoyt
Member

cthoyt commented Sep 22, 2021

I'm going to merge now but if you could send a link to that FTP address I would appreciate it

@cthoyt cthoyt merged commit 40357b3 into biopragmatics:main Sep 22, 2021
@jsstevenson
Contributor Author

> I'm going to merge now but if you could send a link to that FTP address I would appreciate it

👍

https://evs.nci.nih.gov/ftp1/NCI_Thesaurus/archive/
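
For reference, the archive layout described earlier in the thread (current-year releases at the top level, prior years in per-year subdirectories) could be handled with a small branch rather than a single f-string. This is only a rough sketch and not what the PR implements: the function name `archive_url` is hypothetical, and it assumes that the first two digits of a version such as `21.08e` are the two-digit release year, that folders are always named `<version> Release`, and that the space is percent-encoded in the URL.

```python
from urllib.parse import quote

# Base URL as posted in the thread.
ARCHIVE_BASE = "https://evs.nci.nih.gov/ftp1/NCI_Thesaurus/archive/"


def archive_url(version: str, current_year: int) -> str:
    """Build a guess at the archive folder URL for an NCIt version.

    Assumption: the first two digits of the version (e.g. "21" in "21.08e")
    are the two-digit release year, as the examples in the thread suggest.
    """
    release_year = 2000 + int(version[:2])
    # Assumption: the folder is always "<version> Release"; encode the space.
    folder = quote(f"{version} Release") + "/"
    if release_year == current_year:
        # Current-year releases sit directly under the archive root.
        return ARCHIVE_BASE + folder
    # Prior years are nested in a subdirectory named after the year.
    return f"{ARCHIVE_BASE}{release_year}/{folder}"
```

Passing the current year explicitly keeps the function deterministic, but a link built this way for a current-year release would stop resolving once that release is moved into its year subdirectory, so such URLs should be treated as best-effort.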
