Write script to download and parse datasets #5
Comments
I implemented a master script and an accompanying config file to download and parse network files (see commit 70284f4). So far it has only been tested on the TissueNet v2 collections. I plan to add gene sets soon.
Excellent work! I noticed that the config file contains your directory name. We need to make it generic somehow.
Oh my bad, that was just for testing. It will be the generic
In response to your questions:
Added the ability to download and parse gene sets in this commit: dc0b518
Currently they are treated as gene sets. Adds to issue #5
Can we close this issue now? For every new dataset we want to add, we can open another issue.
Sounds good.
Instead of a manual, piecemeal approach to downloading datasets, implement a script that will download every dataset, and parse it into a format that the rest of the code can use. Some of the files we download may not change over the course of a few months (e.g., tissue-specific networks) while others may change rapidly (e.g., COVID-19-specific gene sets).
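One way to handle the different update rates is to give each file a maximum age and re-download only when the local copy is missing or older than that. The sketch below is only an illustration of that idea, not the script from the commits above; the function names `needs_refresh` and `fetch_if_stale` and the `max_age_days` parameter are hypothetical.

```python
import time
import urllib.request
from pathlib import Path
from typing import Optional

SECONDS_PER_DAY = 86400.0


def needs_refresh(path: Path, max_age_days: float, now: Optional[float] = None) -> bool:
    """Return True if the file is missing or older than max_age_days."""
    if not path.exists():
        return True
    now = time.time() if now is None else now
    age_days = (now - path.stat().st_mtime) / SECONDS_PER_DAY
    return age_days > max_age_days


def fetch_if_stale(url: str, dest: Path, max_age_days: float) -> bool:
    """Download url to dest only when the local copy is stale.

    Returns True if a download happened, False if the local copy was kept.
    """
    if not needs_refresh(dest, max_age_days):
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(dest))
    return True
```

With this shape, a slowly changing network file could use a large `max_age_days` (say 90), while a rapidly changing gene set could use 1, so a scheduled run only fetches what is likely to have changed.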
Initially, we will have to update the script every time we find a new source of interesting and useful information. In the longer run, the sources will stabilise and the script will mature.
A config (YAML) file can record and document the different sources, their URLs, data types, and the directories where we will store the files.
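To make the config idea concrete, here is a rough sketch of what such a file might contain and how the script could validate it. The key names (`sources`, `name`, `url`, `data_type`, `directory`), the example.org URLs, and the `Source` / `load_sources` names are all assumptions for illustration; the actual config in the repository may be laid out differently. The YAML is shown as a comment, and `CONFIG` is the dictionary a YAML loader would be expected to return for it.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, List

# Hypothetical YAML layout (placeholder URLs, not real data sources):
#
# sources:
#   - name: tissue_networks
#     url: https://example.org/networks.zip
#     data_type: network
#     directory: data/networks
#   - name: covid_gene_sets
#     url: https://example.org/genesets.gmt
#     data_type: gene_set
#     directory: data/gene_sets

# The dictionary a YAML loader would produce for the file above.
CONFIG: Dict[str, Any] = {
    "sources": [
        {"name": "tissue_networks", "url": "https://example.org/networks.zip",
         "data_type": "network", "directory": "data/networks"},
        {"name": "covid_gene_sets", "url": "https://example.org/genesets.gmt",
         "data_type": "gene_set", "directory": "data/gene_sets"},
    ]
}

REQUIRED_KEYS = ("name", "url", "data_type", "directory")


@dataclass(frozen=True)
class Source:
    """One documented data source from the config."""
    name: str
    url: str
    data_type: str
    directory: Path


def load_sources(config: Dict[str, Any]) -> List[Source]:
    """Validate each config entry and convert it to a Source."""
    sources = []
    for entry in config.get("sources", []):
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"source entry {entry!r} is missing keys: {missing}")
        sources.append(Source(
            name=entry["name"],
            url=entry["url"],
            data_type=entry["data_type"],
            directory=Path(entry["directory"]),
        ))
    return sources
```

Keeping directories relative in the config, and resolving them against a base path at run time, would also avoid committing anyone's personal directory name.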