NRES & bionetwork backfill #1301
Lucia replied that we should use the tracker names. Here is a list of the tracker's atlases as of today:
A patch to the hca_bionetworks module is needed to add those names before we proceed with updating projects with atlas names and bionetworks, bumping the project schema version, and adding data_use_restriction.
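A minimal sketch of that patch follows. It assumes (hypothetically) that the hca_bionetworks module lists the permitted atlas names as a JSON Schema `enum` under `properties.hca_tissue_atlas`; the real schema layout may differ. Only tracker names not already present are appended, so running it twice changes nothing.

```python
def add_atlas_names(schema: dict, tracker_atlases: list) -> list:
    """Append tracker atlas names missing from the schema enum; return the added names.

    Hypothetical layout: names are assumed to live in
    schema["properties"]["hca_tissue_atlas"]["enum"].
    """
    enum = schema["properties"]["hca_tissue_atlas"]["enum"]
    # dict.fromkeys de-duplicates the tracker list while keeping its order
    added = [name for name in dict.fromkeys(tracker_atlases) if name not in enum]
    enum.extend(added)  # mutates the schema in place
    return added
```

For example, with an enum of `["Lung"]`, passing the tracker names `["Lung", "MSK", "ORCF"]` returns `["MSK", "ORCF"]` and leaves the enum as `["Lung", "MSK", "ORCF"]`.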
Summarised the number of projects to update here (spreadsheet, folder). If we want to update all data_use_restriction, bionetwork and atlas info: 431/477 projects.
We decided to update only the projects that needed a bionetwork & atlas update, but to also add the data_use_restriction field to those projects. Out of the 143, 3 projects had no submission (probably wrangled by lattice). List of ingest updates
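The selection rule can be sketched as a comparison of the bionetwork/atlas pairs expected from the tracker against those currently in ingest, with projects that have no ingest submission set aside. The input dictionaries (`expected`, `current`, `has_submission`) are hypothetical stand-ins for the spreadsheet columns, not an existing API.

```python
def select_projects(expected: dict, current: dict, has_submission: dict):
    """Split project UUIDs needing a bionetwork/atlas update by ingest availability.

    expected, current: {uuid: set of (bionetwork, atlas) pairs}  (hypothetical shapes)
    has_submission:    {uuid: bool}, True if the project has an ingest submission
    Returns (to_update_in_ingest, no_submission), both sorted.
    """
    # A project missing from `current` is treated as having no pairs yet
    needs_update = {u for u, pairs in expected.items() if current.get(u, set()) != pairs}
    to_update = sorted(u for u in needs_update if has_submission.get(u, False))
    no_submission = sorted(u for u in needs_update if not has_submission.get(u, False))
    return to_update, no_submission
```

Each project in `to_update` would then also receive `data_use_restriction`, following the decision above.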
It might be good to notify Dave and the indexing team that, for projects with more than one atlas from the same bionetwork, we had to duplicate the bionetwork for modelling reasons. However, we would not want the same bionetwork to be shown twice in the browser. Also, before we publish, we might need to ask the execs to verify this publication, since some bionetworks might not be ready to make (part of) their list public (see tracker confidentiality).
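One way to avoid showing a bionetwork twice is an order-preserving de-duplication of bionetwork names before display. This sketch assumes each `hca_bionetworks` entry is a dict with a `name` key; the atlas field names in the example (`hca_tissue_atlas`, `hca_tissue_atlas_version`) are assumptions about the module, and how Azul actually handles this is for the indexing team to decide.

```python
def display_bionetworks(hca_bionetworks: list) -> list:
    """Unique bionetwork names, in first-seen order, for showing in the browser.

    Each entry is assumed to be a dict with at least a "name" key; a project with
    two atlases from the same bionetwork has two entries sharing that name.
    """
    return list(dict.fromkeys(entry["name"] for entry in hca_bionetworks))


# Hypothetical project with two atlases from the same bionetwork
entries = [
    {"name": "Lung", "hca_tissue_atlas": "Atlas A", "hca_tissue_atlas_version": "v1.0"},
    {"name": "Lung", "hca_tissue_atlas": "Atlas B", "hca_tissue_atlas_version": "v1.0"},
]
print(display_bionetworks(entries))  # ['Lung']
```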
I'm only exporting projects from the Lung v2.0 list while we wait for confirmation from Ellen that we can publish this information. Exported projects
We exported the following uuids for R44: exported & in manifest for R44
There were 3 UUIDs that were eligible but were missed in the process:
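Misses like these can be caught with a set difference between the eligible UUIDs and those that made it into both the export and the R44 manifest. A minimal sketch, assuming the three input sets are loaded from the spreadsheet tabs:

```python
def missed_uuids(eligible: set, exported: set, manifest: set) -> list:
    """Eligible UUIDs missing from the export, the release manifest, or both."""
    # Only UUIDs present in both the export and the manifest count as done
    return sorted(eligible - (exported & manifest))
```

A UUID that was exported but never reached the manifest (or vice versa) is reported as well.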
Update done in ingest for projects.
Projects exported and import form filled in.
Verified all bionetwork updates (in a separate tab of the spreadsheet) using the Azul API:

```python
import requests

AZUL_PROJECTS = "https://service.azul.data.humancellatlas.org/index/projects/"

for uuid in uuids:  # uuids: the exported project UUIDs
    resp = requests.get(AZUL_PROJECTS + uuid, params={"catalog": "dcp44"})
    resp.raise_for_status()
    proj = resp.json()["projects"][0]
    # "atlas version" pairs, or an empty string when no tissue atlas is set
    tissue_atlas = ", ".join(f"{t['atlas']} {t['version']}" for t in (proj["tissueAtlas"] or []))
    # Drop null entries so an unspecified bionetwork prints as an empty string
    bionetworks = ", ".join(b for b in (proj["bionetworkName"] or []) if b)
    data_use = proj["dataUseRestriction"] or ""
    print("\t".join([uuid, tissue_atlas, bionetworks, data_use]))
```
Need to re-export #1307 to include bionetwork & atlas info.
There are two bulk metadata updates at the project level that we'd like to do.
Reasoning
1. After the introduction of managed access datasets in the portal, we would like to add the `data_use_restriction` field to the metadata of all open access projects, i.e. all projects in the portal that were not covered by the previous bulk update in Data Portal tracker - Data Repository tasks #1270. This would require bumping the project schema version to 19.0.0 and adding the field `"data_use_restriction": "NRES"` to the project metadata.
2. Dave asked us to add the bionetwork information to the schema, since the portal has started showing the biological network on the front page by default. There are a couple of open questions here:
a. What is the definitive list of bionetworks? Is it the tracker?
b. What is the definitive list of atlas names? In the tracker, some atlas names are initials (e.g. `MSK 1.0` or `ORCF 1.0`). Do we want to add these names?
c. For projects in the portal with no bionetwork, would we like to show `None` instead of `unspecified`?

Plan
Since both fields exist at the project level, we would like to do the update with @idazucchi's script, which exports only project metadata (we don't have to update the state to `graph valid`, just return it to `exported`). The steps would be:
). The steps would be:1,2,3 tasks can be done via the Task tracker spreadsheet
4 script is almost ready for previous bulk update in #1270 (see comments for script) a few modifications might be needed
5 if we provide uuids to script it runs quickly
6 we can also extract project title in order to populate the import form easily
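The per-project change the script needs to make can be sketched as a pure function over the project JSON. Assumptions: the schema version is recorded in `describedBy` (URL pattern `https://schema.humancellatlas.org/type/project/<version>/project`) and in `schema_version`, the title lives at `project_core.project_title`, and `backfill_project` / `import_form_row` are hypothetical helper names, not part of @idazucchi's script.

```python
import copy

SCHEMA_VERSION = "19.0.0"
# Assumed describedBy pattern for HCA project schemas
DESCRIBED_BY = f"https://schema.humancellatlas.org/type/project/{SCHEMA_VERSION}/project"


def backfill_project(content: dict, bionetworks: list) -> dict:
    """Return an updated copy of a project's metadata; the input is left untouched."""
    updated = copy.deepcopy(content)
    updated["describedBy"] = DESCRIBED_BY
    updated["schema_version"] = SCHEMA_VERSION
    # Only fill in NRES when absent, so existing restrictions are never overwritten
    updated.setdefault("data_use_restriction", "NRES")
    if bionetworks:
        # One entry per (bionetwork, atlas) pair, so a bionetwork name may repeat
        updated["hca_bionetworks"] = bionetworks
    return updated


def import_form_row(uuid: str, content: dict) -> str:
    """Tab-separated UUID and project title, for pasting into the import form."""
    return f"{uuid}\t{content['project_core']['project_title']}"
```

Using `setdefault` rather than a plain assignment is a deliberate choice in this sketch: projects that already carry a `data_use_restriction` (for example from #1270) keep their value.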
Estimated time needed ~2 days
Risks