Add support for extracting the full sample/portion/analyte hierarchy from GDC #6

Open
gaurav opened this issue May 31, 2021 · 0 comments
Labels
good first issue Good for newcomers use case A use case that needs to be added to the Example Data Workflow

gaurav commented May 31, 2021

We currently use a single GDC query that returns information on the case only, along with lists of identifiers for its samples, portions, analytes, aliquots and slides. To retrieve the data associated with those identifiers, we need to make additional queries. We should include this information in the downloaded data and demonstrate how to convert it (or, more usefully, just build a transformation library that can retrieve all of this GDC data and export it as CRDC-H instance data).

As an example, GDC case TCGA-HNSC / TCGA-CV-7261 reports the following items in the data we have obtained from their service:

  • 'aliquot_ids': ['8f695cd3-01dd-4601-8b17-37cf40514422', 'f0e325f8-297c-41e3-913d-a70e35ab5096', 'b6063ecd-bd1b-4cff-bf17-33357c17573b', 'e7d6f3dd-9de4-47e8-86f8-6f2a1b8b716e', '10c3c9e6-3bb2-4082-913a-57e80018cb45', '9fa7bc79-d05b-41da-8bcc-8d5ad4451b0c', '1f730300-dafe-4b06-9da3-9b9d2855cbac', '81fa5865-d03a-401c-a9b7-19a38a36ec33', 'c9731404-75c5-4b04-82b3-aec9cbd0290e', 'af270fca-56e4-493a-9735-e51964cac713', 'b34ad9ae-e439-49c1-9512-daedfc15ed13', '9760e44d-1227-40b0-8469-dea55bd02b5d', '70d41532-dd46-4e3f-9417-8a7306ef4117', '2376520b-1f75-4e34-b3fb-92fa131d938a', '01f28aef-6802-4d54-a644-448887298280', '78ca1f34-d401-4056-a821-ce8cf947c669', '43406d9f-8734-4b0b-8b2d-6b617575607a', 'cc61d260-3898-4328-984a-7ddee700d6a8']
  • 'analyte_ids': ['a72f2de7-eb40-4818-a104-edb508d5517b', 'e8120e5b-79a0-46ca-b603-88e2d6745657', 'cc4e73d3-e4f0-42a0-97cb-ef99336bdad8', '66dc8914-c32e-46d6-9769-278e90dcc062', '8c80c204-c894-4939-af90-07988a86bd02', 'd10874e0-fae9-43a6-8b47-2366cd929960', '147eac8b-ec1f-4dd3-a760-22e02c4b7098', '34f70218-2f5d-49e6-b73b-043e465a4c6b']
  • 'portion_ids': ['177fa10b-0135-468d-b5a3-6f30cc3cd390', 'f51d76a7-77af-4513-b7eb-6fbd05aeeff9', '1a6628a1-09e5-4fca-9917-f577d7ca08fe', '806efd93-d80d-4a4b-83f0-ee6362022052']
  • ...
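
One possible way to pull the nested records rather than just the ID lists is sketched below. It is a rough sketch, not a tested design: it assumes the GDC `cases` endpoint will return samples, portions, analytes, aliquots and slides nested under the case when they are named in the `expand` parameter, and that each entity carries a `<kind>_id` field. The helper names (`build_params`, `fetch_case`, `walk`) and the `CHILDREN` map are made up for illustration and are not part of this repository.

```python
import json
import urllib.parse
import urllib.request

GDC_CASES_URL = "https://api.gdc.cancer.gov/cases"

# Assumed nesting of the hierarchy; adjust if the GDC data model differs.
EXPAND = [
    "samples",
    "samples.portions",
    "samples.portions.slides",
    "samples.portions.analytes",
    "samples.portions.analytes.aliquots",
]

# Hypothetical map from an entity kind to the keys holding its children.
CHILDREN = {
    "case": ["samples"],
    "sample": ["portions"],
    "portion": ["analytes", "slides"],
    "analyte": ["aliquots"],
}


def build_params(submitter_id):
    """Build query parameters for one case, e.g. 'TCGA-CV-7261'."""
    filters = {
        "op": "in",
        "content": {"field": "submitter_id", "value": [submitter_id]},
    }
    return {
        "filters": json.dumps(filters),
        "expand": ",".join(EXPAND),
        "format": "json",
    }


def fetch_case(submitter_id):
    """Fetch one case with its nested entities (makes a network call)."""
    url = GDC_CASES_URL + "?" + urllib.parse.urlencode(build_params(submitter_id))
    with urllib.request.urlopen(url) as resp:
        payload = json.load(resp)
    hits = payload["data"]["hits"]
    return hits[0] if hits else None


def walk(entity, kind="case", parent_id=None):
    """Yield (kind, id, parent_id) for the entity and all its descendants."""
    entity_id = entity.get(f"{kind}_id")
    yield kind, entity_id, parent_id
    for child_key in CHILDREN.get(kind, []):
        child_kind = child_key[:-1]  # "samples" -> "sample", etc.
        for child in entity.get(child_key, []):
            yield from walk(child, child_kind, entity_id)
```

Calling `list(walk(fetch_case("TCGA-CV-7261")))` would then give a flat list of parent/child links that a transformation step could map onto CRDC-H instance data, provided the assumptions above hold.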