Merge pull request #2105 from cBioPortal/target-rel
TARGET-GDC Initial Data Release
Showing 163 changed files with 1,717 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,152 @@ | ||
# Comments & assumptions made during curation | ||
|
||
## General | ||
|
||
- Study is updated once every 3 months with latest data from [ISB-CGC BigQuery tables](https://isb-cgc.appspot.com/bq_meta_search/) | ||
- The ISB-CGC tables allow easy access to data collected from multiple NCI-CRDC repositories including the GDC, PDC, and others. The TARGET data in this study comes from the GDC and is accessed through these tables. | ||
- Reference genome used: hg38 | ||
- TARGET started using the hg38 genome as of GDC release 32. For more information, refer to the [GDC release notes](https://docs.gdc.cancer.gov/Data/Release_Notes/Data_Release_Notes/#data-release-320). | ||
|
||
- Only tumor sample data is included (no normal samples) | ||
- [GDC project webpage](https://portal.gdc.cancer.gov/projects/TARGET-ALL-P3) | ||
|
||
## Clinical data | ||
|
||
- **Patient data:** Retrieved from `isb-cgc-bq.TARGET_versioned.clinical_gdc_r40`. ISB-CGC data was created in April 2024. | ||
- **Sample data:** Retrieved from `isb-cgc-bq.TARGET_versioned.per_sample_file_metadata_hg38_gdc_r40`. ISB-CGC data was created in April 2024. | ||
|
||
### Survival data | ||
|
||
Survival fields are calculated from the clinical data and added as new columns in the clinical file. | ||
|
||
- `OS_STATUS` is converted from `demo__vital_status` | ||
- `OS_MONTHS` is converted from `demo__days_to_death`, falling back to `diag__days_to_last_follow_up` when the former is missing. | ||
|
||
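The survival conversion above can be sketched roughly as follows. This is a minimal illustration, not the curation script itself: the function names, the `1:DECEASED` / `0:LIVING` encoding, and the days-per-month divisor are assumptions that these notes do not spell out.

```python
from typing import Optional

# Assumption: average month length (365.25 / 12); the notes do not state the divisor.
DAYS_PER_MONTH = 30.4375


def os_status(vital_status: Optional[str]) -> str:
    """Map a `demo__vital_status` value to an assumed OS_STATUS encoding."""
    status = (vital_status or "").strip().lower()
    if status == "dead":
        return "1:DECEASED"
    if status == "alive":
        return "0:LIVING"
    return ""  # unknown or unreported status is left blank


def os_months(days_to_death: Optional[float],
              days_to_last_follow_up: Optional[float]) -> str:
    """Prefer days to death; fall back to days to last follow-up."""
    days = days_to_death if days_to_death is not None else days_to_last_follow_up
    if days is None:
        return ""
    return f"{days / DAYS_PER_MONTH:.2f}"
```

For instance, a living patient with no death record and 730.5 days of follow-up would get `OS_STATUS` `0:LIVING` and `OS_MONTHS` `24.00` under these assumptions.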
|
||
|
||
|
||
### Timeline data | ||
|
||
- Timeline data is extracted from the clinical data and stored in separate data files. After extraction, the corresponding BigQuery fields are removed from the clinical file. For example, a timeline status of `DECEASED` corresponds to the BigQuery field `demo__days_to_death`. | ||
|
||
#### Patient status data | ||
|
||
- For TARGET, the "time 0" anchor point is always the date of diagnosis. Not all patients have timeline data available, as indicated by a null `diag__days_to_diagnosis` (TCGA) or `index_date` (CPTAC, TARGET) field. | ||
|
||
- Birth timeline events are removed, as they (1) push other events to the far right of the graph and (2) can potentially be used to identify the patient. | ||
|
||
The following status values are supported in `data_timeline_status.txt`: | ||
|
||
- (time 0) → `Initial Diagnosis` | ||
- `demo__days_to_death` → `DECEASED` | ||
- `diag__days_to_last_follow_up` → `Last Follow Up` | ||
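As a rough sketch of the status mapping listed above, the following builds timeline rows for one patient. The function name and the output column layout (`PATIENT_ID`, `START_DATE`, `STOP_DATE`, `EVENT_TYPE`, `STATUS`) are assumptions for illustration; only the field-to-status mapping and the null `index_date` rule come from these notes.

```python
# Mapping taken from the list above; "Initial Diagnosis" is always placed at day 0.
STATUS_FIELDS = {
    "demo__days_to_death": "DECEASED",
    "diag__days_to_last_follow_up": "Last Follow Up",
}


def status_events(patient_id, clinical_row):
    """Return timeline status rows for one patient (hypothetical layout)."""
    # Patients with a null index_date have no timeline data.
    if clinical_row.get("index_date") in (None, ""):
        return []
    events = [(0, "Initial Diagnosis")]
    for field, status in STATUS_FIELDS.items():
        days = clinical_row.get(field)
        if days not in (None, ""):
            events.append((int(float(days)), status))
    events.sort(key=lambda event: event[0])
    return [
        {
            "PATIENT_ID": patient_id,
            "START_DATE": str(day),
            "STOP_DATE": "",
            "EVENT_TYPE": "STATUS",
            "STATUS": status,
        }
        for day, status in events
    ]
```

Birth events never appear here because no birth-related field is included in the mapping.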
|
||
### Other transformations | ||
|
||
- `"not reported"` values are replaced with blanks. | ||
- If a clinical field is missing for the entire study, the column is removed from the data file. | ||
- `RACE`, `ETHNICITY`, and `SEX` are capitalized. | ||
- `AGE` is converted from days to years. | ||
|
||
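The clean-up rules above could look something like the sketch below. It is not the curation script: the function name is hypothetical, "capitalized" is interpreted as Python's `str.capitalize()`, and the age conversion assumes whole years using 365.25 days per year. It also assumes every row has the same columns.

```python
CAPITALIZED_COLUMNS = {"RACE", "ETHNICITY", "SEX"}
DAYS_PER_YEAR = 365.25  # assumption: the notes do not state the divisor or rounding


def clean_clinical(rows):
    """Apply the clean-up rules to a list of row dicts keyed by column name."""
    cleaned = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            text = "" if value is None else str(value).strip()
            if text.lower() == "not reported":
                text = ""  # "not reported" becomes a blank
            if text and column in CAPITALIZED_COLUMNS:
                text = text.capitalize()
            if text and column == "AGE":
                text = str(int(float(text) // DAYS_PER_YEAR))  # days -> whole years
            new_row[column] = text
        cleaned.append(new_row)
    # Drop columns that are blank for every row in the study.
    columns = list(cleaned[0]) if cleaned else []
    keep = [c for c in columns if any(r.get(c, "") for r in cleaned)]
    return [{c: r.get(c, "") for c in keep} for r in cleaned]
```

Dropping empty columns happens last so that a column containing only `"not reported"` values is removed as well.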
## CNA data | ||
|
||
- Retrieved from `isb-cgc-bq.TARGET_versioned.copy_number_gene_level_hg38_gdc_r36`. ISB-CGC data was created in March 2023. | ||
- Transformations | ||
- Copy number values from the BigQuery tables are converted from [ASCAT](https://www.pnas.org/doi/10.1073/pnas.1009843107) to GISTIC 2.0 using the following thresholds: | ||
|
||
| ASCAT Value | GISTIC Value | Meaning | | ||
|---|---|---| | ||
| X = 0 | -2 | Deep loss | | ||
| X = 1 | -1 | Single-copy loss | | ||
| X = 2 | 0 | Diploid | | ||
| 2 < X < 7 | 1 | Low-level gain | | ||
| 7 ≤ X | 2 | Amplification | | ||
|
||
Only amplifications (GISTIC = 2) and deep deletions (GISTIC = -2) are shown on the cBioPortal website. As a result, these conversion thresholds affect how many samples appear in the CNA chart, and the counts can be inconsistent with legacy versions of this study. We chose ASCAT ≥ 7 as the amplification threshold because it resulted in the least deviation from our legacy studies. | ||
|
||
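The threshold table above translates directly into a small lookup. The sketch below is illustrative only: the function names are hypothetical, and it assumes the ASCAT copy number arrives as a non-negative integer.

```python
def ascat_to_gistic(copy_number: int) -> int:
    """Convert an ASCAT integer copy number to a GISTIC-style discrete value."""
    if copy_number < 0:
        raise ValueError("copy number cannot be negative")
    if copy_number == 0:
        return -2  # deep loss
    if copy_number == 1:
        return -1  # single-copy loss
    if copy_number == 2:
        return 0   # diploid
    if copy_number < 7:
        return 1   # low-level gain
    return 2       # amplification (ASCAT >= 7)


def shown_on_portal(gistic_value: int) -> bool:
    """Only deep deletions and amplifications are displayed."""
    return gistic_value in (-2, 2)
```

With these thresholds, a copy number of 6 maps to a low-level gain and is not displayed, while a copy number of 7 maps to an amplification and is.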
|
||
|
||
## mRNA Expression data | ||
|
||
- Retrieved from `isb-cgc-bq.TARGET_versioned.RNAseq_hg38_gdc_r35`. ISB-CGC data was created in December 2022. | ||
- The `unstranded`, `tpm_unstranded`, and `fpkm_uq_unstranded` columns are pulled and each is mapped to its own data file. | ||
- The regular FPKM values are excluded because [FPKM-UQ provides a more stable metric](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/#upper-quartile-fpkm). | ||
- Transformations: see [Expression data transformations (CNA / mRNA)](#expression-data-transformations-cna--mrna) | ||
|
||
|
||
|
||
## Mutation data | ||
|
||
- Retrieved from `isb-cgc-bq.TARGET_versioned.masked_somatic_mutation_hg38_gdc_r40`. ISB-CGC data was created in July 2024. | ||
- The MAF is annotated with Genome Nexus in order to avoid issues with the isoform mapping. Parameters used: | ||
- Endpoint: https://grch38.genomenexus.org/ | ||
- Isoform override: mskcc | ||
- Replace gene symbols and Entrez IDs | ||
- Post interval size: 500 | ||
- Mutation data may be missing for some samples; this reflects a lack of data availability in ISB-CGC. | ||
|
||
## Expression data transformations (CNA / mRNA) | ||
|
||
- Ensembl gene IDs are mapped to Entrez IDs using the [Genome Nexus hg38 canonical transcript file](https://github.com/genome-nexus/genome-nexus-importer/blob/master/data/grch38_ensembl95/export/ensembl_biomart_canonical_transcripts_per_hgnc.txt). Any genes that cannot be converted using this file are dropped. | ||
- Prior to conversion, we also filter out a small number of duplicate Ensembl genes. These genes have copies containing data for both the X and the Y chromosomes. | ||
- If a sample has multiple aliquots, they have to be condensed to one before the sample can be imported into cBioPortal. This is done by choosing the aliquot ID with the highest sort value (e.g. highest plate number), following [the same policy](https://broadinstitute.atlassian.net/wiki/spaces/GDAC/pages/844334036/FAQ#FAQ-replicateFilteringQ%3AWhatdoyoudowhenmultiplealiquotbarcodesexistforagivensample%2Fportion%2Fanalytecombination%3F) that GDAC uses to condense aliquot data in their studies. | ||
|
||
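The two steps above (gene ID mapping and aliquot condensing) might be sketched as below. Both function names are hypothetical, the mapping is assumed to be preloaded into a dict from the canonical transcript file, and "highest sort value" is interpreted as plain lexicographic ordering of the aliquot ID, which may differ from the actual script.

```python
def map_to_entrez(values_by_ensembl, ensembl_to_entrez):
    """Re-key values by Entrez ID, dropping genes that cannot be converted."""
    mapped = {}
    for ensembl_id, value in values_by_ensembl.items():
        entrez_id = ensembl_to_entrez.get(ensembl_id)
        if entrez_id is not None:
            mapped[entrez_id] = value
    return mapped


def condense_aliquots(aliquot_to_sample):
    """Keep one aliquot per sample: the ID that sorts highest."""
    chosen = {}
    for aliquot_id, sample_id in aliquot_to_sample.items():
        # Assumption: lexicographic comparison stands in for "highest sort value".
        if sample_id not in chosen or aliquot_id > chosen[sample_id]:
            chosen[sample_id] = aliquot_id
    return chosen
```

The identifiers used to exercise this sketch should be treated as placeholders; real aliquot barcodes carry more structure than a simple string comparison accounts for.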
## Post-processing steps | ||
|
||
- Samples that lack any genomic data are removed from the clinical sample file. | ||
- Metadata headers are added to the clinical patient and sample files using a curation-provided script. | ||
- TMB scores are calculated and added to the clinical sample file using a curation-provided script. | ||
- Case lists are generated under `case_lists/` using a curation-provided script. | ||
- The validator script is run and the HTML report is saved under `validation_reports/`. | ||
|
||
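Two of the post-processing steps can be illustrated with a short sketch: filtering out samples without genomic data and writing an "all samples" case list. The function names are hypothetical and this is not the curation-provided script; the field names follow the case-list file included in this commit, and the IDs are assumed to be tab-separated.

```python
def filter_samples(clinical_sample_ids, genomic_sample_ids):
    """Keep clinical samples that appear in at least one genomic data file."""
    genomic = set(genomic_sample_ids)
    return [sample_id for sample_id in clinical_sample_ids if sample_id in genomic]


def render_all_case_list(study_id, sample_ids):
    """Render the text of an 'all samples' case-list file."""
    lines = [
        f"cancer_study_identifier: {study_id}",
        f"stable_id: {study_id}_all",
        "case_list_name: All samples",
        f"case_list_description: All samples ({len(sample_ids)} samples)",
        "case_list_category: all_cases_in_study",
        "case_list_ids: " + "\t".join(sample_ids),  # assumption: tab-delimited IDs
    ]
    return "\n".join(lines) + "\n"
```

Deriving the count in `case_list_description` from the ID list keeps the description and the IDs from drifting apart when samples are removed.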
## List of remapped columns | ||
|
||
### Clinical patient | ||
|
||
| Original | cBioPortal | | ||
|---|---| | ||
| submitter_id | PATIENT_ID | | ||
| case_id | OTHER_PATIENT_ID | | ||
| demo__ethnicity | ETHNICITY | | ||
| demo__gender | SEX | | ||
| demo__race | RACE | | ||
| demo__vital_status | VITAL_STATUS | | ||
| diag__age_at_diagnosis | AGE | | ||
| diag__classification_of_tumor | TUMOR_CLASSIFICATION | | ||
| diag__cog_neuroblastoma_risk_group | COG_NEUROBLASTOMA_RISK_GROUP | | ||
| diag__icd_10_code | ICD_10 | | ||
| diag__inss_stage | INSS_STAGE | | ||
| diag__last_known_disease_status | DISEASE_STATUS | | ||
| diag__metastasis_at_diagnosis | METASTASIS_AT_DIAGNOSIS | | ||
| diag__morphology | MORPHOLOGY | | ||
| diag__path__necrosis_percent | PATHOLOGY_NECROSIS_PERCENT | | ||
| diag__primary_diagnosis | PRIMARY_DIAGNOSIS | | ||
| diag__site_of_resection_or_biopsy | BIOPSY_SITE | | ||
| diag__year_of_diagnosis | YEAR_OF_DIAGNOSIS | | ||
| disease_type | DISEASE_TYPE | | ||
| index_date | INDEX_DATE | | ||
| primary_site | PRIMARY_SITE_PATIENT | | ||
| proj__name | PROJECT_NAME | | ||
| proj__project_id | PROJECT_ID | | ||
|
||
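Applying a remapping table like the one above amounts to renaming dictionary keys. The sketch below uses a subset of the patient mapping; the function name is hypothetical, and leaving unmapped columns unchanged is an assumption rather than documented behavior.

```python
# Subset of the clinical patient mapping from the table above.
PATIENT_COLUMN_MAP = {
    "submitter_id": "PATIENT_ID",
    "case_id": "OTHER_PATIENT_ID",
    "demo__gender": "SEX",
    "demo__race": "RACE",
    "diag__age_at_diagnosis": "AGE",
}


def remap_columns(row, column_map):
    """Rename the keys of one row; unmapped columns keep their original name."""
    return {column_map.get(column, column): value for column, value in row.items()}
```

Keeping the mapping in a single dict makes it straightforward to compare against the tables in this section when the source schema changes.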
|
||
### Clinical sample | ||
|
||
| Original | cBioPortal | | ||
|---|---| | ||
| case_barcode | PATIENT_ID | | ||
| sample_barcode | SAMPLE_ID | | ||
| sample_gdc_id | OTHER_SAMPLE_ID | | ||
| sample_type_name | SAMPLE_TYPE | | ||
| primary_site | PRIMARY_SITE | | ||
| days_to_collection | DAYS_TO_COLLECTION | | ||
| days_to_sample_procurement | DAYS_TO_SPECIMEN_COLLECTION | | ||
| is_ffpe | IS_FFPE | | ||
|
||
|
||
|
||
### Mutation | ||
|
||
| Original | cBioPortal | | ||
|---|---| | ||
| sample_barcode_tumor | Tumor_Sample_Barcode | | ||
| sample_barcode_normal | Matched_Norm_Sample_Barcode | |
@@ -0,0 +1,6 @@ | ||
cancer_study_identifier: alal_target_gdc | ||
stable_id: alal_target_gdc_all | ||
case_list_name: All samples | ||
case_list_description: All samples (233 samples) | ||
case_list_category: all_cases_in_study | ||
case_list_ids: TARGET-15-SJMPAL042796-09A TARGET-15-SJMPAL016342-09B TARGET-15-SJMPAL011914-09B TARGET-15-SJMPAL042943-09A TARGET-15-SJMPAL044948-09A TARGET-15-SJMPAL046471-09A.1 TARGET-15-PASZVW-09B TARGET-15-SJMPAL012419-09A TARGET-15-SJMPAL046471-09A TARGET-15-SJMPAL012426-09A.3 TARGET-15-SJMPAL041120-09A TARGET-15-SJMPAL043770-09B TARGET-15-SJMPAL040030-09A.1 TARGET-15-SJMPAL042789-09A.3 TARGET-15-SJMPAL043769-03B TARGET-20-SJAML040268-09A TARGET-20-SJAML045742-09A TARGET-15-SJMPAL040030-09A.3 TARGET-15-SJMPAL012418-09A.2 TARGET-15-SJMPAL042795-09A.2 TARGET-15-SJMPAL044950-09A TARGET-20-SJAML003320-09A TARGET-15-SJMPAL041122-09A.1 TARGET-15-SJMPAL016343-09A.1 TARGET-15-SJMPAL040030-09A.4 TARGET-15-SJMPAL041119-09A.1 TARGET-20-SJAML016570-09A TARGET-15-SJMPAL041119-09B TARGET-15-SJMPAL012419-09B TARGET-20-SJAML007113-09A TARGET-15-SJMPAL044953-09A TARGET-15-SJMPAL016343-09A TARGET-15-SJMPAL012422-09A TARGET-20-SJAML003322-09A TARGET-15-SJMPAL016340-09A TARGET-15-SJMPAL017981-03A TARGET-15-SJMPAL011912-09A.1 TARGET-15-SJMPAL040036-03A.2 TARGET-15-SJMPAL017978-09A TARGET-15-PARWPU-09A.1 TARGET-15-SJMPAL042799-09A TARGET-15-SJMPAL040031-09A.2 TARGET-20-SJAML045741-09A TARGET-15-SJMPAL046470-09A.1 TARGET-15-SJMPAL046467-09A TARGET-15-SJMPAL040030-09A.2 TARGET-15-SJMPAL040037-09A.2 TARGET-15-SJMPAL043512-09B TARGET-15-SJMPAL016342-09A.3 TARGET-15-SJMPAL012421-09B TARGET-20-SJAML007112-09A TARGET-15-SJMPAL040033-09A TARGET-15-SJMPAL046468-09A TARGET-15-SJMPAL040039-09A TARGET-15-SJMPAL012422-09A.1 TARGET-15-SJMPAL040038-09A TARGET-15-SJMPAL041117-09A TARGET-15-SJMPAL046466-09A TARGET-15-SJMPAL016342-09A.2 TARGET-15-SJMPAL041119-09A TARGET-15-SJMPAL011912-09A TARGET-15-SJMPAL040037-09A.1 TARGET-15-SJMPAL041122-09A.2 TARGET-15-SJMPAL046470-09A.2 TARGET-15-SJMPAL016344-09A TARGET-15-SJMPAL042801-09A TARGET-15-SJMPAL016447-09A.1 TARGET-15-SJMPAL041122-09A TARGET-15-SJMPAL043505-09B TARGET-15-SJMPAL042795-09A.3 TARGET-15-SJMPAL016343-09A.3 TARGET-20-SJAML045740-09A 
TARGET-15-PAVFTF-09B TARGET-15-SJMPAL041117-09A.1 TARGET-15-SJMPAL040459-09B TARGET-15-SJMPAL043508-09B TARGET-15-SJMPAL041120-09B TARGET-15-SJMPAL041118-09A.2 TARGET-15-SJMPAL042793-09A TARGET-15-SJMPAL042792-09A.2 TARGET-15-SJMPAL043775-09B TARGET-15-SJMPAL016447-09A TARGET-15-SJMPAL017973-09A.2 TARGET-15-SJMPAL043514-09A TARGET-15-SJMPAL044956-09A TARGET-15-SJMPAL040027-09A.1 TARGET-15-SJMPAL046469-09A TARGET-15-SJMPAL042789-09A.1 TARGET-15-SJMPAL041117-09B TARGET-15-SJMPAL011913-09A TARGET-20-SJAML045737-09A TARGET-15-SJMPAL043511-09B TARGET-15-SJMPAL017981-03A.2 TARGET-15-SJMPAL040034-09A TARGET-15-SJMPAL011911-03A.1 TARGET-15-SJMPAL041117-09A.2 TARGET-15-SJMPAL040036-03A TARGET-15-SJMPAL040031-09A.1 TARGET-15-SJMPAL043773-03B TARGET-15-SJMPAL012425-09A.2 TARGET-15-SJMPAL017978-09A.2 TARGET-15-SJMPAL012420-09A TARGET-15-SJMPAL040031-09A.4 TARGET-20-SJAML045736-09A TARGET-15-SJMPAL042794-09A.3 TARGET-15-SJMLL003311-09A.2 TARGET-15-SJMPAL046466-09A.1 TARGET-15-SJMPAL042946-09A TARGET-15-SJMPAL042794-09A TARGET-15-SJMPAL043512-09A TARGET-15-SJMPAL040027-09A.3 TARGET-15-SJMPAL040033-09A.2 TARGET-15-SJMPAL040036-03B TARGET-15-SJMPAL040038-09B TARGET-15-SJMPAL043771-09B TARGET-15-SJMPAL041120-09A.1 TARGET-15-SJMPAL017973-09A.1 TARGET-15-SJMPAL042791-09B TARGET-15-SJMPAL017976-09A.2 TARGET-15-SJMPAL040037-09A.4 TARGET-15-SJMPAL040033-09A.3 TARGET-15-PARWPU-09B TARGET-15-SJMPAL044949-09A TARGET-15-SJMPAL012422-09A.2 TARGET-15-SJMPAL012422-09A.3 TARGET-15-SJMPAL040025-09B TARGET-15-SJMPAL041122-09A.3 TARGET-15-SJMLL003311-09A.1 TARGET-15-SJMPAL042789-09A.2 TARGET-15-SJMPAL041118-09A.1 TARGET-20-SJAML045735-09A TARGET-15-SJMPAL042798-09A TARGET-15-SJMPAL016447-09A.3 TARGET-15-SJMPAL017976-09A.3 TARGET-15-SJMPAL012418-09A.3 TARGET-15-SJMPAL012417-09A TARGET-15-SJMPAL042787-09A.2 TARGET-15-SJMPAL040036-03A.3 TARGET-15-SJMPAL046471-09A.2 TARGET-15-SJMPAL042794-09A.2 TARGET-15-SJMPAL040037-09B TARGET-15-SJMPAL043774-09B TARGET-15-SJMPAL040025-09A.1 
TARGET-15-SJMPAL017981-03A.3 TARGET-20-SJAML045734-09A TARGET-15-SJMPAL042794-09B TARGET-15-SJMPAL042941-09A.2 TARGET-15-SJMPAL040031-09A TARGET-15-SJMPAL012426-09A.2 TARGET-15-SJMPAL042793-09B TARGET-15-SJMPAL040031-09A.3 TARGET-15-SJMPAL043510-09A TARGET-15-SJMPAL017981-03A.1 TARGET-15-PAREAT-09B TARGET-15-SJMPAL016342-09A.1 TARGET-15-SJMPAL040025-09A.2 TARGET-15-SJMPAL042792-09A TARGET-15-SJMPAL040039-09B TARGET-15-SJMPAL040024-09A TARGET-15-SJMPAL012426-09A.1 TARGET-15-SJMPAL040024-09A.1 TARGET-15-SJMPAL042792-09A.1 TARGET-15-PARWPU-09A TARGET-15-SJMPAL042797-09A TARGET-15-SJMPAL012424-03A.2 TARGET-15-SJMPAL017975-03A.1 TARGET-15-SJMPAL040035-03A TARGET-15-SJMPAL011914-09A TARGET-15-SJMPAL040459-09A TARGET-15-SJMPAL017977-09A TARGET-15-SJMPAL017978-09A.1 TARGET-15-SJMPAL044957-09A TARGET-15-SJMPAL017975-03B TARGET-15-SJMPAL040026-09A TARGET-15-SJMPAL046470-09A TARGET-15-SJMPAL042801-09B TARGET-15-SJMPAL042792-09A.3 TARGET-15-SJMPAL042794-09A.1 TARGET-15-SJMPAL011915-09A TARGET-15-SJMPAL040038-09A.1 TARGET-15-SJMPAL040032-09A.1 TARGET-15-SJMPAL017975-03A.2 TARGET-20-SJAML045738-09A TARGET-15-SJMPAL017973-09A TARGET-15-SJMPAL022667-09A TARGET-15-SJMPAL042791-09A TARGET-15-SJMPAL042787-09B TARGET-15-SJMPAL040033-09A.1 TARGET-15-SJMPAL041118-09A.3 TARGET-15-SJMPAL046466-09A.2 TARGET-15-SJMPAL042787-09A.1 TARGET-15-SJMPAL012425-09A.1 TARGET-15-SJMPAL012425-09A TARGET-15-SJMPAL012420-09A.1 TARGET-15-SJMPAL016342-09A TARGET-15-SJMPAL016343-09A.2 TARGET-15-SJMPAL040027-09A TARGET-15-SJMPAL040036-03A.1 TARGET-15-SJMPAL042798-09B TARGET-15-SJMPAL016448-09A TARGET-15-SJMPAL043768-09B TARGET-15-SJMPAL011911-03A TARGET-15-SJMPAL017976-09A.1 TARGET-15-SJMPAL040039-09A.2 TARGET-15-SJMPAL040039-09A.1 TARGET-15-SJMPAL042795-09A.1 TARGET-15-PAUFIB-09B TARGET-15-SJMPAL012424-03A.1 TARGET-15-SJMPAL041118-09A TARGET-15-SJMPAL042799-09B TARGET-15-SJMPAL041119-09A.2 TARGET-15-SJMPAL040025-09A TARGET-15-PARWPU-09A.2 TARGET-15-SJMPAL040030-09A TARGET-15-SJMPAL012418-09A 
TARGET-15-SJMPAL043773-40A TARGET-15-SJMPAL012419-09A.1 TARGET-15-SJMPAL040024-09A.2 TARGET-15-SJMPAL040036-03A.4 TARGET-15-SJMPAL012420-09A.2 TARGET-15-SJMPAL043767-09B TARGET-15-SJMPAL042787-09A TARGET-15-PARWPU-09A.3 TARGET-15-SJMPAL012418-09A.1 TARGET-15-SJMPAL040037-09A TARGET-15-SJMPAL042946-09B TARGET-15-SJMPAL017978-09A.3 TARGET-15-SJMPAL017975-03A TARGET-15-SJMPAL046468-09A.1 TARGET-15-SJMPAL016447-09A.2 TARGET-15-SJMPAL046472-09A TARGET-15-SJMPAL040027-09A.2 TARGET-20-SJAML045739-09A |