GSE174588 - PreneoplasticBreastNee10x #1340

Open · 11 tasks · arschat opened this issue Dec 17, 2024 · 4 comments
Labels: dataset (All dataset tickets should have this label, only one ticket per dataset), Release 46 (DCP Data Release 46 @ 27/1)

Comments

@arschat (Collaborator) commented Dec 17, 2024

Project short name:

PreneoplasticBreastNee10x

Primary Wrangler:

Arsenios

Secondary Wrangler:

Associated files

Published study links

Key Events

  • Convert published metadata to HCA spreadsheet
  • Manually curate dataset to meet HCA metadata standard
  • Collect any matrix and cell-type annotation files
  • Are the analysis files suitable for CellxGene? If something is missing get in touch with the authors to request it
  • Upload sheet to validate metadata
  • Transfer raw files to ingest to validate data files
  • Check linking using ingest graph validator
  • Ask the Secondary Wrangler for an end-to-end review of the project. Ask the Expertise Wrangler to review specific tabs if needed
  • Submit dataset to Production
  • Complete the Export SOP
  • Convert project data to SCEA format following the SCEA conversion SOP if appropriate
@arschat added the dataset and Release 46 labels Dec 17, 2024
@arschat (Collaborator, Author) commented Dec 17, 2024

Using ENA to download the BAM files, then converting them to FASTQ with bamtofastq.
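A minimal sketch of this download-then-convert step, assuming the ENA Portal API `filereport` endpoint lists the author-submitted BAMs in its `submitted_ftp` field and that the 10x Genomics `bamtofastq` binary is on the `PATH`. The helper names are hypothetical, the `ftp://` prefix is an assumption about how to turn ENA's scheme-less paths into URLs, and any accession passed in is a placeholder rather than this dataset's actual ENA project.

```python
import subprocess
from typing import Dict, List
from urllib.parse import urlencode

# Assumption: ENA Portal API file-report endpoint; requested fields are illustrative.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str) -> str:
    """Build a file-report query listing run accessions and submitted files."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,submitted_ftp",
        "format": "tsv",
    }
    return f"{ENA_FILEREPORT}?{urlencode(params)}"


def bam_urls(tsv_text: str) -> Dict[str, List[str]]:
    """Map each run accession to the BAM files listed in its submitted_ftp column."""
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    run_col = header.index("run_accession")
    ftp_col = header.index("submitted_ftp")
    runs: Dict[str, List[str]] = {}
    for line in lines[1:]:
        cols = line.split("\t")
        paths = cols[ftp_col].split(";") if ftp_col < len(cols) else []
        # ENA reports host/path without a scheme; ftp:// is an assumption here.
        runs[cols[run_col]] = ["ftp://" + p for p in paths if p.endswith(".bam")]
    return runs


def bamtofastq_cmd(bam_path: str, out_dir: str, threads: int = 4) -> List[str]:
    """Command line for 10x bamtofastq: options first, then BAM and output path."""
    return ["bamtofastq", f"--nthreads={threads}", bam_path, out_dir]


def run_bamtofastq(bam_path: str, out_dir: str, threads: int = 4) -> None:
    """Run the conversion, raising CalledProcessError if bamtofastq fails."""
    subprocess.run(bamtofastq_cmd(bam_path, out_dir, threads), check=True)
```

In practice one would fetch `filereport_url(...)`, download each URL from `bam_urls(...)`, and call `run_bamtofastq` per BAM; keeping the command construction separate from execution makes it easy to log or dry-run the conversions first.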

@arschat (Collaborator, Author) commented Dec 19, 2024

hca-util upload area: daf551a8-0658-4282-ab95-4f160af483a6

@arschat (Collaborator, Author) commented Dec 19, 2024

id_in_geo_matrices specimen_id
COH_Patients_CA075_BRCA BRCA_Pt4
COH_Patients_CA098_BRCA BRCA_Pt5
COH_Patients_CA1140_Ctrl NC_Pt4
COH_Patients_CA165_BRCA BRCA_Pt6
COH_Patients_CA275_Ctrl NC_Pt5
COH_Patients_CA278_Ctrl NC_Pt6
COH_Patients_CA287_Ctrl NC_Pt7
COH_Patients_CA320_BRCA BRCA_Pt7
COH_Patients_CA460_BRCA BRCA_Pt10
COH_Patients_CA639_Ctrl NC_Pt8
COH_Patients_CA719_Ctrl NC_Pt9
COH_Patients_CA818_BRCA BRCA_Pt9
COH_Patients_CA843_Ctrl NC_Pt10
COH_Patients_CB375_Ctrl NC_Pt11
COH_Patients_CB468_BRCA BRCA_Pt11
COH_Patients_CB_699_BRCA BRCA_Pt8
UCI_Patients._BRCA_Ind2_BRCA_Ind2_Epi BRCA_Pt1
UCI_Patients._BRCA_Ind2_BRCA_Ind2_Stroma BRCA_Pt1
UCI_Patients._BRCA_Ind3_BRCA_Ind3_Epi BRCA_Pt3
UCI_Patients._BRCA_Ind3_BRCA_Ind3_Stroma BRCA_Pt3
UCI_Patients._BRCA_Ind4_BRCA_Ind4_Epi BRCA_Pt2
UCI_Patients._BRCA_Ind4_BRCA_Ind4_Stroma BRCA_Pt2
UCI_Patients._Ctrl_Ind10_Ctrl_Ind10_Epi NC_Pt3
UCI_Patients._Ctrl_Ind10_Ctrl_Ind10_Stroma NC_Pt3
UCI_Patients._Ctrl_Ind1_Ctrl_Ind1_Epi NC_Pt1
UCI_Patients._Ctrl_Ind1_Ctrl_Ind1_Stroma NC_Pt1
UCI_Patients._Ctrl_Ind9_Ctrl_Ind9_Epi NC_Pt2
UCI_Patients._Ctrl_Ind9_Ctrl_Ind9_Stroma NC_Pt2
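The mapping above can also be loaded programmatically when filling in the spreadsheet. A minimal sketch, assuming the table is kept as whitespace-separated text with the header shown; the function names are hypothetical. Grouping by specimen makes visible that each UCI specimen contributes two matrices (`_Epi` and `_Stroma`), while each COH specimen contributes one.

```python
from collections import defaultdict
from typing import Dict, List


def parse_mapping(text: str) -> Dict[str, str]:
    """Parse 'id_in_geo_matrices specimen_id' rows into {matrix_id: specimen_id}."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    if rows[0] != ["id_in_geo_matrices", "specimen_id"]:
        raise ValueError(f"unexpected header: {rows[0]}")
    mapping: Dict[str, str] = {}
    for row in rows[1:]:
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {row}")
        matrix_id, specimen_id = row
        if matrix_id in mapping:
            raise ValueError(f"duplicate matrix id: {matrix_id}")
        mapping[matrix_id] = specimen_id
    return mapping


def matrices_by_specimen(mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the mapping: specimen_id -> sorted list of matrix ids."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for matrix_id, specimen_id in mapping.items():
        grouped[specimen_id].append(matrix_id)
    return {spec: sorted(ids) for spec, ids in grouped.items()}
```

Rejecting duplicate matrix IDs up front guards against a copy-paste error silently overwriting a row.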

@arschat (Collaborator, Author) commented Dec 20, 2024

Errors in some of the extracted FASTQ files:

ERROR: Error in file /data/d43692c8-e57a-450a-b1c2-05193eec877f/ctrl1_QN_ind1_wd37026_ucsf_normal_basal_luminal_MissingLibrary_1_HY7MKBCXX_S1_L001_I1_001.fastq.gz: line 73: read length too small - 0

Removed the empty reads, but:

  • brca2_QN_ind3_m1161055a_uci_brca_basal_luminal_MissingLibrary_1_HHJ5KBBXX_S1_L001_I1_001.fastq.gz, brca2_QN_ind3_m1161055a_uci_brca_basal_luminal_MissingLibrary_1_HHJ5KBBXX_S1_L002_I1_001.fastq.gz and brca2_QN_ind3_m1161055a_uci_brca_basal_luminal_MissingLibrary_1_HHV2FBBXX_S1_L004_I1_001.fastq.gz contained only empty reads, so these files were removed from the submission.
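A minimal sketch of the empty-read filtering, assuming standard four-line FASTQ records in gzip files; the comment does not say which tool was actually used, and the function names here are hypothetical. A read is dropped when its sequence line is empty (the "read length too small - 0" case), and if no reads remain the output file is deleted, mirroring the decision to leave such files out of the submission.

```python
import gzip
import os
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Group FASTQ lines into (header, sequence, separator, quality) records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate stray blank lines between records
        try:
            seq, sep, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record at {header.strip()!r}") from None
        yield (header.rstrip("\r\n"), seq.rstrip("\r\n"),
               sep.rstrip("\r\n"), qual.rstrip("\r\n"))


def filter_empty_reads(src: str, dst: str) -> Tuple[int, int]:
    """Copy gzipped FASTQ src to dst, skipping zero-length reads.

    Returns (kept, dropped). If every read was empty, dst is removed.
    """
    kept = dropped = 0
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for header, seq, sep, qual in read_fastq(fin):
            if not seq:
                dropped += 1
                continue
            fout.write(f"{header}\n{seq}\n{sep}\n{qual}\n")
            kept += 1
    if kept == 0:
        os.remove(dst)  # nothing usable left in this file
    return kept, dropped
```

Streaming record by record keeps memory flat even for multi-gigabyte lane files; the returned counts are enough to spot files like the three above, where `kept` is zero.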
