CxG-Tier 1 to DCP notebook #1252

Closed
arschat opened this issue Mar 22, 2024 · 7 comments
Assignees
Labels
HCA operations This issue is an operational task

Comments

@arschat
Collaborator

arschat commented Mar 22, 2024

When Bionetworks provide Tier 1 metadata for CxG we should be able to create DCP spreadsheets that have their Tier 1 fields populated. We will need that for:

  • populating projects that are not currently on ingest
  • comparing the contributor Tier 1 metadata with the metadata wrangled on ingest to decide if an update is needed

The action items are:

  1. create Jupyter notebook(s) that pull data from CxG
  2. export a DCP spreadsheet
  3. automate conditional field conversions

For future tasks we might want to add functionality such as:

  • automate this into a script
  • compare contributor Tier_1 spreadsheet with DCP spreadsheet and produce a "diff-like" output (maybe using na-ka-na/ExcelCompare)
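
As a rough illustration of the "diff-like" output, here is a minimal sketch that compares two tables row by row. It assumes both spreadsheets have already been loaded as lists of dicts sharing a key column; `diff_tables` and the sample rows are hypothetical and not part of any existing tool.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def diff_tables(contributor: List[Row], wrangled: List[Row], key: str) -> List[Tuple[str, str, str, str]]:
    """Return (key value, field, contributor value, wrangled value) per mismatch."""
    # Index the wrangled rows by the shared key column (hypothetical layout).
    wrangled_by_key = {row[key]: row for row in wrangled}
    differences = []
    for row in contributor:
        other = wrangled_by_key.get(row[key])
        if other is None:
            # The whole row is absent from the wrangled table.
            differences.append((row[key], "<row>", "present", "missing"))
            continue
        for field, value in row.items():
            if field != key and other.get(field, "") != value:
                differences.append((row[key], field, value, other.get(field, "")))
    return differences


contributor_rows = [{"donor_id": "D1", "sex": "male"}, {"donor_id": "D2", "sex": "female"}]
wrangled_rows = [{"donor_id": "D1", "sex": "female"}]
for difference in diff_tables(contributor_rows, wrangled_rows, "donor_id"):
    print(difference)
```

This only reports fields present on the contributor side; a real comparison would probably also need to flag rows and columns that exist only in the wrangled spreadsheet.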
@arschat arschat added the operations This issue is an operational task label Mar 22, 2024
@arschat arschat self-assigned this Mar 22, 2024
@arschat arschat changed the title CxG-Tier 1 to DCP script CxG-Tier 1 to DCP notebook Mar 25, 2024

arschat commented Mar 26, 2024

Notebooks here. Will create a git repo to showcase changes.


arschat commented Apr 3, 2024

Repo created at arschat/tier1_to_dcp


arschat commented Apr 23, 2024

Exact map fields

doi: project.publications[0].doi
title: project.project_core.project_title,
study_pi: project.contributors.name,
sample_id: specimen_from_organism.biomaterial_core.biomaterial_id,
donor_id: donor_organism.biomaterial_core.biomaterial_id,
protocol_url: library_preparation_protocol.protocol_core.protocols_io_doi,
library_id: cell_suspension.biomaterial_core.biomaterial_id,
library_id_repository: cell_suspension.biomaterial_core.biomaterial_name,
sample_collection_method: collection_protocol.method.text,
tissue_ontology_term_id: specimen_from_organism.organ_parts.ontology,
tissue_free_text: specimen_from_organism.organ_parts.text,
sample_preservation_method: specimen_from_organism.preservation_storage.storage_method,
suspension_type: library_preparation_protocol.nucleic_acid_source,
cell_viability_percentage: cell_suspension.cell_morphology.percent_cell_viability,
cell_number_loaded: cell_suspension.estimated_cell_count,
sample_collection_year: specimen_from_organism.collection_time,
assay_ontology_term_id: library_preparation_protocol.library_construction_method.ontology,
library_preparation_batch: sequence_file.library_prep_id,
sequenced_fragment: library_preparation_protocol.end_bias,
sequencing_platform: sequencing_protocol.instrument_manufacturer_model.ontology,
reference_genome: analysis_file.genome_assembly_version,
gene_annotation_version: analysis_protocol.gene_annotation_version,
intron_inclusion: analysis_protocol.intron_inclusion,
disease_ontology_term_id: donor_organism.diseases.ontology,
self_reported_ethnicity_ontology_term_id: donor_organism.human_specific.ethnicity.ontology,
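
To show how the exact-map fields could be applied, here is a minimal sketch that renames Tier 1 keys to DCP field paths and groups them by entity. It uses only a few entries from the list above; the helper names, the sample row, and the idea that the first path segment corresponds to a spreadsheet tab are assumptions, not the notebook's actual code.

```python
# A few entries from the exact-map list; the full mapping has more fields.
EXACT_MAP = {
    "title": "project.project_core.project_title",
    "donor_id": "donor_organism.biomaterial_core.biomaterial_id",
    "sample_id": "specimen_from_organism.biomaterial_core.biomaterial_id",
    "library_id": "cell_suspension.biomaterial_core.biomaterial_id",
}


def to_dcp_fields(tier1_row):
    """Rename Tier 1 keys to DCP field paths, dropping unmapped keys."""
    return {EXACT_MAP[name]: value for name, value in tier1_row.items() if name in EXACT_MAP}


def group_by_entity(dcp_row):
    """Group DCP fields by their first path segment (assumed to be the tab)."""
    tabs = {}
    for path, value in dcp_row.items():
        entity = path.split(".", 1)[0]
        tabs.setdefault(entity, {})[path] = value
    return tabs


# Hypothetical Tier 1 row; tissue_type has no exact mapping and is dropped.
tier1_row = {"donor_id": "D1", "sample_id": "S1", "library_id": "L1", "tissue_type": "tissue"}
print(group_by_entity(to_dcp_fields(tier1_row)))
```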

Not implemented yet

institute @ sample level
sample_collection_site

Conversion implemented

sample_collection_relative_time_point: specimen_from_organism.biomaterial_core.timecourse.value,
organism_ontology_term_id: donor_organism.biomaterial_core.ncbi_taxon_id,
sex_ontology_term_id: donor_organism.sex,
manner_of_death: donor_organism.death.hardy_scale & donor_organism.is_living,
sample_source: donor_organism.is_living & specimen_from_organism.transplant_organ,
sampled_site_condition: specimen_from_organism.diseases.text, # if healthy: PATO; if adjacent: PATO & adjacent disease_ontology_term_id; else: disease_ontology_term_id
alignment_software: analysis_protocol.alignment_software & analysis_protocol.alignment_software_version,
library_sequencing_run: library_sequencing_run, # if library_sequencing_run is an INSDC accession
cell_enrichment: enrichment_protocol.markers, # if CL ontology add CL label
development_stage_ontology_term_id: donor_organism.organism_age
Automatic assignment of protocol_ids to biomaterials & files
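
For the simplest of these conversions, a sketch might look like the following. The PATO terms for female and male are the ones CxG uses for `sex_ontology_term_id`; falling back to "unknown" for anything else, and the function names themselves, are assumptions for illustration rather than the implemented logic.

```python
# CxG sex terms mapped to DCP donor_organism.sex values.
SEX_TERMS = {"PATO:0000383": "female", "PATO:0000384": "male"}


def convert_sex(term):
    """Map a CxG sex_ontology_term_id to a DCP sex value (assumed fallback: unknown)."""
    return SEX_TERMS.get(term, "unknown")


def convert_organism(term):
    """Turn an organism_ontology_term_id such as 'NCBITaxon:9606' into '9606'."""
    prefix, _, taxon_id = term.partition(":")
    if prefix != "NCBITaxon" or not taxon_id.isdigit():
        raise ValueError(f"Unexpected organism term: {term!r}")
    return taxon_id


print(convert_sex("PATO:0000384"), convert_organism("NCBITaxon:9606"))
```

The other conversions (manner_of_death, sample_source, sampled_site_condition) combine several DCP fields and would need more rules than shown here.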

Not implemented, with no planned implementation for now

tissue_type

batch_condition
default_embedding
comments
author_batch_notes
is_primary_data
author_cell_type
cell_type_ontology_term_id


arschat commented Jun 18, 2024

Stalled until sequencing_run_id is pushed

@idazucchi idazucchi added the HCA label Sep 20, 2024

arschat commented Oct 28, 2024

We received 3 CxG datasets that have Tier 1 metadata.

https://cellxgene.cziscience.com/collections/c353707f-09a4-4f12-92a0-cb741e57e5f0
https://cellxgene.cziscience.com/collections/dc3a5256-5c39-4a21-ac0c-4ede3e7b2323
https://cellxgene.cziscience.com/collections/20eea6c8-9d64-42c9-9b6f-c11b5249e0e9

Notes:

  • in ingest we might have multiple biomaterials with the same ID. However, if we use identical IDs for the generation, ingest won't distinguish between them and will create only one of the two entities. Therefore, for identical sample_ids and library_ids we first need to add a suffix (_cs or _spec) in this notebook, and remove it later.
  • we need to pull ontology labels from OLS4
  • sequencing_platform is not mandatory in Tier 1, but we require it for the sequencing protocol. If it's not in Tier 1, we might need to add a note in the script flagging it for wrangling
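
The suffix workaround from the first note could be sketched as below. It assumes "identical" means a row whose sample_id equals its library_id, that _spec goes on the specimen ID and _cs on the cell suspension ID, and that rows are plain dicts; the function names are hypothetical.

```python
def add_suffixes(rows):
    """Suffix IDs when a row's sample_id and library_id collide."""
    suffixed = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        if row["sample_id"] == row["library_id"]:
            new_row["sample_id"] = row["sample_id"] + "_spec"
            new_row["library_id"] = row["library_id"] + "_cs"
        suffixed.append(new_row)
    return suffixed


def strip_suffix(identifier):
    """Remove a trailing _spec or _cs that was added earlier."""
    for suffix in ("_spec", "_cs"):
        if identifier.endswith(suffix):
            return identifier[: -len(suffix)]
    return identifier


rows = [{"sample_id": "A1", "library_id": "A1"}, {"sample_id": "B1", "library_id": "B1_lib"}]
print(add_suffixes(rows))
```

Note that `strip_suffix` would also strip a contributor ID that naturally ends in _cs or _spec, so a real implementation would need to track which IDs were actually modified.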


arschat commented Nov 14, 2024

New repo in ebi-ait organisation created:
https://github.com/ebi-ait/hca-tier1-to-dcp

@arschat arschat closed this as completed Nov 14, 2024

arschat commented Nov 20, 2024

Full conversion instructions are now available in ebi-ait/hca-tier1-to-dcp#3
