tier1 to dcp journey #1324

arschat · 2024-11-12T15:49:51Z

To ingest Tier2 metadata into our ingest platform, we first need to convert the existing Tier1 metadata to the DCP schema. If a previously wrangled submission exists, we then compare the two spreadsheets and show the differences between them.
We can then decide which fields should be updated before appending the Tier2 fields to the submission, joined on donor_id or sample_id.
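
As a rough illustration of the append step, here is a minimal sketch of joining Tier2 fields onto Tier1 rows by donor_id. It assumes each sheet has already been read into a list of dicts (for example with `csv.DictReader`). The function name `append_tier2` and the example columns `tissue` and `bmi` are hypothetical and not taken from the repo. Tier1 values are kept on conflict, which is only one possible policy.

```python
def append_tier2(tier1_rows, tier2_rows, key="donor_id"):
    """Left-join Tier2 fields onto Tier1 rows using `key` (hypothetical helper)."""
    # Index Tier2 rows by the join key for constant-time lookup.
    tier2_by_key = {row[key]: row for row in tier2_rows}
    merged = []
    for row in tier1_rows:
        combined = dict(row)  # copy so the Tier1 input is not mutated
        for field, value in tier2_by_key.get(row[key], {}).items():
            # Only add fields Tier1 does not already have (Tier1 wins on conflict).
            combined.setdefault(field, value)
        merged.append(combined)
    return merged


# Illustrative data only; column names are assumptions.
tier1 = [
    {"donor_id": "D1", "sample_id": "S1", "tissue": "lung"},
    {"donor_id": "D2", "sample_id": "S2", "tissue": "heart"},
]
tier2 = [{"donor_id": "D1", "bmi": "23.1"}]

for merged_row in append_tier2(tier1, tier2):
    print(merged_row)
```

Rows without a matching Tier2 record are kept unchanged, so the join never drops Tier1 entities. The same function could be called with `key="sample_id"` for sample-level Tier2 fields.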

flowchart TD
    A[CxG <a href='https://cellxgene.cziscience.com/collections'> public database</a>]
    B[Tier 1 - csv]
    C[Tier 1 - dcp spreadsheet]
    D[Tier 1 - dcp validated]
    E[previously wrangled - spreadsheet] 
    F[Tier 2 - xlsx/ csv]
    G[Full metadata - dcp spreadsheet]
    
    A --> |<a href='https://github.com/ebi-ait/hca-tier1-to-dcp/blob/main/cellxgene_metadata_collection.py'>collection</a>| B
    B --> |<a href='https://github.com/ebi-ait/hca-tier1-to-dcp/blob/main/convert_to_dcp.py'>DCP mapping</a>| C
    C --> |If not previously wrangled| D
    C --> |If previously wrangled| E
    E --> |<a href='https://github.com/ebi-ait/hca-tier1-to-dcp/blob/main/compare_with_dcp.py'>Compare</a>| D
    D --> G
    F --> D
    
    subgraph collection
    A
    end
    subgraph conversion
    B
    C
    end
    subgraph comparison
    E
    end
    subgraph addition
    F
    G
    end

We can split this process into the following steps:

1. Create a DCP-ingestible spreadsheet
   a. Pull Tier1 metadata from the CxG API as csv
   b. Convert the Tier1 csv to a DCP spreadsheet
2. Compare the two spreadsheets
   a. Compare the number of entities of each type
   b. Compare per biomaterial_id
   c. Compare all other field values
   d. If there are discrepancies, curate manually to decide how to update the submission
3. Append Tier2 to "Tier1"
   a. Based on the Tier2 to DCP mapping, join all Tier2 metadata using donor_id & sample_id
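
The comparison step (entity counts, then biomaterial_id, then the remaining field values) could be sketched roughly as follows. This is not the logic of `compare_with_dcp.py`. It assumes each spreadsheet is represented as a dict mapping a tab name to a list of row dicts, and that every row carries a `biomaterial_id` key. The function name, the tab name `donor`, and the field `sex` are hypothetical.

```python
def compare_spreadsheets(tier1, wrangled, id_field="biomaterial_id"):
    """Compare two spreadsheets given as {tab_name: [row_dict, ...]} (hypothetical layout)."""
    report = {"entity_counts": {}, "missing_ids": {}, "field_diffs": []}
    for tab in sorted(set(tier1) | set(wrangled)):
        t1_rows = tier1.get(tab, [])
        wr_rows = wrangled.get(tab, [])

        # Compare the number of entities in this tab.
        if len(t1_rows) != len(wr_rows):
            report["entity_counts"][tab] = (len(t1_rows), len(wr_rows))

        # Compare which biomaterial ids exist on each side.
        t1_by_id = {row[id_field]: row for row in t1_rows}
        wr_by_id = {row[id_field]: row for row in wr_rows}
        only_t1 = sorted(set(t1_by_id) - set(wr_by_id))
        only_wr = sorted(set(wr_by_id) - set(t1_by_id))
        if only_t1 or only_wr:
            report["missing_ids"][tab] = {
                "only_in_tier1": only_t1,
                "only_in_wrangled": only_wr,
            }

        # Compare every field value for ids present on both sides.
        for bid in sorted(set(t1_by_id) & set(wr_by_id)):
            a, b = t1_by_id[bid], wr_by_id[bid]
            for field in sorted(set(a) | set(b)):
                if a.get(field) != b.get(field):
                    report["field_diffs"].append(
                        (tab, bid, field, a.get(field), b.get(field))
                    )
    return report


# Illustrative data only; tab and field names are assumptions.
tier1_sheet = {"donor": [{"biomaterial_id": "D1", "sex": "female"},
                         {"biomaterial_id": "D2", "sex": "male"}]}
wrangled_sheet = {"donor": [{"biomaterial_id": "D1", "sex": "unknown"}]}
print(compare_spreadsheets(tier1_sheet, wrangled_sheet))
```

The returned report only lists discrepancies, so an empty report means the two spreadsheets agree and no manual curation is needed.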

arschat · 2024-11-13T14:31:29Z

Repo for step 1 created:

ebi-ait/hca-tier1-to-dcp

Previous experiment here.

arschat changed the title ~~tier1 to dcp validator~~ tier1 to dcp journey Nov 12, 2024

arschat added the HCA label Nov 12, 2024
