To ingest Tier2 metadata into our ingest platform, we first need to convert the existing Tier1 metadata to the DCP schema. If a previously wrangled submission exists, we then compare the two spreadsheets and show the differences between them.
We can then decide which fields should be updated before appending the Tier2 fields to the submission, matched on `donor_id` or `sample_id`.
flowchart TD
A[CxG <a href='https://cellxgene.cziscience.com/collections'> public database</a>]
B[Tier 1 - csv]
C[Tier 1 - dcp spreadsheet]
D[Tier 1 - dcp validated]
E[previously wrangled - spreadsheet]
F[Tier 2 - xlsx/ csv]
G[Full metadata - dcp spreadsheet]
A --> |<a href='https://github.com/ebi-ait/hca-tier1-to-dcp/blob/main/cellxgene_metadata_collection.py'>collection</a>| B
B --> |<a href='https://github.com/ebi-ait/hca-tier1-to-dcp/blob/main/convert_to_dcp.py'>DCP mapping</a>| C
C --> |If not previously wrangled| D
C --> |If previously wrangled| E
E --> |<a href='https://github.com/ebi-ait/hca-tier1-to-dcp/blob/main/compare_with_dcp.py'>Compare</a>| D
D --> G
F --> D
subgraph collection
A
end
subgraph conversion
B
C
end
subgraph comparison
E
end
subgraph addition
F
G
end
We can split this process into the following steps:

1. Create a DCP-ingestible spreadsheet
   a. Pull Tier1 metadata from the CxG API as csv
   b. Convert the Tier1 csv to a DCP spreadsheet
2. Compare the two spreadsheets
   a. Compare the number of entities
   b. Compare per biomaterial_id
   c. Compare all other field values
   d. If there are discrepancies, curate manually to decide how to update the submission
3. Append Tier2 to "Tier1"
   a. Based on the Tier2 to DCP mapping, join all Tier2 metadata using `donor_id` & `sample_id`
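
The comparison and append steps above can be sketched roughly as follows. This is only an illustration using plain lists of row dicts and the standard library, not the logic in `compare_with_dcp.py`. The column names `entity_type` and `biomaterial_id`, the function names, and the rule that Tier1 values win on conflicting columns are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical column names; the real DCP spreadsheet headers will differ.
ID_FIELD = "biomaterial_id"
TYPE_FIELD = "entity_type"


def compare_spreadsheets(tier1_rows, wrangled_rows):
    """Compare two lists of row dicts: entity counts, ids, then field values."""
    report = {}

    # Number of entities per type that differ between the two sheets.
    t1_counts = Counter(r[TYPE_FIELD] for r in tier1_rows)
    wr_counts = Counter(r[TYPE_FIELD] for r in wrangled_rows)
    report["count_diffs"] = {
        t: (t1_counts[t], wr_counts[t])
        for t in sorted(set(t1_counts) | set(wr_counts))
        if t1_counts[t] != wr_counts[t]
    }

    # Biomaterial ids present in only one of the two sheets.
    t1_by_id = {r[ID_FIELD]: r for r in tier1_rows}
    wr_by_id = {r[ID_FIELD]: r for r in wrangled_rows}
    report["only_in_tier1"] = sorted(set(t1_by_id) - set(wr_by_id))
    report["only_in_wrangled"] = sorted(set(wr_by_id) - set(t1_by_id))

    # For shared ids, every shared field whose value differs.
    value_diffs = []
    for bid in sorted(set(t1_by_id) & set(wr_by_id)):
        t1, wr = t1_by_id[bid], wr_by_id[bid]
        for field in sorted(set(t1) & set(wr)):
            if t1[field] != wr[field]:
                value_diffs.append((bid, field, t1[field], wr[field]))
    report["value_diffs"] = value_diffs
    return report


def append_tier2(tier1_rows, tier2_rows, key="donor_id"):
    """Left-join Tier2 fields onto Tier1 rows using the given key column."""
    tier2_by_key = {r[key]: r for r in tier2_rows}
    merged = []
    for row in tier1_rows:
        extra = tier2_by_key.get(row.get(key), {})
        # Assumption: Tier1 values take precedence on conflicting columns.
        merged.append({**extra, **row})
    return merged
```

The `value_diffs` entries are the candidates for manual curation: each tuple gives the biomaterial id, the field, the Tier1 value, and the previously wrangled value. `append_tier2` can be called a second time with `key="sample_id"` for sample-level Tier2 fields.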
arschat changed the title from "tier1 to dcp validator" to "tier1 to dcp journey" on Nov 12, 2024