Version 1.0

quirinmanz released this 18 Oct 14:24 · 27 commits to main since this release · 99de552

At this time, this repository is only for sample metadata, not experiment metadata.
The CSV for the sample metadata can be found at openrefine/v1.0/IHEC_metadata_harmonization.v1.0.csv

News

  • The prefix harm has been renamed to harmonized for all columns where at least one cell was changed compared to the original data from EpiRR.
  • The prefix automated was added afterward for all columns that are generated completely automatically and lack manual curation. They are available in the extended version only.
  • The column originally called line has been renamed to cell_line, i.e., now harmonized_cell_line.
  • The column originally called markers has been renamed to cell_markers, i.e., now harmonized_cell_markers.
  • In all column names that originally contained disease, disease has been renamed to sample_disease to emphasize that this attribute reflects the disease for this particular sample, not the donor health condition.
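
The renames listed above can be sketched as a small helper for scripts that still use the older column names. This is only an illustration: the exact pre-v1.0 names (for example harm_line or harm_disease_ontology_curie) are assumptions inferred from the notes, and the function name migrate_column_name is hypothetical.

```python
import re

OLD_PREFIX = "harm_"         # assumed spelling of the old prefix
NEW_PREFIX = "harmonized_"

# Whole-column renames named in the release notes.
EXACT_RENAMES = {"line": "cell_line", "markers": "cell_markers"}


def migrate_column_name(old_name: str) -> str:
    """Map an assumed pre-v1.0 column name to its v1.0 form."""
    if not old_name.startswith(OLD_PREFIX):
        return old_name  # columns without the old prefix are left untouched
    rest = old_name[len(OLD_PREFIX):]
    rest = EXACT_RENAMES.get(rest, rest)
    # "disease" becomes "sample_disease" unless it already has that form.
    rest = re.sub(r"(?<!sample_)disease", "sample_disease", rest)
    return NEW_PREFIX + rest


for name in ["harm_line", "harm_markers", "harm_disease_ontology_curie", "EpiRR"]:
    print(name, "->", migrate_column_name(name))
```

Columns with the automated prefix are not covered here, because the notes do not state their earlier names.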

Diff

The overall diff between v0.11 and v1.0 can be found at diff_v0.11_v1.0.json

Extended Version:

For more information on the columns from the extended version at IHEC_metadata_harmonization.v1.0.extended.csv, please also see version 0.9.
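
One way to see which columns exist only in the extended version is to compare the header rows of the two files. The sketch below assumes both files are comma-separated with a single header row. It uses tiny in-memory stand-ins with illustrative values rather than real records, and the helper names are hypothetical.

```python
import csv
import io


def read_header(handle):
    """Return the column names from the first row of a CSV file object."""
    return next(csv.reader(handle))


def extended_only_columns(core_handle, extended_handle):
    """List columns present in the extended header but not in the core one."""
    core = set(read_header(core_handle))
    return [name for name in read_header(extended_handle) if name not in core]


# Illustrative stand-ins; in practice, open the two CSV files instead.
core_csv = io.StringIO(
    "EpiRR,project,harmonized_sample_ontology_curie\n"
    "IHECRE00000001.4,CEEHRC,CL:0000990\n"
)
extended_csv = io.StringIO(
    "EpiRR,project,harmonized_sample_ontology_curie,"
    "automated_harmonized_sample_ontology\n"
    "IHECRE00000001.4,CEEHRC,CL:0000990,CL\n"
)

extra = extended_only_columns(core_csv, extended_csv)
print(extra)
```

With real files, the same functions can be called on handles opened with open(path, newline="").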

Metadata Standard

Please keep in mind that we try to stay as close to the IHEC Metadata Standard as possible.

Column descriptions:

The table below describes the columns included in the metadata table
at IHEC_metadata_harmonization.v1.0.csv and the extended metadata table
at IHEC_metadata_harmonization.v1.0.extended.csv.

Column | Examples | Explanation
EpiRR | IHECRE00000001.4 | EpiRR identifier. The number after the dot (.) is the version.
project | CEEHRC; BLUEPRINT | The project from which the epigenome originated.
harmonized_biomaterial_type | cell line; primary cell; primary cell culture; primary tissue | One of primary cell, primary cell culture, cell line, primary tissue.
harmonized_sample_ontology_intermediate | T cell; epithelial cell derived cell line | A manually refined higher-level annotation describing the samples using ancestors in the ontology.
harmonized_sample_disease_high | Healthy/None; Cancer; Disease | A manually refined higher-level annotation describing the disease using only three categories: Healthy/None, Cancer, Disease.
harmonized_sample_disease_intermediate | Carcinoma; Leukemia | A manually refined higher-level annotation describing the disease for this particular sample using ancestors in the NCIT ontology. NCIM CURIEs were mapped to NCIT CURIEs; see version 0.9 for an explanation.
harmonized_EpiRR_status | Complete; Partial | Whether this epigenome is Complete or Partial.
harmonized_cell_type | myeloid cell; effector memory CD8-positive, alpha-beta T cell | The cell type and main sample ontology classification for entries where biomaterial_type is primary cell or primary cell culture.
harmonized_cell_line | MCF 10A | The cell line and main sample ontology classification for entries where biomaterial_type is cell line.
harmonized_tissue_type | skeletal muscle tissue; amygdala | The tissue type and main sample ontology classification for entries where biomaterial_type is primary tissue.
harmonized_sample_ontology_curie | CL:0000990; UBERON:0001876; EFO:0001200 | The CURIE identifying the sample ontology term. Different ontologies are used, depending on the biomaterial_type: 'CL' for primary cell or primary cell culture, 'EFO' for cell line, and 'UBERON' for primary tissue.
harmonized_cell_markers | CD3+ CD4+ CD45RA+ CD3- CD19- CD56- | Markers used to isolate and identify the cell type, when applicable.
automated_harmonized_sample_ontology | CL; UBERON; EFO | Extended only. Automatic extraction from harmonized_sample_ontology_curie. The ontology corresponding to the CURIE, mostly used for other automatic extractions.
automated_harmonized_sample_ontology_term | myeloid cell; MCF 10A; amygdala | Extended only. Automatic extraction from harmonized_sample_ontology_curie. The term corresponding to the CURIE, mostly used for detecting inconsistencies.
sample_ontology_term_high_order_JeffreyHyacinthe | Cell Line; Blood | Extended only. Semi-manual annotation by Jeffrey Hyacinthe. Was applied to v0.8.
sample_ontology_term_high_order_JonathanSteif | Breast; Macrophage | Extended only. Semi-manual annotation by Jonathan Steif. Was applied to the v0.9 draft.
automated_harmonized_sample_ontology_term_intermediate_order_unique | | Extended only. Automatic extraction from harmonized_sample_ontology_curie, mostly used for harmonized_sample_ontology_intermediate.
automated_harmonized_sample_ontology_term_high_order_unique | | Extended only. Automatic extraction from harmonized_sample_ontology_curie, mostly used for harmonized_sample_ontology_intermediate.
automated_harmonized_sample_ontology_term_intermediate_order | | Extended only. Automatic extraction from harmonized_sample_ontology_curie, mostly used for harmonized_sample_ontology_intermediate.
automated_harmonized_sample_ontology_term_high_order | | Extended only. Automatic extraction from harmonized_sample_ontology_curie, mostly used for harmonized_sample_ontology_intermediate.
harmonized_sample_disease | Breast Carcinoma; Acute Promyelocytic Leukemia with PML-RARA | This attribute reflects the disease for this particular sample, not the donor health condition.
harmonized_sample_disease_ontology_curie | NCIM:C0678222; NCIM:C0023487 | The CURIE identifying the NCIM disease ontology term.
automated_harmonized_sample_disease_ontology_curie_ncit | NCIT:C41132; NCIT:C4872 | Extended only. Automatic extraction from harmonized_sample_disease_ontology_curie, mostly used for other automatic extractions.
automated_harmonized_sample_disease_ontology_term_intermediate_order_unique | | Extended only. Automatic extraction from harmonized_sample_disease_ontology_curie, mostly used for harmonized_sample_disease_high and harmonized_sample_disease_intermediate.
automated_harmonized_sample_disease_ontology_term_high_order_unique | | Extended only. Automatic extraction from harmonized_sample_disease_ontology_curie, mostly used for harmonized_sample_disease_high and harmonized_sample_disease_intermediate.
automated_harmonized_sample_disease_ontology_term_intermediate_order | | Extended only. Automatic extraction from harmonized_sample_disease_ontology_curie, mostly used for harmonized_sample_disease_high and harmonized_sample_disease_intermediate.
automated_harmonized_sample_disease_ontology_term_high_order | | Extended only. Automatic extraction from harmonized_sample_disease_ontology_curie, mostly used for harmonized_sample_disease_high and harmonized_sample_disease_intermediate.
harmonized_donor_type | Single donor; Composite; Pooled samples | Composite is a reference generated from analysis objects that come from multiple individuals, e.g., H3K27ac ChIP-seq is from subject A and RNA-seq is from subject B. Pooled samples are references generated from a biological pool, for example cord blood from 134 individual cords pooled together.
harmonized_donor_id | CEMT0007; C07015 | Identifier for donors within their projects.
harmonized_donor_age | 60-65; unknown; 46 | Age of donor. Can be an interval.
harmonized_donor_age_unit | year; day; week; unknown | Age unit of donor.
harmonized_donor_life_stage | adult; child; embryonic; fetal; newborn; postnatal; unknown | Life stage of donor.
harmonized_donor_sex | female; male; mixed | Sex of donor.
harmonized_donor_health_status | Breast Carcinoma; Acute Promyelocytic Leukemia with PML-RARA | The health status of the donor that provided the sample. Does not describe the disease for this particular sample.
harmonized_donor_health_status_ontology_curie | NCIM:C0023487; NCIM:C0678222 | The CURIE identifying the NCIM donor health status ontology term.
automated_harmonized_donor_health_status_ontology_curie_ncit | | Extended only. Automatic extraction from harmonized_donor_health_status_ontology_curie, mostly used for other automatic extractions.
automated_harmonized_donor_health_status_ontology_term_intermediate_order_unique | | Extended only. Automatic extraction from harmonized_donor_health_status_ontology_curie.
automated_harmonized_donor_health_status_ontology_term_high_order_unique | | Extended only. Automatic extraction from harmonized_donor_health_status_ontology_curie.
automated_harmonized_donor_health_status_ontology_term_intermediate_order | | Extended only. Automatic extraction from harmonized_donor_health_status_ontology_curie.
automated_harmonized_donor_health_status_ontology_term_high_order | | Extended only. Automatic extraction from harmonized_donor_health_status_ontology_curie.
harmonized_donor_life_status | dead; alive | Health state of donor: dead or alive.
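
A few of the conventions in the table lend themselves to simple checks. The sketch below is not part of the repository. It assumes that EpiRR identifiers always contain a dot before the version, that ages are either a single number, a low-high interval, or unknown, and that the ontology rule for harmonized_sample_ontology_curie holds without exceptions. All function names are hypothetical.

```python
# Ontology prefix expected for each biomaterial type, as stated in the
# description of harmonized_sample_ontology_curie.
EXPECTED_ONTOLOGY = {
    "primary cell": "CL",
    "primary cell culture": "CL",
    "cell line": "EFO",
    "primary tissue": "UBERON",
}


def split_epirr(identifier):
    """Split an identifier such as 'IHECRE00000001.4' into (accession, version)."""
    accession, _, version = identifier.rpartition(".")
    return accession, int(version)


def ontology_matches(biomaterial_type, curie):
    """Check that the CURIE prefix fits the biomaterial type."""
    prefix = curie.split(":", 1)[0]
    return EXPECTED_ONTOLOGY.get(biomaterial_type) == prefix


def parse_age(value):
    """Return (low, high) for '46' or '60-65', or None for 'unknown'."""
    if value == "unknown":
        return None
    low, sep, high = value.partition("-")
    if sep:
        return float(low), float(high)
    return float(low), float(low)


print(split_epirr("IHECRE00000001.4"))
print(ontology_matches("cell line", "EFO:0001200"))
print(parse_age("60-65"))
```

Values in other formats would raise a ValueError in this sketch, which can be useful for spotting entries that need manual review.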