Releases: IHEC/epiATLAS-metadata-harmonization
Version 1.3
At this time, this repository is only for sample metadata, not experiment metadata.
For more information about experiment metadata check out the IHEC Data Portal and EpiATLAS.
You can also find this metadata on EpiRR.
There is metadata available for 2279 EpiRR entries.
The CSV for the sample metadata can be found at openrefine/v1.3/IHEC_metadata_harmonization.v1.3.csv and the extended version at openrefine/v1.3/IHEC_metadata_harmonization.v1.3.extended.csv
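A minimal Python sketch for reading a local copy of the CSV with the standard `csv` module and splitting an `EpiRR` identifier into accession and version. It assumes a comma-separated, UTF-8 file with a header row; `read_metadata`, `load_metadata`, and `split_epirr` are hypothetical helper names, not part of this repository.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def read_metadata(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV text (an iterable of lines) into one dict per EpiRR entry."""
    return list(csv.DictReader(lines))


def load_metadata(path: str) -> List[Dict[str, str]]:
    """Read a local copy of the (extended) metadata CSV; encoding is an assumption."""
    with open(path, newline="", encoding="utf-8") as handle:
        return read_metadata(handle)


def split_epirr(epirr: str) -> Tuple[str, int]:
    """Split e.g. 'IHECRE00000001.4' into ('IHECRE00000001', 4).

    The number after the last dot is the EpiRR version.
    """
    accession, dot, version = epirr.rpartition(".")
    if not dot:
        raise ValueError(f"no version suffix in {epirr!r}")
    return accession, int(version)
```

For example, `load_metadata("openrefine/v1.3/IHEC_metadata_harmonization.v1.3.csv")` returns one dictionary per row keyed by column name, and `split_epirr("IHECRE00000001.4")` returns `("IHECRE00000001", 4)`.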
News
- New column `harmonized_sample_label`: a sample label by Martin Hirst, based on sample ontology and sample disease, using common terms that might connect multiple ontologies or columns.
- Rows are now ordered by the following columns, in this order: `harmonized_sample_ontology_term_high_order_fig1`, `harmonized_sample_ontology_intermediate`, `harmonized_sample_label`, `harmonized_sample_disease_high`, `harmonized_sample_disease_intermediate`, `harmonized_donor_sex`, `automated_harmonized_donor_age_in_years`, and `EpiRR`. `harmonized_sample_ontology_term_high_order_fig1` and `harmonized_sample_ontology_intermediate` are ordered manually, the age is sorted as a double, and all other columns are sorted ignoring case.
- Some changes in `harmonized_sample_ontology_intermediate`.
- Fixed `harmonized_donor_life_stage` for 5 entries.
- Extended version: `harmonized_sample_ontology_term_high_order_fig1_color` contains a color for each value in `harmonized_sample_ontology_term_high_order_fig1`.
- Extended version: `harmonized_sample_ontology_intermediate_color` contains a color for each value in `harmonized_sample_ontology_intermediate`.
- Extended version: The columns `harmonized_donor_sex` and `harmonized_donor_life_stage` have been complemented and corrected based on the high-confidence predictions of the EpiClass tool. In addition, the extended version now contains these columns without the corrections, i.e., `${column}_uncorrected`.
- Extended version: The columns indicating whether data is available have been renamed to contain the assay name, e.g., `automated_experiments_ChIP-Seq_H3K27ac`. The WGBS and RNA-Seq columns have been split into PBAT vs. standard and mRNA-Seq vs. total-RNA-Seq, respectively.
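The row ordering described above can be approximated with a composite sort key. This is a sketch, not the code used to produce the release: the manual orders for the first two columns are not listed in these notes, so they are passed in as parameters, and placing values missing from those orders (and unparsable ages) last is an assumption. `sort_rows` and `ORDER_COLUMNS` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Sort columns, in the order given in the v1.3 release notes.
ORDER_COLUMNS = (
    "harmonized_sample_ontology_term_high_order_fig1",
    "harmonized_sample_ontology_intermediate",
    "harmonized_sample_label",
    "harmonized_sample_disease_high",
    "harmonized_sample_disease_intermediate",
    "harmonized_donor_sex",
    "automated_harmonized_donor_age_in_years",
    "EpiRR",
)


def _manual_rank(order: Sequence[str], value: str) -> int:
    # Values missing from the manual order are placed last (assumption).
    return order.index(value) if value in order else len(order)


def _age_key(value: str) -> Tuple[bool, float]:
    # Compare ages as doubles; empty or NaN ages sort last (assumption).
    try:
        age = float(value)
    except ValueError:
        age = math.nan
    if math.isnan(age):
        return (True, 0.0)
    return (False, age)


def sort_rows(
    rows: List[Dict[str, str]],
    fig1_order: Sequence[str],
    intermediate_order: Sequence[str],
) -> List[Dict[str, str]]:
    """Return rows ordered like the v1.3 release (sketch)."""

    def key(row: Dict[str, str]):
        (fig1, intermediate, label, disease_high,
         disease_intermediate, sex, age, epirr) = (row[c] for c in ORDER_COLUMNS)
        return (
            _manual_rank(fig1_order, fig1),
            _manual_rank(intermediate_order, intermediate),
            label.casefold(),              # other columns: ignore case
            disease_high.casefold(),
            disease_intermediate.casefold(),
            sex.casefold(),
            _age_key(age),
            epirr.casefold(),
        )

    return sorted(rows, key=key)
```

Sorting the age as a number rather than as text matters here: compared as strings, `"30"` would sort before `"4.5"`.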
Raw Files
In case you are interested in the raw files that the harmonization process was based on, those can be found at raw/EpiAtlas_EpiRR_metadata_all.csv.
Note that they contain different columns, as they changed during the harmonization process.
Diff
The overall diff between v1.2 and v1.3 can be found at openrefine/v1.3/diff_v1.2_v1.3.json
Metadata Standard
Please keep in mind that we try to stay as close to the IHEC Metadata Standard as possible.
Column descriptions:
The table below describes the columns included in the metadata table and the extended metadata table.
Column | Examples | Explanation | # Not Null (%) |
---|---|---|---|
EpiRR | IHECRE00000001.4 | EpiRR identifier. The number after the dot (.) is the version. | 2279 (100.0%) |
project | CEEHRC<br>BLUEPRINT | The project from which the epigenome originated. | 2279 (100.0%) |
harmonized_biomaterial_type | cell line<br>primary cell<br>primary cell culture<br>primary tissue | One of `primary cell`, `primary cell culture`, `cell line`, `primary tissue`. | 2279 (100.0%) |
harmonized_sample_label | B Lymphocyte Acute Lymphoblastic Leukemia | Sample label by Martin Hirst, based on sample ontology and sample disease, using common terms that might connect multiple ontologies or columns. | 2279 (100.0%) |
harmonized_sample_ontology_intermediate | T cell epithelial cell derived cell line | A manually refined higher-level annotation describing the samples using ancestors in the ontology. | 2279 (100.0%) |
harmonized_sample_ontology_intermediate_color | "143,81,121" | Extended only: a unique color for each unique entry in `harmonized_sample_ontology_intermediate`. | 2246 (98.6%) |
harmonized_sample_disease_high | Healthy/None<br>Cancer<br>Disease | A manually refined higher-level annotation describing the disease using only three categories: Healthy/None, Cancer, Disease. | 2279 (100.0%) |
harmonized_sample_disease_intermediate | Carcinoma<br>Leukemia | A manually refined higher-level annotation describing the disease for this particular sample using ancestors in the NCIT ontology. NCIM CURIEs were mapped to NCIT CURIEs; see version 0.9 for an explanation. | 2279 (100.0%) |
harmonized_EpiRR_status | Complete<br>Partial | Whether this epigenome is `Complete` or `Partial`. | 2279 (100.0%) |
epiATLAS_status | Complete<br>Partial<br>Complete_imputed | Equivalent to `harmonized_EpiRR_status`, but referring to the reprocessed data rather than the originally submitted data. It describes the status of the reference epigenome, additionally indicating full epigenomes obtained when using imputed data. | 2279 (100.0%) |
harmonized_cell_type | myeloid cell<br>effector memory CD8-positive, alpha-beta T cell | The cell type and main sample ontology classification for entries where `biomaterial_type` is `primary cell` or `primary cell culture`. | 1561 (68.5%) |
harmonized_cell_line | MCF 10A | The cell line and main sample ontology classification for entries where `biomaterial_type` is `cell line`. | 73 (3.2%) |
harmonized_tissue_type | skeletal muscle tissue<br>amygdala |
... |
Version 1.3
At this time, this repository is only for sample metadata, not experiment metadata.
For more information about experiment metadata check out the IHEC Data Portal and EpiATLAS.
You can also find this metadata on EpiRR.
There is metadata available for 2279 EpiRR entries.
The CSV for the sample metadata can be found at openrefine/v1.3/IHEC_metadata_harmonization.v1.3.csv and the extended version at openrefine/v1.3/IHEC_metadata_harmonization.v1.3.extended.csv
News
- Fixed `harmonized_donor_life_stage` for 5 entries.
- Extended version: `harmonized_sample_ontology_intermediate_color` contains a color for each value in `harmonized_sample_ontology_intermediate`.
- Extended version: The columns `harmonized_donor_sex` and `harmonized_donor_life_stage` have been complemented and corrected based on the high-confidence predictions of the EpiClass tool. In addition, the extended version now contains these columns without the corrections, i.e., `${column}_uncorrected`.
- Extended version: The columns indicating whether data is available have been renamed to contain the assay name, e.g., `automated_experiments_ChIP-Seq_H3K27ac`. The WGBS and RNA-Seq columns have been split into PBAT vs. standard and mRNA-Seq vs. total-RNA-Seq, respectively.
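To see which entries were changed by the EpiClass-based corrections, the extended CSV can be scanned for rows where a corrected column differs from its `${column}_uncorrected` counterpart. A minimal sketch, assuming rows are dictionaries as produced by `csv.DictReader`; `epiclass_changes` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# Columns reported as complemented/corrected with EpiClass predictions.
CORRECTED_COLUMNS = ("harmonized_donor_sex", "harmonized_donor_life_stage")


def epiclass_changes(
    rows: Iterable[Dict[str, str]],
    columns: Sequence[str] = CORRECTED_COLUMNS,
) -> List[Tuple[str, str, str, str]]:
    """List (EpiRR, column, uncorrected value, corrected value) wherever they differ."""
    changes = []
    for row in rows:
        for column in columns:
            before = row.get(f"{column}_uncorrected", "")
            after = row.get(column, "")
            if before != after:
                changes.append((row["EpiRR"], column, before, after))
    return changes
```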
Raw Files
In case you are interested in the raw files that the harmonization process was based on, those can be found at raw/EpiAtlas_EpiRR_metadata_all.csv.
Note that they contain different columns, as they changed during the harmonization process.
Diff
The overall diff between v1.2 and v1.3 can be found at openrefine/v1.3/diff_v1.2_v1.3.json
Metadata Standard
Please keep in mind that we try to stay as close to the IHEC Metadata Standard as possible.
Column descriptions:
The table below describes the columns included in the metadata table and the extended metadata table.
Column | Examples | Explanation | # Not Null (%) |
---|---|---|---|
EpiRR | IHECRE00000001.4 | EpiRR identifier. The number after the dot (.) is the version. | 2279 (100.0%) |
project | CEEHRC<br>BLUEPRINT | The project from which the epigenome originated. | 2279 (100.0%) |
harmonized_biomaterial_type | cell line<br>primary cell<br>primary cell culture<br>primary tissue | One of `primary cell`, `primary cell culture`, `cell line`, `primary tissue`. | 2279 (100.0%) |
harmonized_sample_ontology_intermediate | T cell epithelial cell derived cell line | A manually refined higher-level annotation describing the samples using ancestors in the ontology. | 2279 (100.0%) |
harmonized_sample_ontology_intermediate_color | 182,26,57 | Extended only: a unique color for each unique entry in `harmonized_sample_ontology_intermediate`. | 2279 (100.0%) |
harmonized_sample_disease_high | Healthy/None<br>Cancer<br>Disease | A manually refined higher-level annotation describing the disease using only three categories: Healthy/None, Cancer, Disease. | 2279 (100.0%) |
harmonized_sample_disease_intermediate | Carcinoma<br>Leukemia | A manually refined higher-level annotation describing the disease for this particular sample using ancestors in the NCIT ontology. NCIM CURIEs were mapped to NCIT CURIEs; see version 0.9 for an explanation. | 2279 (100.0%) |
harmonized_EpiRR_status | Complete<br>Partial | Whether this epigenome is `Complete` or `Partial`. | 2279 (100.0%) |
epiATLAS_status | Complete<br>Partial<br>Complete_imputed | Equivalent to `harmonized_EpiRR_status`, but referring to the reprocessed data rather than the originally submitted data. It describes the status of the reference epigenome, additionally indicating full epigenomes obtained when using imputed data. | 2279 (100.0%) |
harmonized_cell_type | myeloid cell<br>effector memory CD8-positive, alpha-beta T cell | The cell type and main sample ontology classification for entries where `biomaterial_type` is `primary cell` or `primary cell culture`. | 1561 (68.5%) |
harmonized_cell_line | MCF 10A | The cell line and main sample ontology classification for entries where `biomaterial_type` is `cell line`. | 73 (3.2%) |
harmonized_tissue_type | skeletal muscle tissue<br>amygdala | The tissue type and main sample ontology classification for entries where `biomaterial_type` is `primary tissue`. | 2008 (88.1%) |
harmonized_sample_ontology_curie | CL:0000990<br>UBERON:0001876<br>EFO:0001200 | The CURIE identifying the sample ontology term. Different ontologies are used depending on the `biomaterial_type`: 'CL' for `primary cell` or `primary cell culture`, 'EFO' for `cell line`, and 'UBERON' for `primary tissue`. | 2279 (100.0%) |
harmonized_cell_markers | CD3+ CD4+ CD45RA+ CD3- CD19- CD56- | Markers used to isolate and identify the cell type, when applicable. | 1144 (50.2%) |
automated_harmonized_sample_ontology | CL UBERON EFO ... |
Version 1.2
At this time, this repository is only for sample metadata, not experiment metadata.
There is metadata available for 2279 EpiRR entries.
The CSV for the sample metadata can be found at openrefine/v1.2/IHEC_metadata_harmonization.v1.2.csv and the extended version at openrefine/v1.2/IHEC_metadata_harmonization.v1.2.extended.csv
News
- Added 63 entries that had erroneously been removed in v1.1.
- The columns `harmonized_donor_sex` and `harmonized_donor_life_stage` have been complemented and corrected based on the predictions of the EpiClass tool. For more information on this, please contact Pierre-Étienne Jacques.
- Some minor changes to the `sample_disease` and `donor_health_status` columns.
- Added column `epiATLAS_status`, which is equivalent to `harmonized_EpiRR_status` but refers to the reprocessed data rather than the originally submitted data. It describes the status of the reference epigenome, additionally indicating full epigenomes obtained when using imputed data.
- Extended version: Added columns `automated_experiments_${assay}` for each assay type (histone marks, WGBS, and RNA-seq), containing the UUID for observed data, or `imputed` if only imputed data is available.
- Extended version: Added column `harmonized_sample_ontology_term_high_order_fig1`.
- Extended version: Columns `sample_ontology_term_high_order_JeffreyHyacinthe` and `sample_ontology_term_high_order_JonathanSteif` have been removed and replaced by `harmonized_sample_ontology_term_high_order_fig1`, containing the sample labels corresponding to the annotations in the overview figure.
- Extended version: Added columns `harmonized_sample_[...]_order_AnetaMikulasova` containing labels manually assigned by Aneta Mikulasova, which contain information about organ, cell, and cancer (sub-)types.
- Extended version: Removed columns `automated_harmonized_($column)_($order)(_unique)?`, e.g., `automated_harmonized_sample_ontology_term_intermediate_order_unique`, containing the automatically extracted higher-order annotations as described in v0.9. These columns were used to derive the `harmonized_sample_ontology_intermediate` and `harmonized_sample_disease_intermediate` columns, but this was based on older versions of these columns. The columns are still generated internally for checking purposes, but they could confuse users and are not necessary for the metadata.
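A sketch of interpreting the `automated_experiments_${assay}` columns of the extended CSV for one row. Treating an empty cell as "no data", and any other value that is not `imputed` as an observed dataset UUID (without validating it), are assumptions; `assay_availability` is a hypothetical helper. Since v1.3, the assay part of the column name looks like `ChIP-Seq_H3K27ac`.

```python
from typing import Dict

PREFIX = "automated_experiments_"


def assay_availability(row: Dict[str, str]) -> Dict[str, str]:
    """Map each assay to 'observed', 'imputed', or 'none' for one extended-CSV row."""
    availability = {}
    for column, value in row.items():
        if not column.startswith(PREFIX):
            continue
        assay = column[len(PREFIX):]
        value = (value or "").strip()
        if not value:
            availability[assay] = "none"        # assumption: empty cell means no data
        elif value == "imputed":
            availability[assay] = "imputed"     # only imputed data available
        else:
            availability[assay] = "observed"    # expected to be a dataset UUID
    return availability
```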
Diff
The overall diff between v1.1 and v1.2 can be found at openrefine/v1.2/diff_v1.1_v1.2.json
Metadata Standard
Please keep in mind that we try to stay as close to the IHEC Metadata Standard as possible.
Column descriptions:
The table below describes the columns included in the metadata table and the extended metadata table.
Column | Examples | Explanation | # Not Null (%) |
---|---|---|---|
EpiRR | IHECRE00000001.4 | EpiRR identifier. The number after the dot (.) is the version. | 2279 (100.0%) |
project | CEEHRC<br>BLUEPRINT | The project from which the epigenome originated. | 2279 (100.0%) |
harmonized_biomaterial_type | cell line<br>primary cell<br>primary cell culture<br>primary tissue | One of `primary cell`, `primary cell culture`, `cell line`, `primary tissue`. | 2279 (100.0%) |
harmonized_sample_ontology_intermediate | T cell epithelial cell derived cell line | A manually refined higher-level annotation describing the samples using ancestors in the ontology. | 2279 (100.0%) |
harmonized_sample_disease_high | Healthy/None<br>Cancer<br>Disease | A manually refined higher-level annotation describing the disease using only three categories: Healthy/None, Cancer, Disease. | 2279 (100.0%) |
harmonized_sample_disease_intermediate | Carcinoma<br>Leukemia | A manually refined higher-level annotation describing the disease for this particular sample using ancestors in the NCIT ontology. NCIM CURIEs were mapped to NCIT CURIEs; see version 0.9 for an explanation. | 2279 (100.0%) |
harmonized_EpiRR_status | Complete<br>Partial | Whether this epigenome is `Complete` or `Partial`. | 2279 (100.0%) |
epiATLAS_status | Complete<br>Partial<br>Complete_imputed | Equivalent to `harmonized_EpiRR_status`, but referring to the reprocessed data rather than the originally submitted data. It describes the status of the reference epigenome, additionally indicating full epigenomes obtained when using imputed data. | 2279 (100.0%) |
harmonized_cell_type | myeloid cell<br>effector memory CD8-positive, alpha-beta T cell | The cell type and main sample ontology classification for entries where `biomaterial_type` is `primary cell` or `primary cell culture`. | 1561 (68.5%) |
harmonized_cell_line | MCF 10A | The cell line and main sample ontology classification for entries where `biomaterial_type` is `cell line`. | 73 (3.2%) |
harmonized_tissue_type | skeletal muscle tissue<br>amygdala | The tissue type and main sample ontology classification for entries where `biomaterial_type` is `primary tissue`. | 2008 (88.1%) |
harmonized_sample_ontology_curie | CL:0000990<br>UBERON:0001876<br>EFO:0001200 | The CURIE identifying the sample ontology term. Different ontologies are used depending on the `biomaterial_type`: 'CL' for `primary cell` or `primary cell culture`, 'EFO' for `cell line`, and 'UBERON' for `primary tissue`. | 2279 (100.0%) |
harmonized_cell_markers | CD3+ CD4+ CD45RA+ CD3- CD19- CD56- ... |
Version 1.1
At this time, this repository is only for sample metadata, not experiment metadata.
There is metadata available for 2216 EpiRR entries.
The CSV for the sample metadata can be found at openrefine/v1.1/IHEC_metadata_harmonization.v1.1.csv
Based on tag v1.1.1 because of a change in column order of the extended version.
News
- Removed entries for which no corresponding datasets were reprocessed or for which all datasets corresponding to the EpiRR entry were pruned.
- Removed column `harmonized_donor_life_status`, which no longer contained any information after these entries had been removed (see above).
- Added column `epirr_id_without_version` for natural joins with `epimap_metadata.csv`, which provides metadata about the reprocessed datasets (#85).
- Extended version: Added column `automated_harmonized_donor_age_in_years`, based on `harmonized_donor_age`, as explained in #86.
  - Intervals are split by `-` and the mean is computed.
  - Values with `week` or `day` as `harmonized_donor_age_unit` are divided by 52 or 365, respectively.
  - Note: `unknown` is converted to `nan`, and `90+` is simply converted to 90.
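The age conversion rules above can be sketched as follows. This is an illustration, not the code used to build `automated_harmonized_donor_age_in_years`; the function name is hypothetical, and treating `year` as already in years while rejecting any other unit are assumptions.

```python
import math


def age_in_years(age: str, unit: str) -> float:
    """Convert harmonized_donor_age / harmonized_donor_age_unit to years (sketch)."""
    value = age.strip()
    if value.lower() == "unknown":
        return math.nan                       # 'unknown' becomes NaN
    if value.endswith("+"):
        value = value[:-1]                    # '90+' is treated as 90
    bounds = [float(part) for part in value.split("-")]
    mean = sum(bounds) / len(bounds)          # an interval such as '60-65' -> its mean
    divisors = {"year": 1, "week": 52, "day": 365}
    unit = unit.strip().lower()
    if unit not in divisors:
        raise ValueError(f"unhandled age unit: {unit!r}")
    return mean / divisors[unit]
```

For example, `age_in_years("60-65", "year")` gives `62.5` and `age_in_years("26", "week")` gives `0.5`.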
Diff
The overall diff between v1.0 and v1.1 can be found at openrefine/v1.1/diff_v1.0_v1.1.json
Extended Version:
For more information on the columns from the extended version at openrefine/v1.1/IHEC_metadata_harmonization.v1.1.extended.csv, please also see version 0.9.
Metadata Standard
Please keep in mind that we try to stay as close to the IHEC Metadata Standard as possible.
Column descriptions:
The table below describes the columns included in the metadata table at IHEC_metadata_harmonization.v1.1.csv and the extended metadata table at IHEC_metadata_harmonization.v1.1.extended.csv.
Column | Examples | Explanation | # Not Null (%) |
---|---|---|---|
EpiRR | IHECRE00000001.4 | EpiRR identifier. The number after the dot (.) is the version. | 2216 (100.0%) |
project | CEEHRC<br>BLUEPRINT | The project from which the epigenome originated. | 2216 (100.0%) |
harmonized_biomaterial_type | cell line<br>primary cell<br>primary cell culture<br>primary tissue | One of `primary cell`, `primary cell culture`, `cell line`, `primary tissue`. | 2216 (100.0%) |
harmonized_sample_ontology_intermediate | T cell epithelial cell derived cell line | A manually refined higher-level annotation describing the samples using ancestors in the ontology. | 2216 (100.0%) |
harmonized_sample_disease_high | Healthy/None<br>Cancer<br>Disease | A manually refined higher-level annotation describing the disease using only three categories: Healthy/None, Cancer, Disease. | 2216 (100.0%) |
harmonized_sample_disease_intermediate | Carcinoma<br>Leukemia | A manually refined higher-level annotation describing the disease for this particular sample using ancestors in the NCIT ontology. NCIM CURIEs were mapped to NCIT CURIEs; see version 0.9 for an explanation. | 2216 (100.0%) |
harmonized_EpiRR_status | Complete<br>Partial | Whether this epigenome is `Complete` or `Partial`. | 2216 (100.0%) |
harmonized_cell_type | myeloid cell<br>effector memory CD8-positive, alpha-beta T cell | The cell type and main sample ontology classification for entries where `biomaterial_type` is `primary cell` or `primary cell culture`. | 1498 (67.6%) |
harmonized_cell_line | MCF 10A | The cell line and main sample ontology classification for entries where `biomaterial_type` is `cell line`. | 73 (3.3%) |
harmonized_tissue_type | skeletal muscle tissue<br>amygdala | The tissue type and main sample ontology classification for entries where `biomaterial_type` is `primary tissue`. | 1958 (88.4%) |
harmonized_sample_ontology_curie | CL:0000990<br>UBERON:0001876<br>EFO:0001200 | The CURIE identifying the sample ontology term. Different ontologies are used depending on the `biomaterial_type`: 'CL' for `primary cell` or `primary cell culture`, 'EFO' for `cell line`, and 'UBERON' for `primary tissue`. | 2216 (100.0%) |
harmonized_cell_markers | CD3+ CD4+ CD45RA+ CD3- CD19- CD56- | Markers used to isolate and identify the cell type, when applicable. | 1082 (48.8%) |
automated_harmonized_sample_ontology | CL<br>UBERON<br>EFO | Extended only: automatic extraction from `harmonized_sample_ontology_curie`. The ontology corresponding to the CURIE, mostly used for other automatic extractions. | 2216 (100.0%) |
automated_harmonized_sample_ontology_term | myeloid cell<br>MCF 10A<br>amygdala |
Extended only Automatic extraction from harmonized_sample_ontology_curie . The term corres... |
Version 1.0
At this time, this repository is only for sample metadata, not experiment metadata.
The CSV for the sample metadata can be found at openrefine/v1.0/IHEC_metadata_harmonization.v1.0.csv
News
- The prefix `harm` has been renamed to `harmonized` for all columns where at least one cell was changed compared to the original data from EpiRR.
- Additionally, the prefix `automated` was added to all columns that are generated completely automatically and lack manual curation. They are available in the extended version only.
- The column originally called `line` has been renamed to `cell_line`, i.e., it is now `harmonized_cell_line`.
- The column originally called `markers` has been renamed to `cell_markers`, i.e., it is now `harmonized_cell_markers`.
- In all column names originally containing `disease`, this part has been renamed to `sample_disease` to emphasize that the attribute reflects the disease of this particular sample, not the donor health condition.
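The renaming rules listed above can be expressed as a small function that maps a v0.11 column name to its v1.0 name. This hypothetical helper covers only the `harm` -> `harmonized` prefix change and the `line`, `markers`, and `disease` renames; it does not handle columns that first received the prefix in v1.0 (e.g., `EpiRR_status` appears as `harmonized_EpiRR_status` in the v1.0 table) or the `automated` prefix of extended columns.

```python
def rename_v0_11_column(column: str) -> str:
    """Apply the v1.0 renaming rules to a v0.11 column name (sketch)."""
    prefix, name = "", column
    if name.startswith("harm_"):
        # 'harm_' marked harmonized columns in v0.11; v1.0 spells it out.
        prefix, name = "harmonized_", name[len("harm_"):]
    if name == "line":
        name = "cell_line"
    elif name == "markers":
        name = "cell_markers"
    elif "disease" in name:
        # Emphasize that the disease refers to the sample, not the donor.
        name = name.replace("disease", "sample_disease", 1)
    return prefix + name
```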
Diff
The overall diff between v0.11 and v1.0 can be found at diff_v0.11_v1.0.json
Extended Version:
For more information on the columns from the extended version at IHEC_metadata_harmonization.v1.0.extended.csv, please also see version 0.9.
Metadata Standard
Please keep in mind that we try to stay as close to the IHEC Metadata Standard as possible.
Column descriptions:
The table below describes the columns included in the metadata table at IHEC_metadata_harmonization.v1.0.csv and the extended metadata table at IHEC_metadata_harmonization.v1.0.extended.csv.
Column | Examples | Explanation |
---|---|---|
EpiRR | IHECRE00000001.4 | EpiRR identifier. The number after the dot (.) is the version. |
project | CEEHRC<br>BLUEPRINT | The project from which the epigenome originated. |
harmonized_biomaterial_type | cell line<br>primary cell<br>primary cell culture<br>primary tissue | One of `primary cell`, `primary cell culture`, `cell line`, `primary tissue`. |
harmonized_sample_ontology_intermediate | T cell epithelial cell derived cell line | A manually refined higher-level annotation describing the samples using ancestors in the ontology. |
harmonized_sample_disease_high | Healthy/None<br>Cancer<br>Disease | A manually refined higher-level annotation describing the disease using only three categories: Healthy/None, Cancer, Disease. |
harmonized_sample_disease_intermediate | Carcinoma<br>Leukemia | A manually refined higher-level annotation describing the disease for this particular sample using ancestors in the NCIT ontology. NCIM CURIEs were mapped to NCIT CURIEs; see version 0.9 for an explanation. |
harmonized_EpiRR_status | Complete<br>Partial | Whether this epigenome is `Complete` or `Partial`. |
harmonized_cell_type | myeloid cell<br>effector memory CD8-positive, alpha-beta T cell | The cell type and main sample ontology classification for entries where `biomaterial_type` is `primary cell` or `primary cell culture`. |
harmonized_cell_line | MCF 10A | The cell line and main sample ontology classification for entries where `biomaterial_type` is `cell line`. |
harmonized_tissue_type | skeletal muscle tissue<br>amygdala | The tissue type and main sample ontology classification for entries where `biomaterial_type` is `primary tissue`. |
harmonized_sample_ontology_curie | CL:0000990<br>UBERON:0001876<br>EFO:0001200 | The CURIE identifying the sample ontology term. Different ontologies are used depending on the `biomaterial_type`: 'CL' for `primary cell` or `primary cell culture`, 'EFO' for `cell line`, and 'UBERON' for `primary tissue`. |
harmonized_cell_markers | CD3+ CD4+ CD45RA+ CD3- CD19- CD56- | Markers used to isolate and identify the cell type, when applicable. |
automated_harmonized_sample_ontology | CL<br>UBERON<br>EFO | Extended only: automatic extraction from `harmonized_sample_ontology_curie`. The ontology corresponding to the CURIE, mostly used for other automatic extractions. |
automated_harmonized_sample_ontology_term | myeloid cell<br>MCF 10A<br>amygdala | Extended only: automatic extraction from `harmonized_sample_ontology_curie`. The term corresponding to the CURIE, mostly used for detecting inconsistencies. |
sample_ontology_term_high_order_JeffreyHyacinthe | Cell Line Blood |
Extended only semi-manual annotation by Jeffrey Hyacinthe. Had been applied to v0.8 ... |
Version 0.11
The CSV for the metadata can be found at openrefine/v0.11/IHEC_metadata_harmonization.v0.11.csv
News
For all columns that were changed, i.e., “harmonized” in this effort, we added the prefix `harm_` to clearly mark that the column has been changed compared to the original EpiRR data.
For the columns describing manual annotations, we removed the suffix `_order_manual`. In the extended version, they can still be distinguished from the automatically extracted higher-order annotations, because the latter still have the suffix `_order` or `_order_unique`.
In this version, we rearranged the column order so that it makes more sense to us:
- First, six columns that describe the most important information about the entry: `EpiRR`, `project`, `harm_biomaterial_type`, `harm_sample_ontology_intermediate`, `harm_disease_high`, `harm_disease_intermediate`.
- Next, the EpiRR status and the columns describing the sample ontology (cell type, cell line, or tissue): `EpiRR_status`, `harm_cell_type`, `harm_line`, `harm_tissue_type`, `harm_sample_ontology_curie`, `harm_markers`.
- Afterwards, two columns stating the disease of this particular sample: `harm_disease`, `harm_disease_ontology_curie`.
- Lastly, nine columns with information about the donor(s) of this sample: `donor_type`, `harm_donor_id`, `harm_donor_age`, `harm_donor_age_unit`, `harm_donor_life_stage`, `harm_donor_sex`, `harm_donor_health_status`, `harm_donor_health_status_ontology_curie`, `harm_donor_life_status`.
Additionally, we added the `donor_type` column, which describes whether the reference epigenome comes from a `Single donor`, `Composite`, or `Pooled samples`. This information was downloaded directly from EpiRR.
Diff
The overall diff between v0.10 and v0.11 can be found at openrefine/v0.11/diff_v0.10_v0.11.json
Explanations
For a table that describes the columns included in the metadata table, please refer to version 0.10
Extended Version
For explanations concerning the extended version, openrefine/v0.11/IHEC_metadata_harmonization.v0.11.extended.csv, please see version 0.9.
Metadata Standard
Please always keep in mind that we try to stay as close to the IHEC Metadata Standard as possible.
Version 0.10
The CSV for the metadata can be found at openrefine/v0.10/IHEC_metadata_harmonization.v0.10.csv
The overall diff between v0.9 and v0.10 can be found at openrefine/v0.10/diff_v0.9_v0.10.json
The table below describes the columns included in the metadata table.
For explanations concerning the extended version, please see version 0.9.
Please always keep in mind that we try to stay as close to the IHEC Metadata Standard as possible.
Column | Examples | Explanation |
---|---|---|
EpiRR | IHECRE00000001.4 | EpiRR identifier. The number after the dot (.) is the version. |
EpiRR_status | Complete<br>Partial | Whether this epigenome is `Complete` or `Partial`. |
project | CEEHRC<br>BLUEPRINT | The project from which the epigenome originated. |
biomaterial_type | cell line<br>primary cell<br>primary cell culture<br>primary tissue | One of `primary cell`, `primary cell culture`, `cell line`, `primary tissue`. |
cell_type | myeloid cell<br>effector memory CD8-positive, alpha-beta T cell | The cell type and main sample ontology classification for entries where `biomaterial_type` is `primary cell` or `primary cell culture`. |
line | MCF 10A | The cell line and main sample ontology classification for entries where `biomaterial_type` is `cell line`. |
tissue_type | skeletal muscle tissue<br>amygdala | The tissue type and main sample ontology classification for entries where `biomaterial_type` is `primary tissue`. |
sample_ontology_curie | CL:0000990<br>UBERON:0001876<br>EFO:0001200 | The CURIE identifying the sample ontology term. Different ontologies are used depending on the `biomaterial_type`: 'CL' for `primary cell` or `primary cell culture`, 'EFO' for `cell line`, and 'UBERON' for `primary tissue`. |
sample_ontology_term_high_order_manual | other T cell | A manually refined higher-level annotation describing the samples using ancestors in the ontology. |
markers | CD3+ CD4+ CD45RA+ CD3- CD19- CD56- | Markers used to isolate and identify the cell type, when applicable. |
disease | Breast Carcinoma<br>Acute Promyelocytic Leukemia with PML-RARA | This attribute reflects the disease for this particular sample, not the donor health condition. |
disease_ontology_curie | NCIM:C0678222<br>NCIM:C0023487 | The CURIE identifying the NCIM disease ontology term. |
disease_high_order_manual | Healthy/None<br>Cancer<br>Disease | A manually refined higher-level annotation describing the disease using only three categories: Healthy/None, Cancer, Disease. |
disease_intermediate_order_manual | Carcinoma<br>Leukemia | A manually refined higher-level annotation describing the disease for this particular sample using ancestors in the NCIT ontology. NCIM CURIEs were mapped to NCIT CURIEs; see version 0.9 for an explanation. |
donor_id | CEMT0007<br>C07015 | Identifier for donors within their projects. |
donor_age | 60-65<br>unknown<br>46 | Age of donor. Can be an interval. |
donor_age_unit | year<br>day | Age unit of donor. |
donor_life_stage | embryonic<br>adult | Life stage of donor. |
sex | female<br>male | Sex of donor. |
donor_health_status | Breast Carcinoma<br>Acute Promyelocytic Leukemia with PML-RARA | The health status of the donor who provided the sample. It does not describe the disease for this particular sample. |
donor_health_status_ontology_curie | NCIM:C0023487<br>NCIM:C0678222 | The CURIE identifying the NCIM donor health status ontology term. |
health_state | dead<br>alive | Health state of donor: `dead` or `alive`. |
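The `sample_ontology_curie` description implies a simple consistency check: the CURIE prefix should match the ontology expected for the `biomaterial_type`. A sketch, assuming rows are dictionaries keyed by column name; `inconsistent_curies` is a hypothetical helper. For v1.0 and later, pass `type_column="harmonized_biomaterial_type"` and `curie_column="harmonized_sample_ontology_curie"`.

```python
from typing import Dict, Iterable, List, Tuple

# Expected ontology prefix per biomaterial type, as described for the CURIE column.
EXPECTED_PREFIX = {
    "primary cell": "CL",
    "primary cell culture": "CL",
    "cell line": "EFO",
    "primary tissue": "UBERON",
}


def inconsistent_curies(
    rows: Iterable[Dict[str, str]],
    type_column: str = "biomaterial_type",
    curie_column: str = "sample_ontology_curie",
) -> List[Tuple[str, str, str]]:
    """Return (EpiRR, biomaterial type, CURIE) for rows whose CURIE prefix is unexpected."""
    flagged = []
    for row in rows:
        prefix = row[curie_column].split(":", 1)[0]
        # Unknown biomaterial types map to None and are therefore flagged too.
        if EXPECTED_PREFIX.get(row[type_column]) != prefix:
            flagged.append((row["EpiRR"], row[type_column], row[curie_column]))
    return flagged
```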
Version 0.9
The CSV for the metadata can be found at openrefine/v0.9/IHEC_metadata_harmonization.v0.0.csv
The overall diff between v0.8 and v0.9 can be found at openrefine/v0.9/diff_v0.8_v0.9.json
This version comes with the first “extended” version openrefine/v0.9/IHEC_metadata_harmonization.v0.9.extended.csv that includes higher level annotations for the three ontology columns.
The following columns have been added in comparison to the normal v0.9:
- `donor_health_status_ontology_curie_ncit`: mapping from NCIM to NCIT CURIEs for the `donor_health_status_ontology_curie`
- `disease_ontology_curie_ncit`: mapping from NCIM to NCIT CURIEs for the `disease_ontology_curie`
- `sample_ontology`: ontology to use based on the `biomaterial_type`
- `sample_ontology_term`: the ontology term extracted from `sample_ontology_curie` that should reflect either `line`, `tissue_type`, or `cell_type`, depending on the `sample_ontology`
- `sample_ontology_term_high_order_JeffreyHyacinthe`: semi-manual annotation by Jeffrey Hyacinthe. Had been applied to v0.8.
- `sample_ontology_term_high_order_JonathanSteif`: semi-manual annotation by Jonathan Steif. Had been applied to the v0.9 draft.
- `sample_ontology_term_high_order_manual`: semi-manual annotation using the automatic extraction columns below and the manual annotations above. Created by some members of the IHEC IA metadata group (Pierre-Etienne Jacques, Gabriella Frosi, and Quirin Manz). Had been applied to v0.9. Although this is the current higher-level annotation for `sample_ontology_term`, it should be handled with caution, since it is still preliminary and should be checked by others.
Note that the `sample_ontology_term` columns were grouped by their `sample_ontology` in the automatic extraction.
The following columns are a result of the automatic extraction and are named `($column)_($order)(_unique)?`:
- `$column` describes the ontology column that the automatic extraction was performed on. One of `[sample_ontology_term, donor_health_status_ontology_term, disease_ontology_term]`.
- `$order` describes the number of unique terms that are allowed overall in the column (or group, for `sample_ontology_term`). For `intermediate_order`, the maximum number of terms is 30; for `high_order`, it is 15.
- The `_unique` suffix is attached if the automatic extraction considered only unique terms for counting. If it is not attached, the extraction was performed on all entries and duplicates were counted as well. This reflects the underlying dataset on which the extraction was performed, with or without duplicates.
This results in the following 12 additional columns:
- `sample_ontology_term_intermediate_order_unique`
- `sample_ontology_term_high_order_unique`
- `sample_ontology_term_intermediate_order`
- `sample_ontology_term_high_order`
- `donor_health_status_ontology_term_intermediate_order_unique`
- `donor_health_status_ontology_term_high_order_unique`
- `donor_health_status_ontology_term_intermediate_order`
- `donor_health_status_ontology_term_high_order`
- `disease_ontology_term_intermediate_order_unique`
- `disease_ontology_term_high_order_unique`
- `disease_ontology_term_intermediate_order`
- `disease_ontology_term_high_order`
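The naming scheme `($column)_($order)(_unique)?` can be enumerated and parsed programmatically. A sketch with hypothetical names (`extraction_columns`, `parse_extraction_column`); the term limits of 30 and 15 are taken from the description above.

```python
import re
from typing import List, NamedTuple, Optional

COLUMNS = ("sample_ontology_term", "donor_health_status_ontology_term", "disease_ontology_term")
ORDERS = ("intermediate_order", "high_order")
MAX_TERMS = {"intermediate_order": 30, "high_order": 15}

# ^(column)_(order)(_unique)?$ built from the allowed values above.
PATTERN = re.compile(
    "^(?P<column>{})_(?P<order>{})(?P<unique>_unique)?$".format(
        "|".join(map(re.escape, COLUMNS)), "|".join(ORDERS)
    )
)


class ExtractionColumn(NamedTuple):
    column: str
    order: str
    unique: bool
    max_terms: int


def extraction_columns() -> List[str]:
    """Enumerate the 12 column names in the order they are listed above."""
    return [f"{c}_{o}{u}" for c in COLUMNS for u in ("_unique", "") for o in ORDERS]


def parse_extraction_column(name: str) -> Optional[ExtractionColumn]:
    """Decompose a column name; return None if it does not follow the scheme."""
    match = PATTERN.match(name)
    if match is None:
        return None
    order = match["order"]
    return ExtractionColumn(match["column"], order, match["unique"] is not None, MAX_TERMS[order])
```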
Version 0.8
The CSV for the metadata can be found at openrefine/v0.8/IHEC_metadata_harmonization.v0.8.csv
This version includes significant changes to the structure of the table:
- Renamed columns according to the metadata standard: `sample_ontology_term` -> `sample_ontology_curie`
- Split the previously merged information for donor health status and disease into 4 columns overall:
  - `donor_health_status` is split into `donor_health_status` and `disease`
  - `disease_ontology_term` is split into `donor_health_status_ontology_curie` and `disease_ontology_curie`
- Removed entries not associated with humans and dropped the `taxon_id` column; a list of removed IDs can be found in openrefine/v0.8/removed_entries.csv
The overall diff between v0.7 and v0.8 can be found at openrefine/v0.8/diff_v0.7_v0.8.json
Version 0.7
The CSV for the metadata can be found at openrefine/v0.7/IHEC_metadata_harmonization.v0.7.csv (permanent link)