05. RNASeq and miRNASeq Data

Natalia edited this page Mar 30, 2017 · 23 revisions

tranSMART has three tables for NGS data: two for RNA sequencing (RNASeq) data and one for miRNA sequencing (miRNASeq) data. One of the RNASeq tables is intended for raw read count observation data, which can be used with the Group Test for RNASeq Advanced Workflow. The R package behind this workflow (edgeR) includes a normalization step. I believe this special case of RNASeq data loading is not part of the original set of HDD data developed by Sanofi; perhaps it was developed for a customer and then contributed by Hyve. At this time, tMDataLoader does not have a specific procedure for loading RNASeq data in raw read count observation format.

All other Analysis Workflows require pre-normalized RNASeq data (RPKM, FPKM, TPM, etc.), which is loaded similarly to Expression data. Even though there is a special table for miRNA sequencing data, such data can also be loaded as RNASeq; this is mostly a matter of which standard IDs are preferred. For RNASeq data, the “probes” (in this case transcript IDs) are mapped to standard gene symbols. For miRNA sequencing data, miRBase symbols are used as the standard IDs.

Normalized RNASeq Data Loading Instruction

RNASeq Data (sample)

RNASeq Data is loaded from the RNASeqDataToUpload directory, similarly to Expression Data. For more details see “Expression Data”.

RNASeq Data File

TranscriptID S57023 S57024
NM_001011874 0 0.0093
NM_001195662 0.0384 0.051

Data Types

The last character of the data file name (before the extension) is one of the following letters:
R - raw data. Values are loaded into the Value column. Raw values are transformed to calculate the log2 value and z-score; log2 values are loaded into the Log value column.
L - log2 data. Values are loaded into the Log value column; raw values are restored and loaded into the Value column. The z-score is calculated.
T and Z - z-score data. Both letters have the same meaning: a value is written to the z-score column unchanged if it is in the range (-2.5; 2.5); otherwise it is truncated to this range.

NOTE: If data is loaded as R, 0.001 is added to all values before the log2 transformation to avoid dropping 0 values (0 cannot be log-transformed). Normalized raw RNASeq data is not expected to contain any negative values.
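The rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not tMDataLoader's actual code: the function names and the file name `study_RNASeq_R.txt` are hypothetical, and restoring raw values as a plain `2**x` (without removing the 0.001 offset) is an assumption. How the z-score itself is computed is not described on this page, so only the truncation step is shown.

```python
import math
from pathlib import Path

PSEUDOCOUNT = 0.001  # added to raw values before log2 (see the NOTE above)
Z_LIMIT = 2.5        # z-scores outside (-2.5; 2.5) are truncated to this range


def data_type_from_filename(filename: str) -> str:
    """Return the data-type letter: the last character before the extension."""
    letter = Path(filename).stem[-1]
    if letter not in ("R", "L", "T", "Z"):
        raise ValueError(f"unrecognized data type letter {letter!r} in {filename!r}")
    return letter


def raw_to_log2(value: float) -> float:
    """Type R: shift by the pseudocount so that 0 can be log-transformed."""
    return math.log2(value + PSEUDOCOUNT)


def log2_to_raw(log_value: float) -> float:
    """Type L: restore a raw value (assumption: plain 2**x, offset not removed)."""
    return 2.0 ** log_value


def clamp_zscore(z: float) -> float:
    """Types T and Z: keep in-range values as-is, truncate the rest."""
    return max(-Z_LIMIT, min(Z_LIMIT, z))


# Hypothetical file name; only its last character before ".txt" matters.
print(data_type_from_filename("study_RNASeq_R.txt"))
print(clamp_zscore(3.1), clamp_zscore(-4.0), clamp_zscore(1.2))
```

With a zero raw value, `raw_to_log2(0.0)` returns `log2(0.001)` instead of failing, which is the reason for the offset.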

RNASeq Platform File

#PLATFORM_ID: RNASeq999
#PLATFORM_TITLE: Test RNASeq Platform
#SPECIES: Homo Sapiens

Transcript ID Gene Symbol Organism
NM_001011874 MEF2C Homo sapiens
NM_001195662 ALDH8A1 Homo sapiens
NM_011283 MEF2A Homo sapiens
NM_011441 MEF2C Homo sapiens
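To check a platform file before loading, it can help to read it into a transcript-to-gene lookup. The sketch below is not part of tMDataLoader; it assumes that the file is tab-delimited and that each `#KEY: value` header sits on its own line, neither of which can be confirmed from the rendered sample above. The function name `read_platform` is hypothetical.

```python
import csv

# First rows of the sample platform file, written out with assumed tab delimiters.
SAMPLE_PLATFORM = (
    "#PLATFORM_ID: RNASeq999\n"
    "#PLATFORM_TITLE: Test RNASeq Platform\n"
    "#SPECIES: Homo Sapiens\n"
    "\n"
    "Transcript ID\tGene Symbol\tOrganism\n"
    "NM_001011874\tMEF2C\tHomo sapiens\n"
    "NM_001195662\tALDH8A1\tHomo sapiens\n"
)


def read_platform(text):
    """Split a platform file into header metadata and a transcript -> gene map."""
    metadata = {}
    table_lines = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        if line.startswith("#"):
            # "#PLATFORM_ID: RNASeq999" -> {"PLATFORM_ID": "RNASeq999"}
            key, _, value = line[1:].partition(":")
            metadata[key.strip()] = value.strip()
        else:
            table_lines.append(line)
    # The first non-header line holds the column names.
    reader = csv.DictReader(table_lines, delimiter="\t")
    transcript_to_gene = {row["Transcript ID"]: row["Gene Symbol"] for row in reader}
    return metadata, transcript_to_gene


metadata, transcript_to_gene = read_platform(SAMPLE_PLATFORM)
print(metadata["PLATFORM_ID"])             # RNASeq999
print(transcript_to_gene["NM_001195662"])  # ALDH8A1
```

Keying the lookup by transcript ID keeps cases like MEF2C, where several transcripts map to one gene symbol, unambiguous.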

RNASeq Subject Sample Mapping File

STUDY_ID SITE_ID SUBJECT_ID SAMPLE_ID PLATFORM TISSUETYPE ATTR1 ATTR2 CATEGORY_CD SOURCE_CD
GSE_A_37424 0 1 S57023 RNASeq999 Intestine Biomarker_Data+PLATFORM+TISSUETYPE STD

Normalized miRNASeq Data Loading Instruction

miRNASeq Data is loaded from the MIRNA_SEQDataToUpload directory, similarly to Expression Data. For more details see “Expression Data”.

miRNASeq Data (sample)

miRNASeq Data File

ID_REF GSM918942 GSM918943 GSM918944 GSM918945 GSM918946 GSM918947 GSM918948 GSM918949
1 0.002908561 0.004549935 0.021626957 0.015697885 0.005178485 0.00498247 0.005311656 0.010319512
2 0.01039278 0.010017933 0.038167632 0.040012373 0.010484615 0.010744884 0.011629359 0.023468306
3 0.006034899 0.010552801 0.035375773 0.027333613 0.007408354 0.00969822 0.0095548 0.015315651

Data Types

The last character of the data file name (before the extension) is one of the following letters:
R - raw data. Values are loaded into the Value column. Raw values are transformed to calculate the log2 value and z-score; log2 values are loaded into the Log value column.
L - log2 data. Values are loaded into the Log value column; raw values are restored and loaded into the Value column. The z-score is calculated.
T and Z - z-score data. Both letters have the same meaning: a value is written to the z-score column unchanged if it is in the range (-2.5; 2.5); otherwise it is truncated to this range.

NOTE: If data is loaded as R, 0.001 is added to all values before the log2 transformation to avoid dropping 0 values (0 cannot be log-transformed). Normalized raw miRNASeq data is not expected to contain any negative values.

miRNASeq Platform File

#PLATFORM_ID: GPL15467seqbased
#PLATFORM_TITLE: Test MIRNAseq Platform
#SPECIES: Homo Sapiens

ID_REF MIRNA_ID SN_ID PLT_NAME ORGANISM
1 hsa-miR-1 GPL15467seqbased Homo Sapiens
2 hsa-miR-9 GPL15467seqbased Homo Sapiens
3 hsa-miR-10a GPL15467seqbased Homo Sapiens

miRNASeq Subject Sample Mapping File

STUDY_ID SITE_ID SUBJECT_ID SAMPLE_CD PLATFORM TISSUETYPE ATTRITBUTE_1 ATTRITBUTE_2 CATEGORY_CD SOURCE_CD
mirnaseqbased GSM918942 GSM918942 GPL15467seqbased Human Synovium Biomarker_Data+PLATFORM+ATTR1 STD