nboukharov edited this page Jan 13, 2017 · 28 revisions

# Welcome to the tMDataLoader wiki!

tMDataLoader is a tool developed by Clarivate Analytics (formerly the IP & Science business of Thomson Reuters) to automate ETL processes for tranSMART. It is open-source software, written in Groovy and available for download for both the Oracle and PostgreSQL versions of tranSMART. tMDataLoader uses the same stored procedures as Kettle, but some of them were modified.

tMDataLoader supports all tranSMART clinical and high-dimensional (HDD) molecular data types, with the exception of the NGS read count observation format (see RNASeq Data). It includes options for loading, deleting, incrementally loading, and moving or renaming folders and nodes within the tree of a loaded study. Studies can be deleted by path or by ID. tMDataLoader does not automatically delete platforms associated with the HDD data of a study being deleted, because the same platform can be associated with multiple studies. If needed, a platform has to be deleted manually after verifying that other studies will not be affected.
The first public tranSMART release supported Clinical and Gene Expression data. The Gene Expression Dictionary is part of the standard tranSMART installation. Tables for other HDD data types were added in subsequent releases. Dictionaries for these data types have to be loaded as an additional installation step before loading HDD data with tMDataLoader or any other ETL tool.

For the HDD Subject-Sample mapping file, it is recommended that the input file columns are mapped to the following table columns, so that the JavaScript for the advanced workflows selects the correct data for the dropdowns:

  • tissue_type => sample_type
  • attribute_1 => tissue_type
  • attribute_2 => timepoint
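
The recommended mapping above can be written down as a simple lookup. The following is a minimal sketch in Python, not part of tMDataLoader: the function name `map_row` and the example row values are hypothetical, and only the three column pairs come from the list above.

```python
# Recommended mapping from the list above: input file column -> table column.
COLUMN_MAPPING = {
    "tissue_type": "sample_type",
    "attribute_1": "tissue_type",
    "attribute_2": "timepoint",
}


def map_row(row):
    """Return a new dict with keys renamed according to COLUMN_MAPPING.

    Keys that are not in the mapping are kept unchanged. This helper is
    hypothetical and only illustrates the renaming.
    """
    return {COLUMN_MAPPING.get(key, key): value for key, value in row.items()}


# Hypothetical input row; the values are made up for illustration.
example = {"tissue_type": "Biopsy", "attribute_1": "Lung", "attribute_2": "Week 4"}
mapped = map_row(example)
print(mapped)
```

Because the result is built as a new dictionary, the input `tissue_type` value lands in `sample_type` while the `attribute_1` value takes over the `tissue_type` key without one overwriting the other.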

For Postgres

  • study_id character varying(25)
  • site_id character varying(50)
  • subject_id character varying(100)
  • visit_name character varying(100)
  • data_label character varying(500)
  • data_value character varying(500)
  • category_cd character varying(250)
  • category_path character varying(1000)

For Oracle

  • STUDY_ID VARCHAR2(25 BYTE)
  • SITE_ID VARCHAR2(50 BYTE)
  • SUBJECT_ID VARCHAR2(100 BYTE)
  • VISIT_NAME VARCHAR2(100 BYTE)
  • DATA_LABEL VARCHAR2(500 BYTE)
  • DATA_VALUE VARCHAR2(500 BYTE)
  • CATEGORY_CD VARCHAR2(250 BYTE)
  • ETL_JOB_ID NUMBER(22,0)
  • ETL_DATE DATE
  • USUBJID VARCHAR2(200 BYTE)
  • CATEGORY_PATH VARCHAR2(1000 BYTE)
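
The column widths listed above suggest a check that could be run on input data before loading. The sketch below is an assumption-laden illustration, not something tMDataLoader is known to do: the function `find_overlong` is hypothetical, the limits are copied from the Postgres list, and measuring Oracle `BYTE` lengths as UTF-8 bytes assumes a UTF-8 database character set.

```python
# Character limits copied from the Postgres column list above.
MAX_LENGTHS = {
    "study_id": 25,
    "site_id": 50,
    "subject_id": 100,
    "visit_name": 100,
    "data_label": 500,
    "data_value": 500,
    "category_cd": 250,
    "category_path": 1000,
}


def find_overlong(row, encoding=None):
    """Return (column, length, limit) for every value that exceeds its limit.

    With encoding=None the length is counted in characters, matching
    "character varying(n)". With an encoding such as "utf-8" the length is
    counted in encoded bytes, which approximates Oracle "VARCHAR2(n BYTE)"
    under the assumption that the database uses that encoding.
    """
    problems = []
    for column, limit in MAX_LENGTHS.items():
        value = row.get(column)
        if value is None:
            continue  # missing or empty columns are not checked here
        length = len(value.encode(encoding)) if encoding else len(value)
        if length > limit:
            problems.append((column, length, limit))
    return problems


# Hypothetical row: 13 accented characters fit in 25 characters,
# but occupy 26 bytes when encoded as UTF-8.
row = {"study_id": "\u00e9" * 13, "site_id": "A"}
print(find_overlong(row))
print(find_overlong(row, encoding="utf-8"))
```

The two calls illustrate why the same value can fit the Postgres column but not the Oracle one: the Postgres widths count characters, while the Oracle widths are declared in bytes.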