Home
tMDataLoader is a tool developed by Clarivate Analytics (formerly the IP & Science business of Thomson Reuters) to automate ETL processes for tranSMART. It is open-source software written in Groovy and is available for download for both the Oracle and PostgreSQL versions of tranSMART. tMDataLoader uses the same stored procedures as Kettle, although some of them have been modified.
tMDataLoader supports all tranSMART clinical and high-dimensional (HDD) molecular data types, with the exception of the NGS read count observation format (see RNASeq Data).
tMDataLoader includes options for loading, deleting, and incrementally loading data, and for moving or renaming folders and nodes within the tree of a loaded study. Studies can be deleted by path or by ID. tMDataLoader does not automatically delete the platforms associated with the HDD data of a study being deleted, because the same platform can be associated with multiple studies. If a platform needs to be removed, it has to be deleted manually after verifying that other studies will not be affected.
The first public tranSMART release supported Clinical and Gene Expression data. The Gene Expression dictionary is part of the standard tranSMART installation. Tables for other HDD data types were added in subsequent releases. Dictionaries for these data types have to be loaded as an additional installation step before loading HDD data with tMDataLoader or any other ETL tool.
For the HDD Subject-Sample mapping file, it is recommended that the input file columns be mapped to the following table columns, so that the JavaScript for the advanced workflows selects the correct data for the dropdowns:
- tissue_type => sample_type
- attribute_1 => tissue_type
- attribute_2 => timepoint
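
The mapping above can be illustrated with a minimal Python sketch. This is not tMDataLoader code; the function name `to_table_columns`, the `subject_id` column, and the sample values are hypothetical and serve only to show how the three names translate. Note that `tissue_type` appears both as an input column and as a target column, so the renaming has to happen in a single pass rather than one column at a time.

```python
# Illustrative only: column names are taken from the list above,
# everything else (function name, sample row) is an assumption.
COLUMN_MAP = {
    "tissue_type": "sample_type",
    "attribute_1": "tissue_type",
    "attribute_2": "timepoint",
}


def to_table_columns(row):
    """Return a new dict keyed by table column names instead of input file column names."""
    # Building a fresh dict in one pass avoids overwriting tissue_type,
    # which is both a source name and a target name in the mapping.
    return {COLUMN_MAP.get(name, name): value for name, value in row.items()}


example_row = {
    "subject_id": "S1",        # hypothetical extra column, passed through unchanged
    "tissue_type": "Tumor",    # ends up in sample_type
    "attribute_1": "Lung",     # ends up in tissue_type
    "attribute_2": "Week 4",   # ends up in timepoint
}
mapped = to_table_columns(example_row)
```

Columns that are not listed in the mapping keep their original names, which is an assumption of this sketch rather than documented tMDataLoader behavior.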