
2 Input files

johaGL edited this page Mar 18, 2024 · 10 revisions

Input files for TraceGroomer


Compulsory files

This section explains how to obtain examples of the compulsory files, which can be downloaded from Zenodo.

The measurements file (.tsv or .xlsx)

Description
This is the file you obtained after correcting for naturally occurring isotopes, or the file already corrected by the metabolomics platform.

Several input formats are supported for the measurements; they are described in section 3, Supported input formats.

The metadata file

Description and example

The sample metadata is a tab-delimited .csv file provided by the user. It must contain six columns named name_to_plot, timepoint, timenum, condition, compartment and original_name.

The columns have the following meaning:

  • name_to_plot is the string that will appear on the figures produced by DIMet
  • condition is the experimental condition
  • timepoint is the sampling time as defined in your experimental setup (an arbitrary string that can contain non-numerical characters)
  • timenum is the numerical encoding of the timepoint
  • compartment is the name of the cellular compartment in which the measurements were made (e.g. "endo", "endocellular", "cyto", etc.)
  • original_name contains the sample column names exactly as they appear in the quantification (measurements) file
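Before running TraceGroomer, it can help to check the metadata file yourself. The sketch below uses only Python's standard library; check_metadata is a hypothetical helper, not part of TraceGroomer. It reads tab-delimited text, reports which of the six required columns are missing, and flags timenum values that cannot be read as numbers.

```python
import csv
import io

# The six columns listed above; their order in the file does not matter here.
REQUIRED_COLUMNS = {
    "name_to_plot", "condition", "timepoint",
    "timenum", "compartment", "original_name",
}


def check_metadata(text):
    """Return a list of problems found in tab-delimited metadata text."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return ["missing columns: " + ", ".join(sorted(missing))]
    problems = []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        try:
            float(row["timenum"])  # timenum is the numerical encoding of timepoint
        except (TypeError, ValueError):  # TypeError: the row is too short
            problems.append(f"line {line_no}: timenum {row['timenum']!r} is not numeric")
    return problems


# Hypothetical input: the second data row wrongly repeats "T24" in timenum.
example = (
    "name_to_plot\tcondition\ttimepoint\ttimenum\tcompartment\toriginal_name\n"
    "Cond1 T0\tcond1\tT0\t0\tcomp_name\tT0_cond_1\n"
    "Cond1 T24\tcond1\tT24\tT24\tcomp_name\tT24_cond_1\n"
)
print(check_metadata(example))  # ["line 3: timenum 'T24' is not numeric"]
```

An empty list means the columns are present and every timenum parses as a number; the check does not verify that original_name matches your measurements file.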

Example:

| name_to_plot | condition | timepoint | timenum | compartment | original_name |
|---|---|---|---|---|---|
| Cond1 T0 | cond1 | T0 | 0 | comp_name | T0_cond_1 |
| Cond1 T24 | cond1 | T24 | 24 | comp_name | T24_cond_1 |
| Cond2 T0 | cond2 | T0 | 0 | comp_name | T0_cond_2 |
| Cond2 T24 | cond2 | T24 | 24 | comp_name | T24_cond_2 |

TraceGroomer does not use the name_to_plot column, but DIMet does, so it is practical to set it from the start.

_Note_: You can create this file with any spreadsheet program, such as Excel, Google Sheets or LibreOffice Calc. When saving the file, specify that the delimiter must be a `tab` ("Tab delimiter" or a similar option, depending on your program); see https://support.microsoft.com/en-us/office/save-a-workbook-to-text-format-txt-or-csv-3e9a9d6c-70da-4255-aa28-fcacf1f081e6.
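If you prefer to generate the metadata file from a script instead of a spreadsheet, Python's standard csv module can write it with a tab delimiter. This is a minimal sketch: the rows mirror the example table, and the file name metadata1.csv is only an illustration chosen to match the metadata1 basename used in the configuration file example.

```python
import csv

# Hypothetical rows mirroring the example table (header first).
rows = [
    ["name_to_plot", "condition", "timepoint", "timenum", "compartment", "original_name"],
    ["Cond1 T0", "cond1", "T0", 0, "comp_name", "T0_cond_1"],
    ["Cond1 T24", "cond1", "T24", 24, "comp_name", "T24_cond_1"],
    ["Cond2 T0", "cond2", "T0", 0, "comp_name", "T0_cond_2"],
    ["Cond2 T24", "cond2", "T24", 24, "comp_name", "T24_cond_2"],
]

# newline="" lets the csv module manage line endings;
# lineterminator="\n" replaces the csv default "\r\n".
with open("metadata1.csv", "w", newline="", encoding="utf-8") as handle:
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
```

Spaces inside values such as "Cond1 T0" are kept as-is, because only tabs separate the columns.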

Special case of the "rule" type of input

In addition to the sample metadata explained above, a tab-delimited .tsv 'variableMetadata' file is required if the user provides a "rule" type of input. It must contain three compulsory columns for correct isotopologue interpretation, and these column names must be declared in the .yml file (see "The configuration file" below):
  • ID : isotopologue unique identifiers as in dataMatrix
  • metabolite_name : metabolite annotation (or the chemical formula); these entries are not unique, as the same formula belongs to several isotopologues
  • isotope_numeric : the number of labeled (13C or other) atoms, strictly as an integer
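The three requirements above (the columns exist, ID is unique, isotope_numeric is an integer) can be checked before grooming. The following is a sketch using only the standard library; check_variable_metadata and the citrate IDs are hypothetical, and the default column names are the ones listed above, which you can override if your .yml file declares different ones.

```python
import csv
import io


def check_variable_metadata(text, id_col="ID", compound_col="metabolite_name",
                            number_col="isotope_numeric"):
    """Return problems found in tab-delimited 'variableMetadata' text."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = {id_col, compound_col, number_col} - set(reader.fieldnames or [])
    if missing:
        return ["missing columns: " + ", ".join(sorted(missing))]
    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        ident = row[id_col]
        if ident in seen:  # IDs must be unique; metabolite names need not be
            problems.append(f"line {line_no}: duplicate ID {ident!r}")
        seen.add(ident)
        value = row[number_col]
        try:
            int(value)  # int("1.5") raises ValueError; int(None) raises TypeError
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: {number_col} {value!r} is not an integer")
    return problems


# Hypothetical citrate isotopologues; the last row has two mistakes.
variable_metadata = (
    "ID\tmetabolite_name\tisotope_numeric\n"
    "Cit_m0\tCitrate\t0\n"
    "Cit_m1\tCitrate\t1\n"
    "Cit_m1\tCitrate\t1.5\n"
)
for problem in check_variable_metadata(variable_metadata):
    print(problem)
# line 4: duplicate ID 'Cit_m1'
# line 4: isotope_numeric '1.5' is not an integer
```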

The configuration file (for the command line version only)

Description and example

This file contains the basic information needed: the name of the metadata file, the names (but not the paths) of the output files, and the absolute path to the output folder.

The comments (#) serve as a guide. Fill in the value after the colon of each field:

groom_out_path :  ~/examples_TraceGrommer/data/example-isocor_data   # absolute path to output DIRECTORY
metadata: metadata1   # file name, no extension. Must be in the output DIRECTORY 

# when using IsoCor output, just specify the desired output basenames (after the colon):
abundances : totalAbundances  # total abundance
mean_enrichment : fracContributions  # mean enrichment
isotopologue_proportions : isotopologueProps  # isotopologue proportions
isotopologues : isotopologuesCorrValues  # isotopologue absolute values
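Because groom_out_path must be an absolute path, one option is to generate the configuration from a script that expands `~` and resolves the path for you. This is a minimal sketch, not a TraceGroomer utility: isocor_config_text and the output file name config.yml are hypothetical, and the writer assumes plain values with no `: ` or ` #` inside them, which YAML would otherwise read as syntax.

```python
from pathlib import Path

# Output basenames from the IsoCor example above; rename them as you like.
ISOCOR_OUTPUTS = {
    "abundances": "totalAbundances",
    "mean_enrichment": "fracContributions",
    "isotopologue_proportions": "isotopologueProps",
    "isotopologues": "isotopologuesCorrValues",
}


def isocor_config_text(out_dir, metadata_name, outputs=ISOCOR_OUTPUTS):
    """Build the 'key : value' lines of a config for IsoCor output."""
    fields = {
        # expanduser() replaces '~'; resolve() makes the path absolute
        "groom_out_path": str(Path(out_dir).expanduser().resolve()),
        "metadata": metadata_name,  # file name without extension
        **outputs,
    }
    return "".join(f"{key} : {value}\n" for key, value in fields.items())


config_text = isocor_config_text(
    "~/examples_TraceGrommer/data/example-isocor_data", "metadata1"
)
Path("config.yml").write_text(config_text, encoding="utf-8")
print(config_text)
```

The generated file only contains flat `key : value` lines, so it suits the IsoCor case; the "rule" case adds a nested section that this sketch does not produce.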

Note: online editors for .yml files exist, such as https://yamlchecker.com/; just copy-paste and edit.

Special case of config file for the "rule" input

If the user provides the "rule" input type, additional parameters are required for matching the 'variableMetadata' with the 'dataMatrix':
 groom_out_path :  ~/examples_TraceGrommer/data/example-isocor_data   
 metadata: metadata1   # file name, no extension. Must be in the output DIRECTORY
 abundances : null   # do not change this
 mean_enrichment : null   # do not change this
 isotopologue_proportions : null  # do not change this
 isotopologues : isotopologueAbsolute  # desired output basename for this metric

 # - additional section below, used only with the "rule" input -
 variable_description : variableMetadata  # file name, no extension. Must be in the output DIRECTORY
 columns_variable_description:  # do not change this
   identifier: ID   # column name with same isotopologue unique identifiers as in dataMatrix
   compound: metabolite_name  # metabolite annotation column (formula is also accepted)
   isotopologue_number : isotope_numeric  # column with the number of labeled (13C or other) atoms
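The "rule" configuration has two constraints that are easy to get wrong: the first three metric fields must stay null, and the names under columns_variable_description must match the header of the variableMetadata file. The sketch below checks both; check_rule_config is a hypothetical helper, and it assumes the configuration has already been loaded into a dict, for example with PyYAML's `yaml.safe_load`, which turns `null` into `None`.

```python
# Keys that must stay null with the "rule" input, per the example above.
MUST_BE_NULL = ("abundances", "mean_enrichment", "isotopologue_proportions")
COLUMN_KEYS = ("identifier", "compound", "isotopologue_number")


def check_rule_config(config, variable_metadata_header):
    """Check a "rule" config dict against the variableMetadata header."""
    problems = [f"{key} must be null" for key in MUST_BE_NULL
                if config.get(key) is not None]
    for key in ("isotopologues", "variable_description"):
        if not config.get(key):
            problems.append(f"{key} needs a value")
    columns = config.get("columns_variable_description") or {}
    for key in COLUMN_KEYS:
        name = columns.get(key)
        if name is None:
            problems.append(f"columns_variable_description.{key} is missing")
        elif name not in variable_metadata_header:
            problems.append(f"column {name!r} not found in variableMetadata")
    return problems


# The example configuration above, as a YAML loader would return it.
config = {
    "groom_out_path": "~/examples_TraceGrommer/data/example-isocor_data",
    "metadata": "metadata1",
    "abundances": None,
    "mean_enrichment": None,
    "isotopologue_proportions": None,
    "isotopologues": "isotopologueAbsolute",
    "variable_description": "variableMetadata",
    "columns_variable_description": {
        "identifier": "ID",
        "compound": "metabolite_name",
        "isotopologue_number": "isotope_numeric",
    },
}
header = ["ID", "metabolite_name", "isotope_numeric"]
print(check_rule_config(config, header))  # []
```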

Facultative files

  • the amount of material per sample (tab-delimited csv file)
  • a file listing the metabolites to exclude (tab-delimited csv file)

Examples of the facultative files are also available in the Zenodo material.