msea

MetaboAnalyst: Metabolite-Set Enrichment Analysis

Given an input table of abundance values with two groups (e.g. case vs. control), this module of metaboanalyst.ca assesses which metabolite sets are differentially perturbed between the two groups. Metabolite sets are biologically curated sets of metabolites that are known to be associated with a particular biological condition or disease. KEGG pathways are also treated as metabolite sets.

This is largely similar to the pathway enrichment analysis module (docs here), with a few key differences:

Only metabolite sets for Homo sapiens are available.
The choice of enrichment methodology appears to be more restricted, but this should not be a problem. The module uses quantitative enrichment analysis, which is not well documented.

How-To

Start - from the homepage, select "Enrichment Analysis" (roughly 1 o'clock on the wheel)
Data upload - You'll be presented with three choices about the kind of input: "list of compound names", "a list of compounds with concentration values", and "a concentration table (quantitative enrichment analysis)". Select the third. For the input options:

Group label: Select "Discrete (Classification)"
ID Type: Compound names
Data format: Samples in rows.
Data File: choose your file and upload your data. You'll then be shown the KEGG/HMDB/etc. IDs for your compounds; submit the data to continue.

Data Integrity Check - Verify that MetaboAnalyst.ca successfully read the number of samples and metabolites that you uploaded, and that two groups were detected. Click Skip to bypass missing value estimation. The label is slightly misleading: selecting Skip still imputes missing values, using the smallest non-zero value in your uploaded data.
Normalization - Select "normalization by median" and "log transformation". Click Normalize to carry out the normalization, View Result to visualize the results of the normalization process, and Proceed to continue to the next step.
Parameters for enrichment analysis - You'll be presented with a list of several types of metabolite sets; you can choose any of them (or all). Ensure that the "Only use metabolite sets containing at least" checkbox is checked, and select "2 compounds" from the dropdown list. Continue to the results by clicking the Submit button.
Results - Near the top of the main results window, there's a Network View, and a Barchart View. The latter is more useful.
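
To sanity-check what the site does to an upload, the preprocessing steps above can be approximated offline. The following is a minimal sketch, not MetaboAnalyst's own code. It assumes a "samples in rows" CSV with the sample name in the first column and the group label in the second, per-sample median normalization, and a base-10 log; the site's exact formulas may differ. The table contents and all function names are hypothetical.

```python
import csv
import io
import math
from statistics import median

# Hypothetical concentration table, samples in rows (assumed layout):
# column 1 = sample name, column 2 = group label, remaining columns = compound names.
RAW = """Sample,Group,Glucose,Lactate,Alanine
s1,case,5.0,2.0,
s2,case,6.0,3.0,1.5
s3,control,4.0,,1.0
s4,control,4.5,1.0,0.5
"""


def read_table(text):
    """Parse the CSV text; empty cells become None (missing)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    compounds = header[2:]
    samples = [r[0] for r in body]
    groups = [r[1] for r in body]
    values = [[float(v) if v.strip() else None for v in r[2:]] for r in body]
    return samples, groups, compounds, values


def impute_min_nonzero(values):
    """Fill missing cells with the smallest non-zero value in the whole table."""
    observed = [v for row in values for v in row if v is not None and v > 0]
    fill = min(observed)
    return [[fill if v is None else v for v in row] for row in values]


def normalize_by_median(values):
    """Divide each sample (row) by its own median (assumed per-sample scaling)."""
    out = []
    for row in values:
        m = median(row)
        out.append([v / m for v in row])
    return out


def log_transform(values):
    """Base-10 log of every value (assumed base; requires positive values)."""
    return [[math.log10(v) for v in row] for row in values]


samples, groups, compounds, values = read_table(RAW)
filled = impute_min_nonzero(values)
normed = normalize_by_median(filled)
logged = log_transform(normed)

# The integrity check expects exactly two group labels.
print(sorted(set(groups)), len(samples), "samples,", len(compounds), "compounds")
```

Comparing a few cells of `logged` with the values shown under View Result is a quick way to confirm which normalization the site actually applied.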

Output

There are two outputs: a chart of p-values and "Fold Enrichment", and a table of results.

Example Chart Output

This provides a ranked view of the top 50 enriched metabolite sets, ranked by p-value. Fold Enrichment is calculated as the ratio of Statistic to Expected (see the table below). See the Appendix for some caveats.
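
The ranking and the ratio described above can be reproduced from the results table. This is a small sketch under the assumption that Fold Enrichment is exactly Statistic divided by Expected, as stated above; the set names and numbers are made up for illustration and are not real MetaboAnalyst output.

```python
def fold_enrichment(statistic, expected):
    """Ratio of the observed test statistic to its expected value."""
    if expected <= 0:
        raise ValueError("Expected must be positive to form a ratio")
    return statistic / expected


# Hypothetical rows mimicking the results table columns.
results = [
    {"set": "Set A", "statistic": 12.0, "expected": 3.0, "p_value": 0.002},
    {"set": "Set B", "statistic": 4.5, "expected": 3.0, "p_value": 0.20},
    {"set": "Set C", "statistic": 9.0, "expected": 3.0, "p_value": 0.01},
]

# The chart shows at most the 50 sets with the smallest p-values.
top = sorted(results, key=lambda r: r["p_value"])[:50]
for row in top:
    row["fold"] = fold_enrichment(row["statistic"], row["expected"])
    print(row["set"], row["p_value"], row["fold"])
```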

Table of Results

Column	Description
Metabolite Set	Name of metabolite set
Total	Total number of metabolites in metabolite set
Hits	Number of input metabolites found in metabolite set
Statistic	Test statistic
Expected	Expected number of metabolites to be found in metabolite set
P value	p-value
Holm P	Holm's adjusted p-value. More conservative than FDR.
FDR	False discovery rate corrected p-value. Less conservative than Holm's P.
Details	Creates a pop-up which shows more information about that particular metabolite set.
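
To see why Holm P is the more conservative of the two adjusted columns, both corrections can be computed by hand from the raw p-values. The sketch below assumes the FDR column uses the Benjamini-Hochberg procedure, which is the usual choice but is not confirmed here for MetaboAnalyst; the p-values are hypothetical.

```python
def holm(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        value = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, value)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg (FDR) adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending p
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of this p-value in ascending order
        value = min(1.0, pvals[i] * m / rank)
        running_min = min(running_min, value)  # enforce monotonicity
        adjusted[i] = running_min
    return adjusted


raw_p = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values
holm_p = holm(raw_p)
fdr_p = benjamini_hochberg(raw_p)
for p, h, f in zip(raw_p, holm_p, fdr_p):
    print(p, round(h, 4), round(f, 4))
```

For any set of p-values, each Holm-adjusted value is at least as large as the corresponding Benjamini-Hochberg value, so a metabolite set that passes a Holm threshold also passes the same FDR threshold.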

Appendix

If Fold Enrichment = Statistic/Expected, what somewhat worries me is the implication that the test statistic does not already normalize for the size of the metabolite set when the p-value is calculated. The alternative is that it does, and the Fold Enrichment calculation is erroneous because it normalizes for the Expected number of hits (by simple division) a second time.
