Make summary tables for GO and KEGG analyses #1

qtran1 · 2015-08-19T19:40:20Z

Produce summary tables for results obtained from GSEA.

davebridges · 2015-09-10T15:26:38Z

The GSEA analysis for WAT needs to be redone. Your parameters filtered out all but 2 of the gene sets tested (see index.html):

Gene set size filters (min=5, max=500) resulted in filtering out 184 / 186 gene sets

When you re-run it, can you also include the Reactome and CGP gene sets?
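A minimal sketch of how a min/max size filter like the one in that log line can discard almost every gene set. It assumes, as we understand GSEA to do, that a set's size is counted only over members that also appear in the ranked list. The `filter_gene_sets` helper, the set names and the toy identifiers are hypothetical, for illustration only.

```python
"""Sketch of a min/max gene set size filter (hypothetical helper).

Assumption: size is counted after restricting each gene set to the
identifiers that are actually present in the ranked list / dataset.
"""


def filter_gene_sets(gene_sets, ranked_ids, min_size=5, max_size=500):
    """Return (kept, n_filtered).

    gene_sets:  dict mapping set name -> iterable of member identifiers
    ranked_ids: identifiers present in the ranked list / dataset
    """
    present = set(ranked_ids)
    kept = {}
    for name, members in gene_sets.items():
        # Only members that actually appear in the dataset count toward size.
        overlap = present.intersection(members)
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
    return kept, len(gene_sets) - len(kept)


# Hypothetical toy data: the ranked list only partly matches the
# identifiers used in the gene sets.
gene_sets = {
    "SET_A": ["LEP", "ADIPOQ", "PPARG", "FABP4", "CEBPA", "LPL"],
    "SET_B": ["INS", "GCG", "SST", "PPY", "IAPP"],
}
ranked_ids = ["LEP", "ADIPOQ", "PPARG", "FABP4", "CEBPA", "Ins", "Gcg"]

kept, n_filtered = filter_gene_sets(gene_sets, ranked_ids)
print(f"filtered out {n_filtered} / {len(gene_sets)} gene sets")
```

With `min_size=10, max_size=5000` (the bounds from the earlier mouse analysis) the same toy data would drop both sets, so changing the bounds alone does not help when the overlap itself is small.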

qtran1 · 2015-09-10T15:34:10Z

I can lower the min to 2, but what does it mean to have a gene set size of 2? I think 5 is already small. I did rerun one of the analyses with 2 and still didn't see any significant changes. I can rerun all of them and add the Reactome and CGP gene sets, but I doubt we're going to find anything. I just couldn't find anything interesting in this list of genes.

davebridges · 2015-09-10T15:40:19Z

There must be some other parameter difference, because it's only testing the
2 gene sets. It's not that it's not finding significant differences; it's
discarding most of the gene sets up front for some reason. The last time I
did this (with mice), my parameters were:
https://github.com/BridgesLab/PredictorsDietInducedObesity/blob/RNAseq/data/RNAseq/Juvenile-HFD/scripts/GSEA_inputs.Rmd

qtran1 · 2015-09-10T15:49:12Z

What did you use as the chip platform?

qtran1 · 2015-09-10T15:56:18Z

I meant the Chip Platform. The only criteria that differ between your run and mine are the max and min size, and maybe the Chip Platform. I used GENE_SYMBOL.chip.

You used min 10 and max 5000. I will adjust mine to match and see if we find anything.

davebridges · 2015-09-10T16:45:08Z

I just re-ran the WAT GSEA. The parameters are defined in the README.md in the WAT folder (see f27bf0a). The GSEA data were uploaded in commit b7b4ec9

qtran1 · 2015-09-10T16:54:37Z

Great! I ran a little test on the KEGG pathways too. When I set "Collapse data set to gene symbols" to TRUE, it only filters out 26/186 gene sets. I thought setting it to FALSE, so that I could use the expression data set as is, was better.

I'll rerun the rest.
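The thread does not settle why collapsing made such a difference to the number of gene sets filtered out. One hypothesis worth checking (an assumption, not confirmed here): with collapsing on, ranked-list identifiers are translated through the chip file (GENE_SYMBOL.chip) before being matched to gene set members, whereas with collapsing off they must match exactly as written. This sketch uses a hypothetical `chip` dictionary and made-up identifier formats to show how a translation step alone can change overlap sizes.

```python
# Hypothetical sketch: gene set overlap with and without remapping the
# ranked-list identifiers through a chip-style lookup table.


def remap(ranked_ids, chip):
    """Translate identifiers via a chip-style dict; drop unmapped ones."""
    return {chip[i] for i in ranked_ids if i in chip}


def overlap_sizes(gene_sets, ids):
    """Number of members of each gene set found among `ids`."""
    ids = set(ids)
    return {name: len(ids.intersection(members)) for name, members in gene_sets.items()}


# Toy data (assumed, not from the GSEA run): one identifier format in the
# ranked list, a different one in the gene sets.
gene_sets = {"SET_A": ["LEP", "ADIPOQ", "PPARG"]}
ranked_ids = ["Lep", "Adipoq", "Pparg"]
chip = {"Lep": "LEP", "Adipoq": "ADIPOQ", "Pparg": "PPARG"}

print(overlap_sizes(gene_sets, ranked_ids))               # no remapping
print(overlap_sizes(gene_sets, remap(ranked_ids, chip)))  # after remapping
```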

davebridges · 2015-09-10T17:04:54Z

It's OK, I already re-ran them and pushed them back to GitHub.

qtran1 · 2015-09-10T17:10:48Z

Here is the recommendation from the GSEA Known Issues page: http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Known_Issues.
It strongly recommends that, for human data run with GSEAPreranked, we should NOT set "Collapse dataset to gene symbols = true". I don't know whether the same applies to mouse gene symbols, and I'm not sure whether what we do for our mouse gene lists is OK. We don't have multiple instances of the same gene.

Avoid collapsing ranked list of features to gene symbols
Collapsing dataset to symbols means that GSEA takes expression dataset and collapses probes to symbols before computing the ranking metric values. When done this way, GSEA has two ways to deal with multiple occurrences of expression values corresponding to the same gene symbol. By default, it will retain the maximal expression value; alternatively, it will use median expression value. Both choices make reasonable sense when applied to gene expression values. In the Pre-Ranked mode, however, GSEA is faced with the ranks already computed by an unspecified procedure. With the "Collapse dataset to gene symbols"="true", Pre-Ranked GSEA tool will always pick the largest positive value among several instances of ranking metric values for the same gene. This can sometimes produce unanticipated results because the original assumptions for gene expression do not necessarily apply to an arbitrary ranking metric in the pre-ranked list, so that the ordered ranked list might substantially differ from the input values. Therefore, collapsing of ranked list is appropriate if and only if all its features are unique and have one to one correspondence to human gene symbols.

We thus recommend making the ranked list with human gene symbols as gene identifiers and running GSEAPreranked with the parameter "Collapse dataset to gene symbols"="false".
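To make the quoted rule concrete, here is a minimal Python sketch of collapsing a pre-ranked list to symbols by keeping the largest score per symbol. This is our reading of the Known Issues description, not GSEA's actual code; `collapse_ranked_list` and the probe names are hypothetical. It also shows that when every feature is already unique and maps one-to-one, collapsing leaves the list unchanged.

```python
def collapse_ranked_list(ranked, to_symbol):
    """Collapse (feature_id, score) pairs to one score per gene symbol.

    `to_symbol` is a hypothetical dict feature_id -> symbol; features with no
    mapping are dropped. Where several features share a symbol, the largest
    score is kept (our reading of the Pre-Ranked rule quoted above).
    """
    best = {}
    for feature, score in ranked:
        symbol = to_symbol.get(feature)
        if symbol is None:
            continue  # unmapped feature: cannot be matched to any gene set
        if symbol not in best or score > best[symbol]:
            best[symbol] = score
    # Re-sort so the output is again a ranked list (highest score first).
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Duplicates: two probes for GENEA with opposite signs.
probe_ranked = [("probe1", 2.5), ("probe2", -3.0), ("probe3", 1.0)]
probe_to_symbol = {"probe1": "GENEA", "probe2": "GENEA", "probe3": "GENEB"}
print(collapse_ranked_list(probe_ranked, probe_to_symbol))

# No duplicates, one-to-one identifiers: collapsing is a no-op.
unique_ranked = [("Lep", 1.2), ("Adipoq", -0.4)]
identity = {gene: gene for gene, _ in unique_ranked}
print(collapse_ranked_list(unique_ranked, identity) == unique_ranked)
```

In the first example the strongly negative `probe2` score is silently discarded, which is the kind of "unanticipated result" the Known Issues page warns about.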

qtran1 · 2015-09-10T17:14:38Z

Therefore, collapsing of ranked list is appropriate if and only if all its features are unique and have one to one correspondence to human gene symbols.

Is it true that there is a one to one correspondence from mouse gene symbols to human gene symbols?

davebridges · 2015-09-10T17:30:09Z

That should only apply when there are multiple probes/transcript IDs per gene. In our case, where each gene appears once in the list, I don't think it should matter. There won't be more than a 1:1 correspondence between mouse and human genes (since they will have different names), but we will lose genes that don't have a 1:1 ortholog. I think those will be invisible to the analysis, though, since they won't be able to match that probe. I'm still not sure why collapsing or not should make a difference, since our data has basically nothing to collapse.
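A minimal sketch of the mouse-to-human step described above, assuming a hypothetical ortholog table (`orthologs`, mapping each mouse symbol to a list of human symbols). Only genes with exactly one human ortholog, not shared with another mouse gene, are kept; the rest are reported as dropped, i.e. the genes that would be "invisible" to the analysis. `has_duplicates` checks the "nothing to collapse" condition. Apart from Lep/Adipoq, the gene names are placeholders.

```python
from collections import Counter


def map_one_to_one(ranked, orthologs):
    """Map (mouse_symbol, score) pairs to (human_symbol, score) pairs.

    `orthologs` is a hypothetical dict: mouse symbol -> list of human symbols.
    Returns (kept, dropped), where `dropped` lists mouse genes without a
    strict 1:1 ortholog.
    """
    # Step 1: keep mouse genes that have exactly one human ortholog.
    single = [
        (mouse, orthologs[mouse][0], score)
        for mouse, score in ranked
        if len(orthologs.get(mouse, [])) == 1
    ]
    # Step 2: discard human symbols claimed by more than one mouse gene.
    hits = Counter(human for _, human, _ in single)
    kept = [(human, score) for _, human, score in single if hits[human] == 1]
    kept_mouse = {mouse for mouse, human, _ in single if hits[human] == 1}
    dropped = [mouse for mouse, _ in ranked if mouse not in kept_mouse]
    return kept, dropped


def has_duplicates(ranked):
    """True if any identifier occurs more than once (i.e. something to collapse)."""
    ids = [identifier for identifier, _ in ranked]
    return len(ids) != len(set(ids))


mouse_ranked = [
    ("Lep", 2.1), ("Adipoq", -1.3), ("GeneX", 0.8),
    ("GeneY", 0.5), ("GeneZ1", 0.3), ("GeneZ2", -0.2),
]
orthologs = {
    "Lep": ["LEP"],
    "Adipoq": ["ADIPOQ"],
    "GeneX": ["GENEX1", "GENEX2"],  # placeholder: one-to-many
    "GeneZ1": ["GENEZ"],            # placeholder: many-to-one
    "GeneZ2": ["GENEZ"],
}  # "GeneY" deliberately has no entry (no ortholog)

kept, dropped = map_one_to_one(mouse_ranked, orthologs)
print(kept)     # [('LEP', 2.1), ('ADIPOQ', -1.3)]
print(dropped)  # ['GeneX', 'GeneY', 'GeneZ1', 'GeneZ2']
print(has_duplicates(kept))  # False
```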

qtran1 · 2015-09-10T17:40:00Z

Yes, that's my question too. We don't have duplicates!

davebridges added this to the Submission Ready milestone Sep 10, 2015

davebridges added the enhancement label Sep 10, 2015

davebridges added a commit that referenced this issue Sep 10, 2015

Re-ran GSEA analyses for WAT. Part of #1

b7b4ec9
