Make summary tables for GO and KEGG analyses #1

Open
qtran1 opened this issue Aug 19, 2015 · 12 comments

Comments

@qtran1
Owner

qtran1 commented Aug 19, 2015

Produce summary tables for results obtained from GSEA.

@davebridges
Collaborator

The GSEA analysis for WAT needs to be redone. Your parameters filtered out all but 2 of the gene sets tested (see index.html):

Gene set size filters (min=5, max=500) resulted in filtering out 184 / 186 gene sets

When you re-run it, can you also include the Reactome and CGP gene sets?
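For context on the filter message above: GSEA applies its size limits after restricting each gene set to the genes present in the dataset, so a set can be dropped simply because too few of its members match the dataset's identifiers. Below is a minimal Python sketch of that step. `filter_gene_sets` is a hypothetical helper (not part of GSEA), and the gene and set names are made up for illustration.

```python
def filter_gene_sets(gene_sets, ranked_genes, min_size=5, max_size=500):
    """Split gene sets into kept/dropped based on their overlap with the ranked list.

    The size of a set is counted AFTER discarding members that are absent
    from the ranked list, which is why identifier mismatches shrink sets.
    """
    universe = set(ranked_genes)
    kept, dropped = {}, {}
    for name, members in gene_sets.items():
        overlap = set(members) & universe
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
        else:
            dropped[name] = overlap
    return kept, dropped


# Toy data (hypothetical identifiers), with small limits so the effect is visible.
ranked_genes = ["A", "B", "C", "D", "E", "F"]
gene_sets = {
    "SET_OK": ["A", "B", "C", "Q"],            # 3 members match: kept
    "SET_SMALL": ["A", "B", "Z"],              # 2 members match: below min
    "SET_NONE": ["X", "Y"],                    # 0 members match: below min
    "SET_BIG": ["A", "B", "C", "D", "E", "F"], # 6 members match: above max
}

kept, dropped = filter_gene_sets(gene_sets, ranked_genes, min_size=3, max_size=5)
print(f"Filtered out {len(dropped)} / {len(gene_sets)} gene sets")
```

If nearly every set lands in `dropped` with an overlap close to zero, the identifiers are probably not matching, and lowering the minimum size would not address that.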

@qtran1
Owner Author

qtran1 commented Sep 10, 2015

I can lower the min to 2, but what does it mean to have a gene set size of 2? I think 5 is small already. I did rerun it with 2 for one of the analyses and still didn't see any significant changes. I can rerun it for all of them and add the Reactome and CGP gene sets, but I doubt that we're going to find anything. I just couldn't find anything interesting with this list of genes.

@davebridges
Collaborator

There must be some other parameter difference, because it's only testing the 2 gene sets. It's not that it's failing to find significant differences; it's discarding most of the gene sets up front for some reason. The last time I did this (with mice), my parameters were
https://github.com/BridgesLab/PredictorsDietInducedObesity/blob/RNAseq/data/RNAseq/Juvenile-HFD/scripts/GSEA_inputs.Rmd

@qtran1
Owner Author

qtran1 commented Sep 10, 2015

What did you use as the chip platform?

@qtran1
Owner Author

qtran1 commented Sep 10, 2015

I meant the Chip Platform parameter. The only parameters that differ between yours and mine are the max and min size and maybe the Chip Platform. I used GENE_SYMBOL.chip.

You used min 10 to max 5000. I will change mine around this and see if we find anything.

@davebridges
Copy link
Collaborator

I just re-ran the WAT GSEA. The parameters are defined in the README.md in the WAT folder (see f27bf0a). The GSEA data were uploaded in commit b7b4ec9.

@qtran1
Owner Author

qtran1 commented Sep 10, 2015

Great! So, I ran a little test on the KEGG pathways too. I think when I set "Collapse dataset to gene symbols" to TRUE, it only filters out 26/186 gene sets. I had thought that setting it to FALSE, so that the expression data set is used as is, was better.

I'll rerun the rest.

@davebridges
Collaborator

It's OK, I already re-ran them and pushed them back to GitHub.

@qtran1
Owner Author

qtran1 commented Sep 10, 2015

Here is the suggestion for running GSEA from their known issues: http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Known_Issues.
It's strongly suggested that for human data, when using GSEAPreranked, we should NOT set "Collapse dataset to gene symbols" to true. For mouse gene symbols, I don't know, and I'm not sure whether what we do for our mouse gene lists is OK. We don't have multiple instances of the same gene.

Avoid collapsing ranked list of features to gene symbols
Collapsing dataset to symbols means that GSEA takes expression dataset and collapses probes to symbols before computing the ranking metric values. When done this way, GSEA has two ways to deal with multiple occurrences of expression values corresponding to the same gene symbol. By default, it will retain the maximal expression value; alternatively, it will use median expression value. Both choices make reasonable sense when applied to gene expression values. In the Pre-Ranked mode, however, GSEA is faced with the ranks already computed by an unspecified procedure. With the "Collapse dataset to gene symbols"="true", Pre-Ranked GSEA tool will always pick the largest positive value among several instances of ranking metric values for the same gene. This can sometimes produce unanticipated results because the original assumptions for gene expression do not necessarily apply to an arbitrary ranking metric in the pre-ranked list, so that the ordered ranked list might substantially differ from the input values. Therefore, collapsing of ranked list is appropriate if and only if all its features are unique and have one to one correspondence to human gene symbols.

We thus recommend making the ranked list with human gene symbols as gene identifiers and running GSEAPreranked with the parameter "Collapse dataset to gene symbols"="false".
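To make the quoted concern concrete, here is a small Python sketch of what "pick the largest value among several instances" does to a pre-ranked list that contains a duplicated gene. `collapse_max` is a hypothetical stand-in for the behaviour described in the quote, not GSEA's actual code, and the gene names and scores are invented.

```python
def collapse_max(ranked):
    """Collapse duplicate gene entries by keeping the largest ranking value.

    `ranked` is a list of (gene, score) pairs; the result is re-sorted
    from highest to lowest score, as a pre-ranked list would be.
    """
    collapsed = {}
    for gene, score in ranked:
        if gene not in collapsed or score > collapsed[gene]:
            collapsed[gene] = score
    return sorted(collapsed.items(), key=lambda item: item[1], reverse=True)


# GENE_B appears twice: once strongly negative, once weakly positive.
ranked = [
    ("GENE_A", 2.0),
    ("GENE_B", -3.0),
    ("GENE_B", 0.5),
    ("GENE_C", -1.0),
]

collapsed = collapse_max(ranked)
print(collapsed)
```

In this toy input, the strongly negative entry for `GENE_B` is discarded in favour of the weakly positive one, so the gene moves from the bottom of the list to the middle. When every gene appears exactly once, this collapse step leaves the values unchanged.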

@qtran1
Owner Author

qtran1 commented Sep 10, 2015

Therefore, collapsing of ranked list is appropriate if and only if all its features are unique and have one to one correspondence to human gene symbols.

Is it true that there is a one to one correspondence from mouse gene symbols to human gene symbols?

@davebridges
Copy link
Collaborator

That should only apply when there are multiple probes/transcript IDs per gene. In our case, where each gene appears only once per list, I don't think it should matter. There won't be more than a 1:1 correspondence between mouse and human genes (since they will have different names), but we will lose things that don't have a 1:1 ortholog. I think those will be invisible to the analysis though, since they won't be able to match that probe. I'm still not sure why collapsing or not should make a difference, since our data has basically nothing to collapse.
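Two quick checks could help narrow this down: whether the ranked list really has no duplicate identifiers, and what fraction of a gene set's members match the ranked identifiers exactly. The Python sketch below is only an illustration. The helper names are hypothetical, and the example assumes mouse-style capitalization in the ranked list and human-style symbols in the gene set. Whether that is what caused the 184 / 186 filtering is not established in this thread.

```python
from collections import Counter


def duplicated_ids(ranked_ids):
    """Return the identifiers that occur more than once in the ranked list."""
    counts = Counter(ranked_ids)
    return sorted(gene for gene, n in counts.items() if n > 1)


def match_fraction(ranked_ids, gene_set_ids):
    """Fraction of gene-set members found verbatim in the ranked list."""
    members = set(gene_set_ids)
    if not members:
        return 0.0
    return len(members & set(ranked_ids)) / len(members)


# Illustrative identifiers only: mouse-style symbols in the ranked list,
# human-style symbols in the gene set.
ranked_ids = ["Lep", "Adipoq", "Pparg", "Fasn"]
gene_set_ids = ["LEP", "ADIPOQ", "PPARG", "INS"]

dups = duplicated_ids(ranked_ids)
exact = match_fraction(ranked_ids, gene_set_ids)
# Uppercasing is NOT a real ortholog mapping; it is used here only to show
# how much the overlap depends on identifier format.
upper = match_fraction([g.upper() for g in ranked_ids], gene_set_ids)

print(dups, exact, upper)
```

An empty duplicate list together with a near-zero exact match fraction would suggest that the collapse setting is changing how identifiers are matched, rather than merging duplicates.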

@qtran1
Owner Author

qtran1 commented Sep 10, 2015

Yes, that's my question too. We don't have duplicates!
