ssGSEA: Accept a Series with gene names as index and return a dataframe #27

Closed
olgabot opened this issue Aug 2, 2017 · 3 comments


olgabot commented Aug 2, 2017

Hello,
I'm very excited about a Python implementation of ssGSEA! I'd like to convert my gene expression values to pathway expression values by using apply to perform ssGSEA on every row of a pandas.DataFrame expression matrix. Right now, this looks like this:

import gseapy as gp

# run ssGSEA separately on each row (sample) of the expression matrix
expression.head().apply(
    lambda x: gp.ssgsea(x.reset_index(), gene_sets=gmt),
    axis=1)
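For reference, this is what the reset_index() call does to a single row: it turns a Series indexed by gene names into a two-column DataFrame (gene names, then values). A minimal pandas sketch; the gene names, values, and the sample/index names are made up for illustration.

```python
import pandas as pd

# One sample (one row of the expression matrix): values indexed by gene name.
# Gene names and values are made up for illustration.
row = pd.Series([5.2, 0.3, 7.9], index=["TP53", "EGFR", "MYC"], name="sample_1")
row.index.name = "gene"

# reset_index() moves the gene names out of the index into their own column,
# giving a two-column DataFrame: gene names first, expression values second.
two_col = row.reset_index()
print(two_col)
```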
  1. The reset_index() seems unnecessary for every row
  2. Each run of ssgsea returns None, rather than returning the converted pathway enrichment. I'd rather not have to read a file for every single sample I have (~6,000 of them), so can ssgsea return the Series instead?
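Point 2 asks for a function that maps one sample (a Series indexed by gene names) to a Series of pathway scores. When the function passed to DataFrame.apply(axis=1) returns a Series, pandas stacks the results into a samples x pathways DataFrame, so nothing has to be written to or read from disk. A minimal sketch: score_sample, the gene sets, and the toy matrix are all hypothetical, and the mean-expression score is only a stand-in for a real ssGSEA score.

```python
import pandas as pd

# Toy expression matrix: rows are samples, columns are genes (made-up values).
expression = pd.DataFrame(
    {"TP53": [1.0, 4.0], "EGFR": [2.0, 0.0], "MYC": [3.0, 2.0]},
    index=["sample_1", "sample_2"],
)

# Hypothetical gene sets: name -> list of member genes.
gene_sets = {"set_A": ["TP53", "EGFR"], "set_B": ["MYC", "EGFR"]}


def score_sample(row: pd.Series) -> pd.Series:
    """Hypothetical per-sample scorer: mean expression of each set's genes.

    A stand-in for a real ssGSEA score; it only illustrates the output shape.
    """
    return pd.Series({name: row[genes].mean() for name, genes in gene_sets.items()})


# Because score_sample returns a Series, apply(axis=1) builds a
# samples x gene-sets DataFrame directly.
pathway_scores = expression.apply(score_sample, axis=1)
print(pathway_scores)
```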

Warmest,
Olga

olgabot (author) commented Aug 2, 2017

It also seems unnecessary to perform the gene set filtering separately for every sample; it could be done once for all samples instead.
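Since every sample in the matrix shares the same genes, the filtering only depends on the gene list, not on the expression values. A minimal sketch of filtering once up front; filter_gene_sets and its min_size/max_size defaults are hypothetical and are not gseapy's internal filtering code.

```python
import pandas as pd


def filter_gene_sets(gene_sets, genes, min_size=2, max_size=500):
    """Hypothetical helper: keep only measured genes, then drop sets
    whose remaining size falls outside [min_size, max_size].

    Run once for the whole matrix, since all samples share the same genes.
    """
    measured = set(genes)
    filtered = {}
    for name, members in gene_sets.items():
        kept = [g for g in members if g in measured]
        if min_size <= len(kept) <= max_size:
            filtered[name] = kept
    return filtered


gene_sets = {
    "set_A": ["TP53", "EGFR", "KRAS"],  # KRAS not measured, dropped from the set
    "set_B": ["MYC"],                   # only 1 measured gene, set removed
    "set_C": ["BRCA1", "BRCA2"],        # no measured genes, set removed
}
expression = pd.DataFrame([[1.0, 2.0, 3.0]], columns=["TP53", "EGFR", "MYC"])
usable_sets = filter_gene_sets(gene_sets, expression.columns)
```

The filtered dict can then be reused for every per-sample call instead of being recomputed each time.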

zqfang (owner) commented Aug 2, 2017

Hi @olgabot,

  1. reset_index() is necessary only if your input is a pd.Series. I've fixed this so that a pd.Series with gene names as its index is supported (install the latest PR).
  2. Actually, ssgsea() returns an ssGSEA object, which has many attributes. Sorry for the unclear docs. Please try:
    # for example:
    import gseapy as gp
    ss = gp.ssgsea(x, gene_sets=gmt)
    # the res2d attribute is a DataFrame containing all final enrichment results (see the figure below)
    ss.res2d
    # the results attribute is an OrderedDict containing all internal statistical testing values
    ss.results

[Screenshot: ss.res2d output]

  3. You are right, the filtering could be done once. Even so, the most time-consuming part is calculating statistics on the null distributions. The ssGSEA module is currently designed for a single run only.
    I will try to add a patch to improve the filtering rules soon.
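Point 3 says the null-distribution statistics dominate the run time. As a generic illustration of why (this is not gseapy's actual enrichment statistic), here is a minimal permutation test with NumPy: the score is simply the mean value of a gene set's members, and the null is built by shuffling set membership n_perm times. The function name and its defaults are hypothetical.

```python
import numpy as np


def permutation_p_value(values, in_set, n_perm=1000, seed=0):
    """Empirical p-value for a simple set score using a permutation null.

    Illustrative only: the score is the mean of the set's values, not the
    ssGSEA enrichment score. Set membership is shuffled to build the null.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    observed = values[in_set].mean()
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffle which genes count as "in the set" and rescore.
        null[i] = values[rng.permutation(in_set)].mean()
    # Add-one correction so the p-value is never exactly zero.
    return (np.sum(null >= observed) + 1) / (n_perm + 1)
```

Each call does n_perm rescorings, so scoring S samples against K gene sets costs roughly S x K x n_perm score evaluations, which is why this step outweighs the one-off filtering.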

Thank you very much for your great advice.

zqfang (owner) commented Aug 25, 2017

Now gseapy 0.8.4 supports GCT format files, Series, and DataFrames with only one column (with gene symbols as the index):

import gseapy as gp

gss = gp.ssgsea(expression, gene_sets="KEGG_2016")
# get all results, stored in a dict
gss.resultsOnSamples
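Assuming each value in resultsOnSamples maps gene set names to that sample's scores (an assumption; check the object returned by your installed version), the dict can be turned into a single pathway x sample matrix with plain pandas, which is the output asked for at the top of this issue. The sketch uses a made-up dict in place of gss.resultsOnSamples.

```python
import pandas as pd

# Made-up stand-in for gss.resultsOnSamples: sample name -> {gene set: score}.
results_on_samples = {
    "sample_1": {"KEGG_A": 0.42, "KEGG_B": -0.10},
    "sample_2": {"KEGG_A": 0.15, "KEGG_B": 0.33},
}

# Outer keys become columns (samples), inner keys become the index (gene sets).
scores = pd.DataFrame(results_on_samples)

# Transpose to get one row per sample, matching a samples x genes input matrix.
scores_by_sample = scores.T
```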

zqfang closed this as completed Oct 8, 2017