GSEAPY

GSEAPY: Gene Set Enrichment Analysis in Python.

The main documentation for GSEApy can be found at http://gseapy.rtfd.io/

For an example of how to use gseapy, see: Example

Release notes: https://github.com/zqfang/gseapy/releases

GSEAPY is a Python wrapper for GSEA and Enrichr.

GSEAPY can be used for RNA-seq, ChIP-seq, and microarray data. It provides convenient GO enrichment analysis and produces publication-quality figures in Python.

GSEAPY has five sub-commands available: gsea, prerank, ssgsea, replot, and enrichr.

gsea:	The `gsea` module produces GSEA results. The input requires a txt file (FPKM, expected counts, TPM, etc.), a cls file, and a gene_sets file in gmt format.
prerank:	The `prerank` module produces Prerank tool results. The input expects a pre-ranked gene list with correlation values in .rnk format, and a gene_sets file in gmt format. The `prerank` module is an API to the GSEA pre-rank tool.
ssgsea:	The `ssgsea` module performs single-sample GSEA (ssGSEA) analysis. The input expects a pd.Series (indexed by gene name) or a pd.DataFrame (including a `GCT` file) with expression values, and a `GMT` file. For multi-sample input, ssGSEA recognizes the gct format, too. The ssGSEA enrichment score for each gene set is computed as described by D. Barbie et al., 2009.
replot:	The `replot` module reproduces GSEA desktop version results. The only input is the location of the `GSEA` desktop output results.
enrichr:	The `enrichr` module lets you perform gene set enrichment analysis using the `Enrichr` API. Enrichr is open source and freely available online at: http://amp.pharm.mssm.edu/Enrichr . It runs very fast.

Please use 'gseapy COMMAND -h' to see a detailed description of each option of each module.

The full GSEA is far too extensive to describe here; see the GSEA documentation for more information. All file formats used by GSEApy are identical to those of the GSEA desktop version.
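To make the gmt format mentioned above concrete, here is a minimal sketch of reading it with plain Python. It assumes the usual GSEA layout: one gene set per line, tab-separated, with the set name first, a free-text description second, and the gene symbols after that. The `parse_gmt` helper is hypothetical and is not part of gseapy.

```python
def parse_gmt(lines):
    """Parse GMT-formatted lines into a dict of {set_name: [gene, ...]}.

    Hypothetical helper for illustration; not a gseapy function.
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            # Skip blank or malformed lines (need name, description, >= 1 gene).
            continue
        name, _description, *genes = fields
        # Drop empty fields left behind by trailing tabs.
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


# Two made-up gene sets in GMT layout.
example_lines = [
    "SET_A\tdemo description\tTP53\tBRCA1\tEGFR\n",
    "SET_B\tna\tMYC\tKRAS\n",
]
gene_sets = parse_gmt(example_lines)
print(gene_sets)
```

For a real file, the same function could be called as `parse_gmt(open("gene_sets.gmt"))`.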

If you use gseapy in your research, you should cite the original ``GSEA`` and ``Enrichr`` papers.

Why GSEAPY

I wanted to use Pandas to explore my data, but I could not find a convenient tool for gene set enrichment analysis in Python. So, here are my reasons:

Runs inside the Python interactive console without switching to R!
User friendly for both wet and dry lab users.
Produces or reproduces publishable figures.
Performs batch jobs easily.
Easy to use in a bash shell or in your data analysis workflow, e.g. snakemake.

GSEA Java version output:

This is an example of GSEA desktop application output

GSEAPY `Prerank` module output

Using the same data from GSEA, GSEAPY reproduces the example above.

Using the Prerank or replot module will reproduce the same figure as the GSEA Java desktop output.

Generated by GSEAPY

GSEAPY figures can be saved in any figure format supported by matplotlib.

You can modify GSEA plots easily in .pdf files. Enjoy!

GSEAPY `enrichr` module

Note: For now, the enrichr module downloads enriched results only.

TODO: Save enriched table, grids, networks, bar graphs from website server using phantomJS and selenium.

A graphical introduction of Enrichr

Note: Enrichr uses a list of Entrez gene symbols as input. You should convert all gene names to uppercase.

Installation

Install the gseapy package from bioconda or PyPI.

# if you have conda
$ conda install -c bioconda gseapy

# for windows users
$ conda install -c bioninja gseapy

# or use pip to install the latest release
$ pip install gseapy

You may instead want to use the development version from GitHub, by running

$ pip install git+https://github.com/zqfang/gseapy.git#egg=gseapy

Dependency

Python 2.7 or 3.4+

Mandatory

Numpy >= 1.13.0
Pandas
Matplotlib
Beautifulsoup4
Requests (for the enrichr API)

You may also need to install lxml and html5lib if you cannot parse xml files.

Run GSEAPY

Before you start:

Unless you know exactly how GSEA works, you should convert all gene symbol names to uppercase first.
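One way to do the uppercase conversion is sketched below in plain Python. The `normalize_symbols` helper is hypothetical (not part of gseapy). Stripping whitespace and dropping duplicates go beyond what is required above and are assumptions of this sketch; remove them if they do not suit your data.

```python
def normalize_symbols(symbols):
    """Uppercase gene symbols, strip whitespace, drop blanks and duplicates.

    Hypothetical helper for illustration; input order is preserved.
    """
    seen = set()
    cleaned = []
    for symbol in symbols:
        symbol = symbol.strip().upper()
        if symbol and symbol not in seen:
            seen.add(symbol)
            cleaned.append(symbol)
    return cleaned


genes = normalize_symbols(["Tp53", " brca1", "TP53", "", "Egfr\n"])
print(genes)  # ['TP53', 'BRCA1', 'EGFR']
```

With a pandas DataFrame indexed by gene name, the equivalent one-liner would be `df.index = df.index.str.upper()`.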

For command line usage:

# An example to reproduce figures using replot module.
$ gseapy replot -i ./Gsea.reports -o test


# An example to run GSEA using gseapy gsea module
$ gseapy gsea -d exptable.txt -c test.cls -g gene_sets.gmt -o test

# An example to run Prerank using gseapy prerank module
$ gseapy prerank -r gsea_data.rnk -g gene_sets.gmt -o test

# An example to run ssGSEA using gseapy ssgsea module
$ gseapy ssgsea -d expression.txt -g gene_sets.gmt -o test

# An example using the enrichr API
# see details of -g below; -d is optional
$ gseapy enrichr -i gene_list.txt -g KEGG_2016 -d pathway_enrichment -o test
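Because each sub-command is an ordinary shell command, batch jobs can be scripted. The sketch below only builds the argument lists for running `gseapy prerank` against several gmt files, using the `-r`, `-g`, and `-o` options shown above. The helper name, the file paths, and the one-output-folder-per-gene-set layout are assumptions made for illustration.

```python
import shlex
from pathlib import Path


def build_prerank_commands(rnk_path, gmt_paths, out_root):
    """Return one 'gseapy prerank' argument list per gmt file.

    Hypothetical helper: it builds the commands but does not run them.
    """
    commands = []
    for gmt in gmt_paths:
        # One output folder per gene set file, named after the file stem.
        outdir = f"{out_root}/{Path(gmt).stem}"
        commands.append(
            ["gseapy", "prerank", "-r", rnk_path, "-g", gmt, "-o", outdir]
        )
    return commands


commands = build_prerank_commands(
    "gsea_data.rnk", ["sets/kegg.gmt", "sets/go_bp.gmt"], "test"
)
for argv in commands:
    # Print the shell-quoted command line for inspection.
    print(shlex.join(argv))
```

Each argument list can then be executed with `subprocess.run(argv, check=True)` once gseapy is installed.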

Run gseapy inside python console:

After preparing the expression.txt, gene_sets.gmt, and test.cls files required by GSEA, you can do this:

import gseapy

# run GSEA.
gseapy.gsea(data='expression.txt', gene_sets='gene_sets.gmt', cls='test.cls', outdir='test')

# run prerank
gseapy.prerank(rnk='gsea_data.rnk', gene_sets='gene_sets.gmt', outdir='test')

# run ssGSEA
gseapy.ssgsea(data='expression.txt', gene_sets='gene_sets.gmt', outdir='test')


# An example to reproduce figures using replot module.
gseapy.replot(indir='./Gsea.reports', outdir='test')

If you prefer to use a DataFrame, dict, or list in the interactive Python console, you can do this.

See details here: Example

import pandas as pd

# assign a dataframe, and use the enrichr library data set 'KEGG_2016'
expression_dataframe = pd.DataFrame()  # placeholder: replace with your expression table

sample_names = ['A','A','A','B','B','B'] # always exactly two groups; use any names you like

# assign the gene_sets parameter an enrichr library name or a gmt file on your local computer.
gseapy.gsea(data=expression_dataframe, gene_sets='KEGG_2016', cls=sample_names, outdir='test')

# using the prerank tool
gene_ranked_dataframe = pd.DataFrame()  # placeholder: replace with your ranked gene table
gseapy.prerank(rnk=gene_ranked_dataframe, gene_sets='KEGG_2016', outdir='test')

# using ssGSEA
ssGSEA_dataframe = pd.DataFrame()  # placeholder: replace with your expression table
gseapy.ssgsea(data=ssGSEA_dataframe, gene_sets='KEGG_2016', outdir='test')

For enrichr, you can pass a list, pd.Series, or pd.DataFrame object, or a txt file (one gene name per row).

# assign a list object to enrichr
gl = ['SCARA3', 'LOC100044683', 'CMBL', 'CLIC6', 'IL13RA1', 'TACSTD2', 'DKKL1', 'CSF1',
     'SYNPO2L', 'TINAGL1', 'PTX3', 'BGN', 'HERC1', 'EFNA1', 'CIB2', 'PMP22', 'TMEM173']

gseapy.enrichr(gene_list=gl, description='pathway', gene_sets='KEGG_2016', outdir='test')

# or a txt file path.
gseapy.enrichr(gene_list='gene_list.txt', description='pathway', gene_sets='KEGG_2016',
               outdir='test', cutoff=0.05, format='png' )
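If your genes live in a Python list but you want the txt input form, the file is simply one gene name per row. A minimal sketch of writing and re-reading such a file follows; the helper names and the temporary path are illustrative assumptions, not part of gseapy.

```python
import os
import tempfile


def write_gene_list(genes, path):
    """Write one gene name per row, the txt layout described above."""
    with open(path, "w") as handle:
        for gene in genes:
            handle.write(gene.strip() + "\n")


def read_gene_list(path):
    """Read the file back, ignoring blank rows."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


# Write to a temporary folder so the sketch leaves no files behind in the cwd.
path = os.path.join(tempfile.mkdtemp(), "gene_list.txt")
write_gene_list(["SCARA3", "CMBL", "CLIC6"], path)
loaded = read_gene_list(path)
print(loaded)
```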

GSEAPY supported gene set libraries:

To see the full list of gene set libraries supported by gseapy, please click here: Library

Or use the get_library_name function inside the Python console.

# see the full list of the latest enrichr library names, which can be passed to the -g parameter:
names = gseapy.get_library_name()

# show the first entries.
print(names[:20])


['Genome_Browser_PWMs',
'TRANSFAC_and_JASPAR_PWMs',
'ChEA_2013',
'Drug_Perturbations_from_GEO_2014',
'ENCODE_TF_ChIP-seq_2014',
'BioCarta_2013',
'Reactome_2013',
'WikiPathways_2013',
'Disease_Signatures_from_GEO_up_2014',
'KEGG_2016',
'TF-LOF_Expression_from_GEO',
'TargetScan_microRNA',
'PPI_Hub_Proteins',
'GO_Molecular_Function_2015',
'GeneSigDB',
'Chromosome_Location',
'Human_Gene_Atlas',
'Mouse_Gene_Atlas',
'GO_Cellular_Component_2015',
'GO_Biological_Process_2015',
'Human_Phenotype_Ontology',]
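Since the library names come back as a plain Python list, they can be filtered with ordinary string matching. The sketch below uses a hard-coded subset of the names printed above instead of calling gseapy, and `find_libraries` is a hypothetical helper.

```python
def find_libraries(names, keyword):
    """Return the library names that contain keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [name for name in names if keyword in name.lower()]


# A subset of the names listed above, hard-coded so the sketch runs offline.
names = [
    "ChEA_2013",
    "Reactome_2013",
    "KEGG_2016",
    "GO_Molecular_Function_2015",
    "GO_Cellular_Component_2015",
    "GO_Biological_Process_2015",
]
go_libraries = find_libraries(names, "go_")
print(go_libraries)
```

Any matching name can then be passed as the gene_sets argument or to the -g option.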

Bug Report

If you would like to report any bugs encountered when running gseapy, don't hesitate to create an issue on GitHub here, or email me: fzq518@gmail.com

Getting help with GSEAPY

Visit the documentation site at http://gseapy.rtfd.io/
