generated from snakemake-workflows/snakemake-workflow-template
feat: add gseapy gene set enrichment (dryrun working, not tested otherwise)
1 parent
93958f3
commit 9fadae7
Showing 9 changed files with 220 additions and 4 deletions.
@@ -0,0 +1,9 @@
channels:
  - conda-forge
  - bioconda
  - nodefaults
dependencies:
  - gseapy =1.0.3
  - python =3.7
  - ipykernel
  - jupyter =1.0
@@ -0,0 +1 @@
Enrichment of {{snakemake.wildcards.enrichr_library}} for {{snakemake.params.model["formula"]}} for the following parameters: {{snakemake.wildcards.paramspace_instance}}. {% if snakemake.wildcards.diffexp_filter != "nofilter" %} Here, genes have been additionally filtered by the following statement: {{ snakemake.params.diffexp_query }}. {% endif %} Enrichment was performed with GSEApy, using a hypergeometric test for the null hypothesis that the genes in the gene set are independent of the significantly differentially expressed genes. The background was adjusted for the set of measured genes in the panel.
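The caption above refers to a hypergeometric test against a background restricted to the measured genes. As a rough illustration of that idea (not GSEApy's own code), the following sketch computes the upper-tail probability with the standard library only. The function name `hypergeom_enrichment_pvalue` and all gene identifiers are hypothetical and chosen purely for the example.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_background, n_in_set, n_selected, n_overlap):
    """Return P(X >= n_overlap) for a hypergeometric draw.

    n_background: number of measured genes (the background)
    n_in_set:     number of background genes that belong to the gene set
    n_selected:   number of significantly changed genes
    n_overlap:    number of selected genes that belong to the gene set
    """
    total = comb(n_background, n_selected)
    upper = min(n_in_set, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy data (hypothetical identifiers): 20 measured genes form the background.
background = {f"G{i}" for i in range(20)}
gene_set = {"G0", "G1", "G2", "G3", "G4", "X1"}  # X1 was not measured
selected = {"G0", "G1", "G2", "G10"}

# Restrict the gene set to the background before counting.
in_set = gene_set & background
overlap = selected & in_set

p_value = hypergeom_enrichment_pvalue(
    len(background), len(in_set), len(selected), len(overlap)
)
print(f"{p_value:.3g}")
```

Restricting the gene set to the measured genes matters for targeted panels: genes that could never have been observed would otherwise inflate the background and make the overlap look more surprising than it is.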
@@ -0,0 +1,72 @@
name: ?f"Enrichment of {wildcards.enrichr_library} for {wildcards.contrast}"

datasets:
  enrichment:
    path: ?input.enrichment
    separator: "\t"

views:
  enrichment:
    dataset: enrichment
    page-size: 18
    desc: |
      Enrichment analysis of all gene sets in {wildcards.enrichr_library}.
      * term: gene set name
      * overlap: number of genes that have a significantly non-zero fold change (according to GFOLD) vs. all measured genes in the gene set
      * p-value: p-value of hypergeometric test for the null hypothesis that the gene set is not enriched
      * FDR: false discovery rate calculated with Benjamini-Hochberg method
      * genes: genes that have a significantly non-zero fold change (according to GFOLD)
    render-table:
      columns:
        term:
          link-to-url:
            linkout:
              url: "{linkout}"
        linkout:
          display-mode: hidden
        p-value:
          plot:
            heatmap:
              scale: linear
              domain:
                - 0.0
                - 0.049
                - 0.05
                - 0.051
                - 1.0
              range:
                - "#a1d99b"
                - "#ecf7eb"
                - white
                - "#ffeedf"
                - "#fdae6b"
        FDR:
          plot:
            heatmap:
              scale: linear
              domain:
                - 0.0
                - 0.049
                - 0.05
                - 0.051
                - 1.0
              range:
                - "#a1d99b"
                - "#ecf7eb"
                - white
                - "#ffeedf"
                - "#fdae6b"
        odds ratio:
          plot:
            heatmap:
              scale: linear
              color-scheme: blues
        genes:
          custom: |
            function(value) {
              return value.split(";").map(function(gene) {
                return `<a href="https://www.genecards.org/cgi-bin/carddisp.pl?gene=${gene}"><span class="badge badge-info">${gene}</span></a>`
              })
            }
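The view description above names the Benjamini-Hochberg method for the FDR column. A minimal pure-Python sketch of that adjustment follows, assuming a plain list of p-values as input. The function name `benjamini_hochberg` and the example values are hypothetical, and this is not the code path GSEApy itself uses.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four gene sets.
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
print([f"{q:.3g}" for q in fdr])
```

Each p-value is scaled by the number of tests divided by its rank, and the running minimum keeps a term with a smaller p-value from ending up with a larger adjusted value than a term ranked after it.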
@@ -0,0 +1,46 @@
import sys
sys.stderr = open(snakemake.log[0], "w")

import pandas as pd
import gseapy as gp

background = pd.read_csv(snakemake.input.unfiltered, sep="\t").loc[:, "gene_symbol"].drop_duplicates()

gfold_change = pd.read_csv(snakemake.input.filtered, sep="\t").loc[:, "gene_symbol"].drop_duplicates()

if snakemake.wildcards.enrichr_library == "MSigDB_Hallmark_2020":
    def linkout_func(term):
        return f"https://www.gsea-msigdb.org/gsea/msigdb/cards/HALLMARK_{term.upper().replace(' ', '_')}"
elif snakemake.wildcards.enrichr_library in ("GO_Biological_Process_2020", "GO_Molecular_Function_2021", "GO_Cellular_Component_2021"):
    def linkout_func(term):
        goid = term.split(" ")[-1].strip("()")
        return f"https://www.ebi.ac.uk/QuickGO/term/{goid}"
else:
    def linkout_func(term):
        return f"https://www.google.com/search?q={term}"


if len(gfold_change) <= 1:
    # enrichment only works with at least 2 genes; write an empty table otherwise
    pd.DataFrame(columns=["term", "linkout", "p-value", "FDR", "odds ratio", "genes"]).to_csv(snakemake.output[0], sep='\t', index=False)
else:
    gene_set = gp.parser.download_library(snakemake.wildcards.enrichr_library, snakemake.params.species)

    enrichment = gp.enrich(
        gene_list=gfold_change,
        gene_sets=[gene_set],
        background=background,
    )

    results = enrichment.results.sort_values("P-value")
    results.drop(columns="Gene_set", inplace=True)
    results.columns = results.columns.str.lower()
    results.rename(columns={"adjusted p-value": "FDR"}, inplace=True)

    # simplify floats
    for col in ["p-value", "FDR", "odds ratio"]:
        results[col] = results[col].map("{:.3g}".format)

    results.insert(1, "linkout", results["term"].map(linkout_func))

    results.to_csv(snakemake.output[0], sep='\t', index=False)
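The linkout logic in the script above can be tried in isolation. The sketch below folds the three branches into one hypothetical helper, `linkout_for`, that takes the library name as an argument instead of reading `snakemake.wildcards`. Unlike the script, the fallback branch URL-encodes the term with `urllib.parse.quote_plus`; that is an addition made here as an assumption about what a search URL should look like, not something the commit does. The example terms are illustrative.

```python
from urllib.parse import quote_plus

# Library names taken from the script above.
GO_LIBRARIES = (
    "GO_Biological_Process_2020",
    "GO_Molecular_Function_2021",
    "GO_Cellular_Component_2021",
)


def linkout_for(library, term):
    """Build a linkout URL for an enrichment term (hypothetical helper)."""
    if library == "MSigDB_Hallmark_2020":
        # Same scheme as the script: upper-case and replace spaces.
        card = term.upper().replace(" ", "_")
        return f"https://www.gsea-msigdb.org/gsea/msigdb/cards/HALLMARK_{card}"
    if library in GO_LIBRARIES:
        # Terms are expected to end with the GO identifier in parentheses.
        goid = term.split(" ")[-1].strip("()")
        return f"https://www.ebi.ac.uk/QuickGO/term/{goid}"
    # Assumption: encode the term so spaces and '&' survive in the query string.
    return f"https://www.google.com/search?q={quote_plus(term)}"


hallmark_url = linkout_for("MSigDB_Hallmark_2020", "Oxidative Phosphorylation")
go_url = linkout_for("GO_Biological_Process_2020", "mitochondrial translation (GO:0032543)")
other_url = linkout_for("Some_Other_Library", "Cell cycle & growth")
print(hallmark_url, go_url, other_url, sep="\n")
```

The Hallmark branch only upper-cases and replaces spaces, so terms whose display name differs from the MSigDB card identifier in other ways may not resolve; checking a few generated links by hand is advisable.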