Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

89 gsora and gsea. #92

Merged
merged 32 commits into from
Apr 9, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
2616875
initial getPatternGeneSet
jmitchell81 Apr 3, 2024
1485a94
initial GSEA
jmitchell81 Apr 3, 2024
7b76947
params for internal patternMarkers call
jmitchell81 Apr 4, 2024
ad8b720
"padj" replacing "q-value"
jmitchell81 Apr 5, 2024
8d53ccc
"PatternHallmark" to "PatternGeneSet"
jmitchell81 Apr 5, 2024
abe2da8
fct_order gene sets by significance
jmitchell81 Apr 5, 2024
3bd8379
pattern amplitude as enrichment stat
jmitchell81 Apr 5, 2024
bac3fc6
gene set names from list argument names
jmitchell81 Apr 5, 2024
ad2c512
specify function calls from dplyr, fgsea, forcats
jmitchell81 Apr 5, 2024
2459bb2
unit tests for getPatternGenSet
jmitchell81 Apr 5, 2024
b177189
specify padj threshold for geneset plot
jmitchell81 Apr 5, 2024
604b92f
trim comments
jmitchell81 Apr 5, 2024
0c80e32
use stored GIST result to test getPatternGeneSet
jmitchell81 Apr 5, 2024
f4d6237
getPatternGeneSet Generic
jmitchell81 Apr 5, 2024
b42ec49
remove getPatternHallmarks definition
jmitchell81 Apr 5, 2024
b4f17d3
testthat::expect_error for getPatternGeneSet
jmitchell81 Apr 6, 2024
770f3ef
update get and plot GeneSet generics
jmitchell81 Apr 6, 2024
0db9f33
devtools document update
jmitchell81 Apr 6, 2024
ccf59fa
switch to expect_no_error
jmitchell81 Apr 6, 2024
6aa129f
static hline threshold
jmitchell81 Apr 6, 2024
3bfe7d1
title specific to test method used
jmitchell81 Apr 6, 2024
a83199a
tests for plotPatternGeneSet
jmitchell81 Apr 6, 2024
6db63ed
remove object argument from plotPatternGeneSet
jmitchell81 Apr 6, 2024
2f6db07
remove object arguments from tests
jmitchell81 Apr 6, 2024
4ff7bf7
remove ggplot2::aes_string import
jmitchell81 Apr 6, 2024
6956f1e
replace getPatternGeneSet example in vignette
jmitchell81 Apr 8, 2024
5ca56ea
patternMarker parameters in getPatternGeneSet
jmitchell81 Apr 8, 2024
9a7ff3b
Remove documentation of object for plotting
jmitchell81 Apr 8, 2024
d69ee96
remove import biomart and msigdbr from DESC
jmitchell81 Apr 8, 2024
faa2ce5
load modsim data without filepath
jmitchell81 Apr 8, 2024
ceee671
remove second occurance of load()
jmitchell81 Apr 8, 2024
66e3955
Update DESCRIPTION
dimalvovs Apr 9, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 2 additions & 4 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
Package: CoGAPS
Version: 3.23.2
Version: 3.24.0
Date: 2024-03-22
Title: Coordinated Gene Activity in Pattern Sets
Author: Jeanette Johnson, Ashley Tsang, Jacob Mitchell, Thomas Sherman, Wai-shing Lee, Conor Kelton, Ondrej Maxian, Jacob Carey,
Expand Down Expand Up @@ -33,12 +33,10 @@ Imports:
tools,
utils,
rhdf5,
biomaRt,
dplyr,
fgsea,
forcats,
ggplot2,
msigdbr
ggplot2
Suggests:
testthat,
knitr,
Expand Down
8 changes: 2 additions & 6 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -20,15 +20,15 @@ export(getFeatureLoadings)
export(getMeanChiSq)
export(getOriginalParameters)
export(getParam)
export(getPatternHallmarks)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renaming needs to be done in also in /vignettes/CoGAPs.rmd to allow the package to build.
stopping review here as the R CMD build . fails

export(getPatternGeneSet)
export(getPatternMatrix)
export(getRetinaSubset)
export(getSampleFactors)
export(getSubsets)
export(getUnmatchedPatterns)
export(getVersion)
export(patternMarkers)
export(plotPatternHallmarks)
export(plotPatternGeneSet)
export(plotPatternMarkers)
export(plotResiduals)
export(reconstructGene)
Expand All @@ -42,7 +42,6 @@ exportClasses(CogapsParams)
exportClasses(CogapsResult)
import(dplyr)
import(fgsea)
import(msigdbr)
importClassesFrom(S4Vectors,Annotated)
importClassesFrom(S4Vectors,character_OR_NULL)
importClassesFrom(SingleCellExperiment,LinearEmbeddingMatrix)
Expand All @@ -52,13 +51,10 @@ importFrom(RColorBrewer,brewer.pal)
importFrom(Rcpp,evalCpp)
importFrom(S4Vectors,make_zero_col_DFrame)
importFrom(SummarizedExperiment,assay)
importFrom(biomaRt,getBM)
importFrom(biomaRt,useEnsembl)
importFrom(cluster,agnes)
importFrom(dplyr,relocate)
importFrom(forcats,fct_reorder)
importFrom(ggplot2,aes)
importFrom(ggplot2,aes_string)
importFrom(ggplot2,coord_flip)
importFrom(ggplot2,geom_col)
importFrom(ggplot2,geom_hline)
Expand Down
31 changes: 21 additions & 10 deletions R/class-CogapsResult.R
Original file line number Diff line number Diff line change
Expand Up @@ -154,25 +154,36 @@ setGeneric("getPatternMatrix", function(object)
setGeneric("getMeanChiSq", function(object)
{standardGeneric("getMeanChiSq")})

#' generate statistics associating patterns with MSigDB hallmark gene sets
#' generate statistics associating patterns with gene sets
#' @export
#' @docType methods
#' @rdname getPatternHallmarks-methods
#' @aliases getPatternHallmarks
#' @rdname getPatternGeneSet-methods
#' @aliases getPatternGeneSet
#' @param object an object of type CogapsResult
#' @return dataframe of hallmark info
setGeneric("getPatternHallmarks", function(object) standardGeneric("getPatternHallmarks"))
#' @param gene.sets a list of gene sets to test. List names should be the names of the gene sets
#' @param method enrichment or overrepresentation. Conducts a test for gene set enrichment using {fgsea::gsea} ranking features by pattern amplitude or a test for gene set overrepresentation in pattern markers using {fgsea::fora}, respectively.
#' @param ... additional parameters passed to {patternMarkers} if using overrepresentation method
#' @return list of dataframes containing gene set enrichment or gene set overrepresentation statistics
#' @examples
#' data(GIST)
#' gs.test <- list(
#' "gs1" = c("Hs.2", "Hs.4", "Hs.36", "Hs.96", "Hs.202"),
#' "gs2" = c("Hs.699463", "Hs.699288", "Hs.699280", "Hs.699154", "Hs.697294")
#' )
#' getPatternGeneSet(object = GIST.result, gene.sets = gs.test, method = "enrichment")
#' getPatternGeneSet(object = GIST.result, gene.sets = gs.test, method = "overrepresentation")
setGeneric("getPatternGeneSet", function(object, gene.sets, method = c("enrichment", "overrepresentation"), ...) {standardGeneric("getPatternGeneSet")})

#' generate a barchart of most significant hallmark sets for a pattern
#' @export
#' @docType methods
#' @rdname plotPatternHallmarks-methods
#' @aliases plotPatternHallmarks
#' @param object an object of type CogapsResult
#' @param patternhallmarks output from getPatternHallmarks
#' @rdname plotPatternGeneSet-methods
#' @aliases plotPatternGeneSet
#' @param patterngeneset output from getPatternGeneSet
#' @param whichpattern which pattern to generate bar chart for
#' @param padj_threshold maximum adjusted p-value of gene sets rendered on the resulting plot
#' @return image object of barchart
setGeneric("plotPatternHallmarks", function(object, patternhallmarks, whichpattern=1) standardGeneric("plotPatternHallmarks"))
setGeneric("plotPatternGeneSet", function(patterngeneset, whichpattern=1, padj_threshold = 0.05) standardGeneric("plotPatternGeneSet"))

#' return version number used to generate this result
#' @export
Expand Down
142 changes: 70 additions & 72 deletions R/methods-CogapsResult.R
Original file line number Diff line number Diff line change
Expand Up @@ -292,101 +292,99 @@ unitVector <- function(n, length)
return(vec)
}

#' @rdname getPatternHallmarks-methods
#' @import msigdbr
#' @rdname getPatternGeneSet-methods
#' @import dplyr
#' @importFrom biomaRt useEnsembl getBM
#' @import fgsea
#' @importFrom forcats fct_reorder
#' @aliases getPatternHallmarks
setMethod("getPatternHallmarks", signature(object="CogapsResult"),
function(object)
#' @aliases getPatternGeneSet
setMethod("getPatternGeneSet", signature(object="CogapsResult", gene.sets="list", method = "character"),
function(object, gene.sets, method = c("enrichment", "overrepresentation"), ...)
{
# List of MsigDB hallmarks and the genes in each set
hallmark_df <- msigdbr(species = "Homo sapiens", category = "H")
hallmark_list <- hallmark_df %>% split(x = .$gene_symbol, f = .$gs_name)

# obtain universe of all human hgnc gene symbols from ensembl
mart <- useEnsembl('genes', dataset = "hsapiens_gene_ensembl") #GRCh38
Hs_genes <- getBM("hgnc_symbol", mart = mart)

patternMarkerResults <- patternMarkers(object, threshold = "cut", lp = NA, axis = 1)
names(patternMarkerResults$PatternMarkers) <- colnames(patternMarkerResults$PatternMarkerRanks)

# List of each gene as a pattern marker
PMlist <- patternMarkerResults$PatternMarkers

d <- list()
for (pattern in names(PMlist))
{
# Over-representation analysis of pattern markers ###########################
suppressWarnings(
result <- fora(pathways = hallmark_list,
genes = PMlist[[pattern]],
universe = Hs_genes$hgnc_symbol,
maxSize=2038)
method <- match.arg(method)
A <- object@featureLoadings
patternNames <- colnames(A)
features <- rownames(A)
# enrichment method
if(method == "enrichment") {
d <- lapply(
patternNames, FUN = function(p) {
amp <- A[,p]
names(amp) <- features
result <- fgsea::fgsea(pathways = gene.sets, stats = amp, scoreType = "pos")
result$leadingEdge <- vapply(result$leadingEdge, FUN = toString, FUN.VALUE = character(1))
result$neg.log.padj <- (-10) * log10(result$padj)
result$gene.set <- names(gene.sets)
result <- dplyr::mutate(result,gene.set=forcats::fct_reorder(gene.set, - padj))
return(result)
}
)
}

# overrepresentation method
if(method == "overrepresentation") {
patternMarkerResults <- patternMarkers(object, ...)
names(patternMarkerResults$PatternMarkers) <- patternNames

# Append ratio of overlap/# of genes in hallmark (k/K)
result$"k/K" <- result$overlap/result$size

# Append log-transformed HB-adjusted q value (-10 * log(adjusted p value))
result$"neg.log.q" <- (-10) * log10(result$padj)

# Append shortened version of hallmark's name without "HALLMARK_"
result$MsigDB_Hallmark <- substr(result$pathway, 10, nchar(result$pathway))

# Reorder dataframe by ascending adjusted p value
result <- mutate(result, MsigDB_Hallmark=fct_reorder(MsigDB_Hallmark, - padj))

d[[length(d)+1]] <- result

PMlist <- patternMarkerResults$PatternMarkers
d <- lapply(
patternNames, FUN = function(p) {
result <- fgsea::fora(
pathways = gene.sets, genes = PMlist[[p]],
universe = features,
maxSize=2038)
result[["k/K"]] <- result$overlap / result$size
result$neg.log.padj <- (-10) * log10(result$padj)
result$gene.set <- names(gene.sets)
result <- dplyr::mutate(result,gene.set=forcats::fct_reorder(gene.set, - padj))
return(result)
}
)
}
return(d)
return(d)
})

#' @rdname plotPatternHallmarks-methods
#' @rdname plotPatternGeneSet-methods
#' @importFrom dplyr relocate
#' @importFrom ggplot2 ggplot aes_string geom_col ylab coord_flip theme_minimal ggtitle geom_hline geom_text theme aes ylim
#' @importFrom ggplot2 ggplot geom_col ylab coord_flip theme_minimal ggtitle geom_hline geom_text theme aes ylim
#' @importFrom graphics plot legend lines points
#' @aliases plotPatternHallmarks
setMethod("plotPatternHallmarks", signature(object="CogapsResult", patternhallmarks = "list", whichpattern="numeric"),
function(object, patternhallmarks, whichpattern=1)
#' @aliases plotPatternGeneSet
setMethod("plotPatternGeneSet", signature(patterngeneset = "list", whichpattern="numeric", padj_threshold = "numeric"),
function(patterngeneset, whichpattern=1, padj_threshold = 0.05)
{
# should be able to pass in info about just one pattern, or to pass all info and specify which pattern
if (length(patternhallmarks) > 1){
patternhallmarks <- patternhallmarks[[whichpattern]]
if (length(patterngeneset) > 1){
gs_df <- patterngeneset[[whichpattern]]
} else {
gs_df <- patterngeneset[1]
}
# check the test type conducted to inform the title of the resulting plot
if("k/K" %in% colnames(gs_df)) {
method_name <- "Overrepresented"
} else {
method_name <- "Enriched"
}

# rearrange columns to move overlap genes to the end and convert to vector for
# compatibility with saving as csv
patternhallmarks <- relocate(patternhallmarks, "overlapGenes", .after = "MsigDB_Hallmark")
patternhallmarks <- relocate(patternhallmarks, "neg.log.q", .after = "padj")
patternhallmarks <- patternhallmarks %>% mutate(overlapGenes = sapply(overlapGenes, toString))

# for plotting, limit the results to significant over-representation
patternhallmarks <- patternhallmarks[patternhallmarks$pval < 0.05,]
gs_df <- gs_df[gs_df$padj < padj_threshold,]
neg.log.hline <- -10*log10(0.05)

axis_max <- ceiling(max(gs_df$neg.log.padj)) + (max(gs_df$neg.log.padj)/4)

#plot and save the waterfall plot of ORA p-values
plot <- ggplot(patternhallmarks, aes_string(y = "neg.log.q", x = "MsigDB_Hallmark", fill = "MsigDB_Hallmark")) +
## Specifies barplot
plot <- ggplot(gs_df, aes(y = neg.log.padj, x = gene.set, fill = gene.set)) +
geom_col() +
## Rename y axis
ylab("-10*log10(FDR q-value)") +
## Flips the coordinates
ylab("-10*log10(FDR-adjusted p-value)") +
coord_flip() +
## Makes the background white
theme_minimal() +
## Add title
ggtitle(paste0("Overrepresented MsigDB Hallmarks in Pattern", whichpattern)) +
## This creates the dotted line at .05 value
geom_hline(yintercept=c(13.0103), linetype="dotted") + # Add veritcle line to show significances
## Adds the q values
ggtitle(paste0(method_name, " gene sets in Pattern_", whichpattern)) +
geom_text(aes(label=format(signif(padj, 4))), hjust = -.04) +
## Removes legend
theme(legend.position = "none") +
## specifies limits
ylim(0, ceiling(max(patternhallmarks$"neg.log.q")) + (max(patternhallmarks$"neg.log.q")/4))
ylim(0, axis_max)

if(neg.log.hline <= axis_max) {
plot <- plot +
geom_hline(yintercept=neg.log.hline, linetype="dotted")
}

return(plot)
})
Expand Down
46 changes: 46 additions & 0 deletions man/getPatternGeneSet-methods.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

21 changes: 0 additions & 21 deletions man/getPatternHallmarks-methods.Rd

This file was deleted.

25 changes: 25 additions & 0 deletions man/plotPatternGeneSet-methods.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading
Loading