Add function to sweep clustering parameters #765

Merged
Changes from 7 commits
Commits
33 commits
9028124
add function to sweep parameters
sjspielman Sep 17, 2024
7e10ed6
run document
sjspielman Sep 17, 2024
952acfb
add tests for sweep function
sjspielman Sep 18, 2024
93ca2af
update description
sjspielman Sep 18, 2024
8adb36f
add argument for threads
sjspielman Sep 18, 2024
523550d
document threads
sjspielman Sep 18, 2024
3b03fe2
actually use threads
sjspielman Sep 18, 2024
e52227a
fix docs typo
sjspielman Sep 18, 2024
589fd1f
Apply suggestions from code review
sjspielman Sep 18, 2024
3ef6db5
renames
sjspielman Sep 18, 2024
4122ff4
Update function for multiple algorithms and redocument
sjspielman Sep 18, 2024
a51ff78
can't use NA since match.args isn't here for it. use calculate_cluste…
sjspielman Sep 18, 2024
dd4fd31
update existing tests
sjspielman Sep 18, 2024
0e47c81
Fixed a bug: don't include cluster_args if it's empty
sjspielman Sep 18, 2024
0d3773d
more comment for future us
sjspielman Sep 18, 2024
fa0c887
more tests
sjspielman Sep 18, 2024
604e1ea
no more cluster_args in sweep function
sjspielman Sep 18, 2024
0baaa75
one more associated docs update
sjspielman Sep 18, 2024
e93c779
check NA for objective_function before match.arg'ing
sjspielman Sep 18, 2024
9cc7d96
back to NA values, char for objective_function and real for resolution
sjspielman Sep 18, 2024
0f5b286
back to defaults, not NA, for additional parameters
sjspielman Sep 19, 2024
3ad8c09
Apply suggestions from code review
sjspielman Sep 19, 2024
3a044ce
better alg checking
sjspielman Sep 19, 2024
7b3b575
tests styling/spacing
sjspielman Sep 19, 2024
b613668
use map instead
sjspielman Sep 19, 2024
ee2af5b
update comment'
sjspielman Sep 19, 2024
7b86bdf
one more spot to remove unique
sjspielman Sep 19, 2024
99fd963
remove redundant tests
sjspielman Sep 19, 2024
b53f16d
Apply suggestions from code review
sjspielman Sep 19, 2024
021a365
test sweep with seurat and matrix
sjspielman Sep 19, 2024
ff083f7
remove a .gitkeep straggler
sjspielman Sep 19, 2024
01628d2
Apply suggestions from code review
sjspielman Sep 19, 2024
64d82b2
simplify
sjspielman Sep 19, 2024
2 changes: 1 addition & 1 deletion packages/rOpenScPCA/NAMESPACE
@@ -1,7 +1,7 @@
# Generated by roxygen2: do not edit by hand

export(calculate_clusters)
export(calculate_clusters_sweep)
export(extract_pc_matrix)
export(sweep_clusters)
import(SingleCellExperiment)
import(methods)
14 changes: 9 additions & 5 deletions packages/rOpenScPCA/R/calculate-clusters.R
@@ -129,18 +129,22 @@ calculate_clusters <- function(
)
)


# Transform results into a table and return
cluster_df <- data.frame(
cell_id = rownames(pca_matrix),
cluster = clusters,
algorithm = algorithm,
weighting = weighting,
nn = nn
) |>
dplyr::bind_cols(
data.frame(cluster_args)
)
)

# Add in cluster_args if it has parameters to include
if (length(cluster_args) != 0) {
cluster_df <- cluster_df |>
dplyr::bind_cols(
data.frame(cluster_args)
)
}

return(cluster_df)
}
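
The guard added in this hunk can be illustrated with a minimal base-R sketch. The helper name `append_cluster_args()` and the `steps` argument are hypothetical and not part of rOpenScPCA, and `cbind()` stands in for `dplyr::bind_cols()` so that the example has no dependencies.

```r
# Hypothetical helper (not part of rOpenScPCA): append cluster_args as
# columns only when the list actually contains parameters.
append_cluster_args <- function(cluster_df, cluster_args = list()) {
  if (length(cluster_args) != 0) {
    # Each single-valued argument becomes a column recycled across all cells
    cluster_df <- cbind(cluster_df, data.frame(cluster_args))
  }
  cluster_df
}

# Toy clustering result; the values are made up for illustration
cluster_df <- data.frame(
  cell_id = c("cell1", "cell2", "cell3"),
  cluster = c(1, 1, 2)
)

no_args <- append_cluster_args(cluster_df)
with_args <- append_cluster_args(cluster_df, list(steps = 4))
```

With an empty list the table is returned unchanged; otherwise each argument appears as one extra column.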
@@ -1,63 +1,82 @@
#' Calculate clusters across a set of parameters
#'
#' This function can be used to perform reproducible clustering while varying a set of parameters.
#' A single clustering algorithm is required, but multiple values can be provided for any of:
#' Multiple values can be provided for any of:
#' - The algorithm (`algorithm`)
#' - The weighting scheme (`weighting`)
#' - Number of nearest neighbors (`nn`)
#' - The resolution parameter (`resolution`)
#' - The objective function parameter (`objective_function`)
#'
#' For each algorithm specified, all parameters possible to use with that
#' algorithm will be systematically varied.
#' Note that defaults for some arguments may differ from the bluster::NNGraphParam() defaults.
#' Specifically, the clustering algorithm defaults to "louvain" and the weighting scheme to "jaccard"
#' to align with common practice in scRNA-seq analysis.
#'
#' @param x An object containing PCs that clustering can be performed in. This can be either a SingleCellExperiment
#' object, a Seurat object, or a matrix where columns are PCs and rows are cells. If a matrix is provided, it must
#' have row names of cell ids (e.g., barcodes).
#' @param algorithm Clustering algorithm to use. Must be one of "louvain" (default), "walktrap", or "leiden".
#' @param x An object containing PCs that clustering can be performed in. This can be either
#' a SingleCellExperiment object, a Seurat object, or a matrix where columns are PCs and
#' rows are cells. If a matrix is provided, it must have row names of cell ids (e.g., barcodes).
#' @param algorithm Clustering algorithm to use. Must be one of "louvain" (default), "walktrap",
#' or "leiden".
#' @param weighting Weighting scheme(s) to consider when sweeping parameters.
#' Provide a vector of unique values to vary this parameter. Options include "jaccard" (default), "rank", or "number"
#' Provide a vector of unique values to vary this parameter. Options include "jaccard" (default),
#' "rank", or "number"
#' @param nn Number of nearest neighbors to consider when sweeping parameters.
#' Provide a vector of unique values to vary this parameter. Default is 10.
#' @param resolution Resolution parameter used by louvain and leiden clustering only.
#' Provide a vector of unique values to vary this parameter. Default is 1.
#' @param objective_function Leiden-specific parameter for whether to use the Constant Potts Model ("CPM"; default) or "modularity".
#' Provide a vector of unique values to vary this parameter.
#' @param objective_function Leiden-specific parameter for whether to use the
#' Constant Potts Model ("CPM"; default) or "modularity". Provide a vector of unique values
#' to vary this parameter.
#' @param cluster_args List of additional arguments to pass to the chosen clustering function.
#' Parameter values in this list cannot be varied.
#' Only single values for each argument are supported (no vectors or lists).
#' See igraph documentation for details on each clustering function: https://igraph.org/r/html/latest
#' @param seed Random seed to set for clustering.
#' @param threads Number of threads to use. Default is 1.
#' @param pc_name Name of principal components slot in provided object. This argument is only used if a SingleCellExperiment
#' or Seurat object is provided. If not provided, the SingleCellExperiment object name will default to "PCA" and the
#' Seurat object name will default to "pca".
#' @param pc_name Name of principal components slot in provided object. This argument is only used
#' if a SingleCellExperiment or Seurat object is provided. If not provided, the SingleCellExperiment
#' object name will default to "PCA" and the Seurat object name will default to "pca".
#'
#' @return A data frame with results from performing clustering with all parameter combinations.
#' @return A list of data frames from performing clustering across all parameter combinations.
#' Columns include `cluster_set` (identifier column for results from a single clustering run),
#' `cell_id`, and `cluster`. Additional columns represent algorithm parameters and include at least:
#' `algorithm`, `weighting`, and `nn`. Louvain and leiden clustering will also include `resolution`,
#' and leiden clustering will further include `objective_function`.
#' and leiden clustering will further include `objective_function`. Any additional specified parameters
#' for the given algorithm will also be included.
#'
#' @export
#'
#' @examples
#' \dontrun{
#' # performing louvain clustering with jaccard weighting,
#' # perform louvain clustering with jaccard weighting (defaults),
#' # varying the nearest neighbor parameter.
#' cluster_df <- calculate_clusters_sweep(sce_object, nn = c(10, 15, 20, 25))
#' cluster_df <- sweep_clusters(sce_object, nn = c(10, 15, 20, 25))
#'
#' # performing louvain clustering, with jaccard and rank weighting, and
#' # perform louvain clustering, with jaccard and rank weighting, and
#' # varying the nearest neighbor and resolution parameters.
#' cluster_df <- calculate_clusters_sweep(
#' cluster_df <- sweep_clusters(
#' sce_object,
#' algorithm = "louvain",
#' weighting = c("jaccard", "rank"),
#' nn = c(10, 15, 20, 25),
#' resolution = c(0.5, 1)
#' )
#'
#' # perform walktrap and louvain clustering with jaccard weighting, and
#' # varying the nearest neighbors for both algorithms, and resolution for louvain.
#' cluster_df <- sweep_clusters(
#' sce_object,
#' algorithm = c("walktrap", "louvain"),
#' weighting = "jaccard",
#' nn = c(10, 15, 20, 25),
#' resolution = c(0.5, 1)
#' )
#' }
sweep_clusters <- function(
x,
algorithm = c("louvain", "walktrap", "leiden"),
algorithm = "louvain",
weighting = "jaccard",
nn = 10,
resolution = 1, # louvain or leiden
@@ -77,21 +96,25 @@ sweep_clusters <- function(
stop("The first argument should be one of: a SingleCellExperiment object, a Seurat object, or a matrix with row names.")
}

algorithm <- match.arg(algorithm)

# Collect all specific inputs into a single list
# Even parameters that won't be used can be included
# since calculate_clusters() will ignore them anyways
sweep_params <- tidyr::expand_grid(
algorithm = unique(algorithm),
weighting = unique(weighting),
nn = unique(nn),
resolution = unique(resolution),
objective_function = unique(objective_function)
)
) |>
# set unused parameters for the given algorithm to their defaults, since we can't provide NA
# to match.arg in calculate_clusters
dplyr::mutate(
resolution = ifelse(algorithm %in% c("louvain", "leiden"), resolution, 1),
objective_function = ifelse(algorithm == "leiden", objective_function, "CPM")
) |>
dplyr::distinct()

sweep_results <- sweep_params |>
purrr::pmap(
\(weighting, nn, resolution, objective_function) {
\(algorithm, weighting, nn, resolution, objective_function) {
calculate_clusters(
x,
algorithm = algorithm,
@@ -104,8 +127,7 @@ sweep_clusters <- function(
seed = seed
)
}
) |>
dplyr::bind_rows(.id = "cluster_set")
)

return(sweep_results)
}
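
The parameter-grid logic in this function can be sketched in base R as follows. This is an approximation under stated assumptions, not the package's implementation: `build_sweep_grid()` is a hypothetical name, and `expand.grid()` and `unique()` stand in for `tidyr::expand_grid()` and `dplyr::distinct()`.

```r
# Hypothetical helper (not part of rOpenScPCA) mirroring the grid logic:
# build every combination, reset parameters an algorithm does not use to
# their defaults, then drop the duplicate rows this creates.
build_sweep_grid <- function(
    algorithm = "louvain",
    weighting = "jaccard",
    nn = 10,
    resolution = 1,
    objective_function = "CPM") {
  grid <- expand.grid(
    algorithm = unique(algorithm),
    weighting = unique(weighting),
    nn = unique(nn),
    resolution = unique(resolution),
    objective_function = unique(objective_function),
    stringsAsFactors = FALSE
  )
  # resolution only applies to louvain and leiden
  grid$resolution[!(grid$algorithm %in% c("louvain", "leiden"))] <- 1
  # objective_function only applies to leiden
  grid$objective_function[grid$algorithm != "leiden"] <- "CPM"
  unique(grid)
}

grid <- build_sweep_grid(
  algorithm = c("walktrap", "louvain"),
  nn = c(10, 15),
  resolution = c(0.5, 1)
)
```

Here walktrap ignores `resolution`, so its four raw combinations collapse to two (one per `nn` value), while louvain keeps all four. Without the deduplication step, identical walktrap runs would be repeated.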


81 changes: 0 additions & 81 deletions packages/rOpenScPCA/tests/testthat/test-calculate-clusters-sweep.R

This file was deleted.
