Simple Operations on matrices: max, min, range etc. #142

Open
Artur-man opened this issue Oct 21, 2024 · 4 comments

@Artur-man

Hi,

Would it make sense to define some simple operations (max(), min(), range(), etc.) on IterableMatrix, MatrixSubset, etc.? And do you guys know of an easy alternative I can use until these are implemented (if they are planned)? For reference, this is how they behave on base and sparse matrices:

> mat <- matrix(1:20, nrow = 2, ncol = 10)
> max(mat)
[1] 20
> min(mat)
[1] 1
> range(mat)
[1]  1 20
> mat <- as(mat, "CsparseMatrix")
> max(mat)
[1] 20
> min(mat)
[1] 1
> range(mat)
[1]  1 20
@Artur-man
Author

Artur-man commented Oct 21, 2024

For now I convert them to in-memory sparse matrices. Since I only need a single column of a large matrix for my project, this is not that slow:

getMin <- function(data, ...){
  if(inherits(data, "IterableMatrix")){
    data <- as(data, "dgCMatrix")
  } 
  return(min(data, ...))
}

getMax <- function(data, ...){
  if(inherits(data, "IterableMatrix")){
    data <- as(data, "dgCMatrix")
  } 
  return(max(data, ...))
}

@bnprks
Owner

bnprks commented Oct 21, 2024

Hi @Artur-man, we have some operations along these lines which implement a subset of the widely-used matrixStats interfaces. We don't currently have a global min or max function, but we do have colMaxs() and rowMaxs() which were added by @immanuelazn in PR 103. A few others are included in this documentation list. We will also have rowQuantiles() and colQuantiles() soon once PR 128 is merged.

A slightly hacky but more efficient way to define your operations would be mat_max <- function(mat) max(colMaxs(mat)) and mat_min <- function(mat) -1*max(colMaxs(-1*mat)). Giving them new names matters: naming them max and min would mask base max(), which their own bodies call, so they would recurse. A proper implementation to go in BPCells core would probably define short standalone functions in C++ that are then exposed to R, similar to #103.
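
Both one-liners rest on two facts: the global maximum is the maximum of the per-column maxima, and min(x) equals -max(-x). The sketch below checks that logic in base R using hypothetical helper names (col_maxs(), mat_max(), mat_min(), mat_range()). col_maxs() is only a dense stand-in for colMaxs(); with BPCells loaded, the idea would be to call colMaxs() on the IterableMatrix itself so nothing is densified.

```r
# col_maxs() is a hypothetical stand-in for colMaxs(); it densifies via
# as.matrix(), so it only illustrates the reduction, not the memory savings.
col_maxs <- function(m) apply(as.matrix(m), 2, max)

# Global max = max over the per-column maxima
mat_max <- function(m) max(col_maxs(m))

# min(x) == -max(-x): reuse the same column-max kernel on the negated matrix
mat_min <- function(m) -mat_max(-1 * m)

# c(min, max), matching the output order of base range()
mat_range <- function(m) c(mat_min(m), mat_max(m))

mat_range(matrix(1:20, nrow = 2, ncol = 10))  # c(1, 20)
```

Computed this way, range() takes two passes over the data (one per sign); getting both extremes in a single pass would need its own kernel.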

Conversion to dgCMatrix is not ideal, because it crashes for matrices with more than 2^31 non-zero entries, which starts to become an issue around 1M-cell matrices for RNA-seq datasets with 2k genes detected per cell. The suggestions I give above won't have that issue, as the core colMaxs() implementation uses the efficient internal C++ APIs.
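
The "around 1M cells" estimate comes from dividing the largest non-zero count a dgCMatrix can index by the number of non-zeros per cell. A back-of-the-envelope check, assuming exactly 2,000 detected genes per cell (real datasets vary):

```r
# dgCMatrix stores its column pointers (the @p slot) as 32-bit integers, so the
# total number of stored non-zeros cannot exceed .Machine$integer.max (2^31 - 1).
max_nnz <- .Machine$integer.max
genes_per_cell <- 2000  # assumed detected genes per cell, per the estimate above

# Number of cells at which a dgCMatrix reaches that limit
max_cells <- floor(max_nnz / genes_per_cell)
max_cells  # 1073741, i.e. roughly 1.07M cells
```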

We're happy to take PRs for more of these utility operations, including min() or max() if you're interested. It's a little finicky to maintain compatibility with the matrixStats interface, but PR #103 is a good model of what's required. Happy to discuss more about the details if there's an operation you'd like to contribute. It can be a good area for new contributors to BPCells.

-Ben

@immanuelazn
Collaborator

immanuelazn commented Oct 21, 2024

Just to add on to what Ben mentioned, BPCells also exposes functions that apply an R function to each row or column of an IterableMatrix (see apply_by_row()/apply_by_col()). Although there is a small performance hit because not all of the iteration happens at the C++ level, it should still be much faster than converting to a dgCMatrix.

For example, to compute the global minimum, you could do the following:

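library(BPCells)
# initializing a dummy IterableMatrix
mat <- as(as(matrix(1:20, nrow = 2, ncol = 10), "dgCMatrix"), "IterableMatrix")
# fun is called as fun(val, row, col), where val holds only the non-zero values
# of the column; include 0 when the column has implicit zeros
res_list <- apply_by_col(mat, function(val, row, col) {
  if (length(val) < nrow(mat)) min(val, 0) else min(val)
})
# results are a list with one item per column: unlist and aggregate
res <- min(unlist(res_list))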
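
One detail worth keeping in mind with apply_by_col(): as I understand its documentation, the callback only receives the stored (non-zero) entries of each column, so a column with fewer values than rows also contains zeros that min()/max()/range() must account for. The sketch below illustrates this on an in-memory dgCMatrix using a hypothetical helper, apply_by_col_sketch(), that mimics the (val, row, col) callback shape; it is not the BPCells implementation, and col_range() is likewise a hypothetical name.

```r
library(Matrix)

# Hypothetical stand-in mimicking the fun(val, row, col) callback style of
# apply_by_col() on an in-memory dgCMatrix; NOT the BPCells implementation.
apply_by_col_sketch <- function(m, fun) {
  lapply(seq_len(ncol(m)), function(j) {
    # Stored entries of column j sit at positions p[j] + 1 .. p[j + 1]
    idx <- seq.int(m@p[j] + 1L, length.out = m@p[j + 1L] - m@p[j])
    fun(m@x[idx], m@i[idx] + 1L, j)  # values, 1-based row indices, column index
  })
}

# Per-column c(min, max); fewer stored values than rows means implicit zeros
col_range <- function(val, n_rows) {
  if (length(val) < n_rows) val <- c(val, 0)
  range(val)
}

sp <- sparseMatrix(i = c(1, 2, 1), j = c(1, 1, 3), x = c(4, -2, 7), dims = c(2, 3))
per_col <- apply_by_col_sketch(sp, function(val, row, col) col_range(val, nrow(sp)))
per_col_mat <- do.call(rbind, per_col)  # one row per column: min, max
global_range <- c(min(per_col_mat[, 1]), max(per_col_mat[, 2]))
global_range  # -2 7
```

Column 2 of sp has no stored entries and column 3 has one of two, so without the implicit-zero check the per-column results for those columns would miss the 0.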

@Artur-man
Author

ah thanks so much guys trying it now!!!

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants