k_nn calculation #3

Open
ansonrel opened this issue Aug 30, 2019 · 0 comments

ansonrel commented Aug 30, 2019

Hi,

First of all, thank you for your package and for the great manuscript that is linked to it!
I'm trying to run Enhance on different datasets, and some of them return an error:

denois <- enhance (data)
[1] "Calculating number of neighbors to aggregate to aim for 2e+05 transcripts"
[1] "Number of neighbors to aggregate: 1"
[1] "Number of principal components to use: 50"

Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...) :
  'x' must be an array of at least two dimensions

10. stop("'x' must be an array of at least two dimensions")
9. base::rowSums(x, na.rm = na.rm, dims = dims, ...)
8. rowSums(D[, indices])
7. rowSums(D[, indices]) at enhance.R#147
6. FUN(X[[i]], ...)
5. lapply(X = X, FUN = FUN, ...)
4. sapply(nn, function(indices) {
       rowSums(D[, indices])
   })
3. sapply(nn, function(indices) {
       rowSums(D[, indices])
   }) at enhance.R#146
2. aggregate_nearest_neighbors(D = data_raw, nn = nn_1) at enhance.R#223
1. enhance(data)

Strangely, I only get this error with my bigger datasets (>1e+06 transcripts per cell). I am surprised that the k_nn estimate is 1, and I suspect this is the source of the error.

Looking at the code, I see that k_nn, the number of neighbors to aggregate, is defined as

  med_raw = median(colSums(data_raw))
  k_nn = ceiling(target_transcripts / med_raw)
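For reference, here is a minimal Python/NumPy sketch of the same calculation. It assumes a genes × cells count matrix, so per-cell totals are column sums as in `colSums(data_raw)`. The helper name `estimate_k_nn` and the toy matrices are hypothetical; only the formula comes from the code quoted above. With a median of 1e6 transcripts per cell and a 2e5 target, the ratio is 0.2, which the ceiling rounds up to 1.

```python
import math

import numpy as np


def estimate_k_nn(data_raw: np.ndarray, target_transcripts: float = 2e5) -> int:
    """Hypothetical helper mirroring the quoted R code:
    med_raw = median(colSums(data_raw)); k_nn = ceiling(target_transcripts / med_raw).
    `data_raw` is assumed to be genes x cells, so each column is one cell."""
    med_raw = float(np.median(data_raw.sum(axis=0)))  # median transcripts per cell
    return math.ceil(target_transcripts / med_raw)


# Toy data: 4 genes x 3 cells each.
shallow = np.array([[10_000] * 3] * 4)  # 40,000 transcripts per cell
deep = np.array([[250_000] * 3] * 4)    # 1,000,000 transcripts per cell

print(estimate_k_nn(shallow))  # ceil(200000 / 40000) = 5
print(estimate_k_nn(deep))     # ceil(200000 / 1e6) = ceil(0.2) = 1
```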

Maybe I am missing an important point, but shouldn't k_nn be equal to ceiling(med_raw / target_transcripts) instead, so that the number of neighbors to aggregate increases with the number of transcripts per cell?
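One way to compare the two formulas is to print, for a few hypothetical median depths, the resulting k and the approximate size of one aggregated neighborhood (k × median). The medians below are illustrative values, not measurements from my datasets, and the function names are made up for this comparison.

```python
import math

TARGET = 2e5  # target_transcripts reported by enhance


def k_current(med_raw: float, target: float = TARGET) -> int:
    # Formula as written in the package: ceiling(target / median)
    return math.ceil(target / med_raw)


def k_inverted(med_raw: float, target: float = TARGET) -> int:
    # Alternative suggested above: ceiling(median / target)
    return math.ceil(med_raw / target)


for med in (2e4, 1e5, 1e6, 5e6):  # hypothetical median transcripts per cell
    kc, ki = k_current(med), k_inverted(med)
    print(f"median={med:>9.0f}  current k={kc:>2} (~{kc * med:,.0f} transcripts)"
          f"  inverted k={ki:>2} (~{ki * med:,.0f} transcripts)")
```

Under the current formula, k is the smallest integer with k × median ≥ target, so for cells whose median already exceeds the target it bottoms out at 1.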

In any case, do you know how I could avoid the error when k_nn = 1? Should I increase k_nn to 2, or does it mean that the dataset is too big/small for the method to work?
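The traceback points at `rowSums(D[, indices])` inside `aggregate_nearest_neighbors`. When k_nn = 1, each element of `nn` most likely holds a single column index. In R, `D[, i]` with a single index drops the result to a plain vector (the default `drop = TRUE`), and `rowSums` rejects a vector with exactly this "at least two dimensions" error. Writing `D[, indices, drop = FALSE]` would keep a one-column matrix. The NumPy sketch below illustrates the same idea: indexing with an integer array always keeps the 2-D shape, so the per-gene sum works for k = 1. The function mirrors the names in the traceback, but it is a hypothetical re-implementation, not the package's code.

```python
from typing import Sequence

import numpy as np


def aggregate_nearest_neighbors(D: np.ndarray, nn: Sequence[Sequence[int]]) -> np.ndarray:
    """Hypothetical NumPy analogue of the R helper in the traceback.
    D is genes x cells; nn[j] lists the column indices of cell j's neighbors
    (including the cell itself). Returns a genes x cells matrix of summed counts."""
    columns = []
    for indices in nn:
        idx = np.asarray(indices, dtype=int)
        # Indexing with an index *array* keeps D[:, idx] 2-D even when len(idx) == 1,
        # which is what R's D[, indices, drop = FALSE] achieves.
        columns.append(D[:, idx].sum(axis=1))
    return np.column_stack(columns)


D = np.array([[1, 2, 3],
              [4, 5, 6]])

# k_nn = 1: every cell's neighborhood is just itself, so the result equals D.
print(aggregate_nearest_neighbors(D, [[0], [1], [2]]))
# k_nn = 2: each cell is summed with one (arbitrary, illustrative) neighbor.
print(aggregate_nearest_neighbors(D, [[0, 1], [1, 2], [2, 0]]))
```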

[EDIT]
Sorry, I posted this issue in the Python repository instead of the R one. However, the k_nn calculation is the same, and I guess my questions are still relevant here:

k = int(ceil(target_transcript_count / transcript_count))

Thanks,
Anthony
