-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add n.expected.cells estimation to emptyDropsCellRanger #88
Comments
Happy to take a PR from whoever's interested in this. The plain-text description in the 10X docs is incomprehensible to me, but maybe someone else can figure out what is happening there and why. |
My read of the algorithm is something like this:
Apply that over a range for How to choose and space the range for coverage across a broad range of possible values without wasting too much time seems to be left as an exercise to the reader. |
Ah. I figured it was trying to check some kind of self-consistency. Anyway, a linear-time algorithm should be fairly straightforward, given that the cutoff can only decrease. ordMagExpected <- function(ordered_totals) {
cutoff_point <- 1
loss <- numeric(length(ordered_totals))
for (i in seq_along(ordered_totals)) {
# Computing the type-7 quantile.
quantile_point <- (i - 1) * 0.01 + 1
left <- floor(quantile_point)
right <- ceiling(quantile_point)
if (left == right) {
quantile_val <- ordered_totals[left]
} else {
leftval <- ordered_totals[left]
rightval <- ordered_totals[right]
leftgap <- quantile_point - left
rightgap <- right - quantile_point
quantile_val <- leftval * rightgap + rightval * leftgap
}
# Finding the cut-off. Currently only accepting cells > cutoff,
# but this could be changed to >= if desired.
cutoff <- quantile_val / 10
while (cutoff_point <= length(ordered_totals) && ordered_totals[cutoff_point] > cutoff) {
cutoff_point <- cutoff_point + 1
}
# Number of cells is one less than the cutoff_point because we're 1-indexed.
num_cells <- cutoff_point - 1
loss[i] <- (i - num_cells)^2
}
which.min(loss)
} Not tested on real data at all. Could be faster in C++, given that we have a |
Hi all, I will work on this during Thanksgiving. Thanks so much for pointing this out and coming up with the solutions! Best, |
As of Cell Ranger 7.0, the initial expected number of cells is estimated by a fairly simple OrdMag algorithm if not specified by the user, as described in the 10X Gene Expression documentation.
It would be nice if a version of this feature were added to
emptyDropsCellRanger()
to preserve comparability with Cell Ranger, as well as to improve default performance with libraries of varying sizes.Tagging @DongzeHE and @rob-p as the primary authors of the existing
emptyDropsCellRanger
function.The text was updated successfully, but these errors were encountered: