Add n.expected.cells estimation to emptyDropsCellRanger #88

jashapiro · 2022-07-13T18:29:57Z

As of Cell Ranger 7.0, the initial expected number of cells is estimated by a fairly simple OrdMag algorithm if not specified by the user, as described in the 10X Gene Expression documentation.

It would be nice if a version of this feature were added to emptyDropsCellRanger() to preserve comparability with Cell Ranger, as well as to improve default performance with libraries of varying sizes.

Tagging @DongzeHE and @rob-p as the primary authors of the existing emptyDropsCellRanger function.

LTLA · 2022-07-13T21:07:21Z

Happy to take a PR from whoever's interested in this. The plain-text description in the 10X docs is incomprehensible to me, but maybe someone else can figure out what is happening there and why.

jashapiro · 2022-07-13T23:47:25Z

My read of the algorithm is something like this:

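ordMagLoss <- function(x, ordered_totals) {
  # ordered_totals: per-barcode totals, sorted in decreasing order.
  top <- ordered_totals[seq_len(x)]
  # Cutoff is one-tenth of the 99th percentile of the top x totals.
  cutoff <- quantile(top, 0.99, names = FALSE) / 10
  # Number of barcodes that this cutoff would call as cells.
  ordmag <- sum(ordered_totals > cutoff)
  (ordmag - x)^2 / x
}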

Apply that over a range for x and pick the value with the minimum loss.

How to choose and space the range for coverage across a broad range of possible values without wasting too much time seems to be left as an exercise to the reader.
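
A minimal sketch of that search, assuming a log-spaced grid of candidate values is acceptable. The grid spacing, the `n_candidates` parameter, and the `estimateExpectedCells()` helper are illustrative assumptions, not anything taken from Cell Ranger; `ordMagLoss()` is restated so the block runs on its own.

```r
# Loss for a candidate expected-cell count x (same reading of the algorithm as above).
ordMagLoss <- function(x, ordered_totals) {
  top <- ordered_totals[seq_len(x)]
  cutoff <- quantile(top, 0.99, names = FALSE) / 10
  ordmag <- sum(ordered_totals > cutoff)
  (ordmag - x)^2 / x
}

# Hypothetical helper: evaluate the loss on a log-spaced grid of candidates
# between 1 and the number of barcodes, and return the candidate with minimum loss.
estimateExpectedCells <- function(totals, n_candidates = 200) {
  ordered_totals <- sort(totals, decreasing = TRUE)
  n <- length(ordered_totals)
  candidates <- unique(round(exp(seq(0, log(n), length.out = n_candidates))))
  losses <- vapply(candidates, ordMagLoss, numeric(1),
                   ordered_totals = ordered_totals)
  candidates[which.min(losses)]
}

# Simulated library: 1000 high-count "cells" plus 20000 low-count barcodes.
set.seed(1)
totals <- c(rpois(1000, 5000), rpois(20000, 20))
est <- estimateExpectedCells(totals)
```

Because the grid is coarse, the estimate lands on the nearest candidate rather than the exact minimizer; a second, finer pass around that candidate would be one way to refine it.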

LTLA · 2022-07-14T05:42:09Z

Ah. I figured it was trying to check some kind of self-consistency.

Anyway, a linear-time algorithm should be fairly straightforward, given that the cutoff can only decrease.

ordMagExpected <- function(ordered_totals) {
    cutoff_point <- 1
    loss <- numeric(length(ordered_totals))

    for (i in seq_along(ordered_totals)) {
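        # Computing the type-7 quantile: position of the 99th percentile of the
        # top i totals, counted from the largest value (ordered_totals is decreasing).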
        quantile_point <- (i - 1) * 0.01 + 1
        left <- floor(quantile_point)
        right <- ceiling(quantile_point)

        if (left == right) {
            quantile_val <- ordered_totals[left]
        } else {
            leftval <- ordered_totals[left]
            rightval <- ordered_totals[right]
            leftgap <- quantile_point - left
            rightgap <- right - quantile_point
            quantile_val <- leftval * rightgap + rightval * leftgap 
        }

        # Finding the cut-off. Currently only accepting cells > cutoff,
        # but this could be changed to >= if desired.
        cutoff <- quantile_val / 10
        while (cutoff_point <= length(ordered_totals) && ordered_totals[cutoff_point] > cutoff) {
            cutoff_point <- cutoff_point + 1
        }

        # Number of cells is one less than the cutoff_point because we're 1-indexed.
        num_cells <- cutoff_point - 1

        # Normalized by i so that the loss matches ordMagLoss() above.
        loss[i] <- (i - num_cells)^2 / i
    }

    which.min(loss)
}

Not tested on real data at all. Could be faster in C++, given that we have a src/. But maybe it's already fast enough.
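
Short of real data, one way to sanity-check a linear-time version is to compare it against a brute-force reference on simulated totals. The sketch below is only that reference; `ordMagBruteForce()` is a hypothetical name, and the simulated counts are arbitrary.

```r
# Brute-force reference: evaluate the loss for every x with quantile() directly.
# Quadratic in the number of barcodes, so only suitable for small inputs.
ordMagBruteForce <- function(ordered_totals) {
  loss <- vapply(seq_along(ordered_totals), function(x) {
    cutoff <- quantile(ordered_totals[seq_len(x)], 0.99, names = FALSE) / 10
    (sum(ordered_totals > cutoff) - x)^2 / x
  }, numeric(1))
  which.min(loss)
}

# Simulated totals: 200 high-count barcodes and 2000 low-count barcodes.
set.seed(42)
ordered_totals <- sort(c(rpois(200, 3000), rpois(2000, 10)), decreasing = TRUE)
ref <- ordMagBruteForce(ordered_totals)

# With ordMagExpected() from this comment defined in the session, one could check:
# stopifnot(ordMagExpected(ordered_totals) == ref)
```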

DongzeHE · 2022-11-21T23:56:24Z

Hi all,

I will work on this during Thanksgiving. Thanks so much for pointing this out and coming up with the solutions!

Best,
Dongze

an-altosian mentioned this issue Jan 2, 2025

Implement the order of magnitude algorithm from Cell Ranger to emptyDropsCellRanger #119

Open
