Refactor and expand onepass model #300
Welford covariance computation on all 33k genes (using a batch size of 10k cells) spikes to roughly 31 GB of memory on my laptop (eyeballing Activity Monitor). Just wanted to note this as a ballpark figure. I am guessing that's why the GitHub Actions runner went OOM on 5495ce0. The CLI test now uses a For speed purposes, the rule seems to be "make the batch as big as you can accommodate in memory".
Ensure batch is not empty in |
Closes #163
Closes #296
This PR refactors onepass to make it more extensible. It implements the Welford algorithm for online variance calculation, and it also implements a gene-gene covariance computation via an online algorithm similar to Welford (https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Covariance, subsection "Online"). The latter can be run on either the raw data or ranks; using ranks allows computing gene-gene Spearman correlations.
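For readers unfamiliar with the online update, here is a minimal NumPy sketch of a batched Welford-style covariance accumulator. It is not the code in this PR: the function name `welford_cov_update` and its argument names are made up for illustration, and it assumes dense `(cells, genes)` batches in float64. The diagonal of the resulting matrix is the per-gene variance, and feeding rank-transformed data instead of raw values would give the quantities needed for Spearman correlations (how ranks are computed is outside this sketch).

```python
import numpy as np


def welford_cov_update(n, mean, comoment, batch):
    """Fold one (cells x genes) batch into running statistics.

    Hypothetical helper for illustration only.
    n        : number of cells seen so far
    mean     : per-gene running mean, shape (genes,)
    comoment : running sum of co-deviations, shape (genes, genes)
    """
    batch = np.asarray(batch, dtype=np.float64)
    m = batch.shape[0]
    if m == 0:  # an empty batch leaves the statistics unchanged
        return n, mean, comoment
    batch_mean = batch.mean(axis=0)
    centered = batch - batch_mean
    delta = batch_mean - mean
    total = n + m
    # Merge the batch co-moment with the running one (pairwise update).
    comoment = (
        comoment
        + centered.T @ centered
        + np.outer(delta, delta) * (n * m / total)
    )
    mean = mean + delta * (m / total)
    return total, mean, comoment


# Usage on synthetic data, streamed in batches of 128 cells.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 5))
n, mean, comoment = 0, np.zeros(5), np.zeros((5, 5))
for start in range(0, data.shape[0], 128):
    n, mean, comoment = welford_cov_update(
        n, mean, comoment, data[start:start + 128]
    )
cov = comoment / (n - 1)   # unbiased gene-gene covariance
var = np.diag(cov)         # per-gene variance
```

Note that the `(genes, genes)` co-moment matrix and the per-batch product `centered.T @ centered` are both dense, so memory in a sketch like this grows with the square of the number of genes in addition to the batch size.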
Currently the Welford algorithm lives in a different cellarium class, as does the covariance implementation. I thought this might be cleaner than having one huge class with more input arguments, but I'm open to opposing views. Also, the class hierarchy worked a lot better when Welford was a separate class, since Welford keeps track of different sufficient statistics than the naive/shifted algorithms. And the Welford-like gene-gene covariance keeps track of the same sufficient statistics as Welford (plus more).
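To make the point about sufficient statistics concrete, here is a rough sketch of how the three sets of state differ. The class and field names below are hypothetical and are not the classes in this PR; the sketch only assumes that naive/shifted accumulate raw sums, that Welford accumulates a running mean and sum of squared deviations, and that the covariance variant adds a co-moment matrix on top of the Welford state.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class NaiveStats:
    """Hypothetical: naive/shifted algorithms accumulate raw sums."""
    n: int
    x_sum: np.ndarray     # sum of x (or of x - shift)
    x_sq_sum: np.ndarray  # sum of x**2 (or of (x - shift)**2)

    def variance(self):
        return (self.x_sq_sum - self.x_sum**2 / self.n) / (self.n - 1)


@dataclass
class WelfordStats:
    """Hypothetical: Welford tracks a mean and squared deviations."""
    n: int
    mean: np.ndarray
    m2: np.ndarray  # sum of squared deviations from the mean

    def variance(self):
        return self.m2 / (self.n - 1)


@dataclass
class WelfordCovStats(WelfordStats):
    """Hypothetical: same state as Welford plus a co-moment matrix."""
    comoment: np.ndarray  # shape (genes, genes)

    def covariance(self):
        return self.comoment / (self.n - 1)


# Tiny example: 3 cells, 2 genes.
x = np.array([[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]])
mu = x.mean(axis=0)
dev = x - mu
naive = NaiveStats(n=3, x_sum=x.sum(axis=0), x_sq_sum=(x**2).sum(axis=0))
welford = WelfordStats(n=3, mean=mu, m2=(dev**2).sum(axis=0))
welford_cov = WelfordCovStats(
    n=3, mean=mu, m2=(dev**2).sum(axis=0), comoment=dev.T @ dev
)
```

Because the covariance state is a strict superset of the Welford state, inheriting from the Welford class is natural here, whereas the naive/shifted state shares only `n` with it.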
Todo: