(feat): experimental `read_backed` method for `zarr` + `hdf5` via `read_dispatched` #947
for more information, see https://pre-commit.ci
To fix error reporting, I've put the attempt to catch an error during IO on top of the `read_elem` method. Since the decorator is sometimes used on functions, I modified it to handle the signature of both a method and a function. Oddly, the decorator is sometimes passed the arguments of a method by a callable that is named like a method but is actually a function, so that still needs to be fixed.
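As a rough illustration of a decorator that copes with both call signatures, here is a minimal sketch. It is not the code in this PR: `StoredElem`, `report_key_on_error`, `read_function`, and `Reader` are hypothetical names, and the sketch assumes the element being read can be recognized by its type wherever it sits in the positional arguments.

```python
import functools
from dataclasses import dataclass


@dataclass
class StoredElem:
    """Hypothetical stand-in for an on-disk group or array."""

    name: str


def report_key_on_error(func):
    """Re-raise any error from `func` with the name of the element being read."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as e:
            # For a plain function the element is args[0]; for a method it
            # follows `self`. Searching by type covers both signatures.
            elem = next((a for a in args if isinstance(a, StoredElem)), None)
            key = elem.name if elem is not None else "<unknown>"
            raise RuntimeError(f"Error raised while reading key {key!r}") from e

    return wrapper


@report_key_on_error
def read_function(elem):
    # Decorated plain function: the element is the first argument.
    raise ValueError("corrupt chunk")


class Reader:
    @report_key_on_error
    def read_elem(self, elem):
        # Decorated method: the element comes after `self`.
        raise ValueError("corrupt chunk")
```

Both `read_function(StoredElem("/obs"))` and `Reader().read_elem(StoredElem("/obs"))` raise a `RuntimeError` that names `/obs` and chains the original `ValueError`. Looking the element up by type rather than by position is one way to avoid caring whether the wrapped callable is bound.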
4d60586 to e480700
da5144d to cee7f6d
cd3737e to c3f6935
Before this can be merged/reviewed, there are several blocking PRs:
…a into ig/read_remote_dispatched
Open question: If we want out-of-core
See pydata/xarray#1650 for the "real" way forward. This just mitigates the initial load of indices when calling
be37320 to 9d53307
Note to self: we need the anndata/anndata/_core/anndata.py Lines 667 to 685 in 3e340e1
This is problematic because it means that accessing X immediately reads it into memory. This is why I moved the indexing on to the BaseCompressedSparseDataset class here.
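To illustrate why putting the indexing on the dataset class avoids an eager read, here is a toy sketch. It is not `BaseCompressedSparseDataset`: `LazyRows` and `read_row` are hypothetical names, and rows are plain Python lists rather than sparse matrices. Indexing only composes row selections, and nothing is read until `to_memory` is called.

```python
class LazyRows:
    """Toy backed dataset: remembers a row selection and reads only on demand."""

    def __init__(self, read_row, rows):
        self._read_row = read_row  # callable: row index -> row data
        self._rows = rows          # sequence of row indices still selected

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, key):
        # Indexing composes with the existing selection and triggers no reads.
        if isinstance(key, slice):
            rows = self._rows[key]
        else:
            rows = [self._rows[i] for i in key]
        return LazyRows(self._read_row, rows)

    def to_memory(self):
        # The only place where data is actually read.
        return [self._read_row(i) for i in self._rows]


reads = []  # records which rows were fetched from "disk"


def read_row(i):
    reads.append(i)
    return [i, i * 10]


X = LazyRows(read_row, range(1000))
view = X[100:200][[0, 5]]       # two chained selections, still no reads
reads_before = list(reads)      # empty: indexing alone fetched nothing
data = view.to_memory()         # reads rows 100 and 105 only
```

Only the two selected rows are ever fetched, even though the dataset nominally has 1000 rows.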
No need as #1247 seems like it will be the way to go.
Creates a new experimental `AnnDataBacked` class as well as a helper `read_backed` method for lazily reading on-disk `zarr` and `hdf5` datasets, with a focus on usage over the internet. The primary focus is less on optimizing for performance of analysis than on reading metadata and making fetching of data quick.
Fixes #951 as well
Probably fixes #981 if an "experimental" PR can do that.
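The general idea (cheap metadata access, with data fetched only on request) can be sketched with a toy container. This is not the `AnnDataBacked` implementation: `LazyStore` and `make_loader` are hypothetical names, and the `to_memory(exclude=...)` signature below is an assumption modeled loosely on the description in this PR.

```python
class LazyStore:
    """Toy container: each key maps to a zero-argument loader called on demand."""

    def __init__(self, loaders):
        self._loaders = dict(loaders)

    def keys(self):
        # Metadata only: listing keys triggers no loading.
        return list(self._loaders)

    def to_memory(self, exclude=()):
        # Fetch everything except the excluded keys.
        skip = set(exclude)
        return {k: load() for k, load in self._loaders.items() if k not in skip}


fetched = []  # records which keys were actually fetched


def make_loader(key, value):
    def load():
        fetched.append(key)
        return value

    return load


store = LazyStore(
    {
        "obs": make_loader("obs", {"cell_type": ["B", "T"]}),
        "X": make_loader("X", [[1, 0], [0, 2]]),
        "layers/raw": make_loader("layers/raw", [[5, 0], [0, 9]]),
    }
)
local = store.to_memory(exclude=["layers/raw"])
```

Here `store.keys()` lists all three entries without fetching anything, while `to_memory` fetches only `obs` and `X`, which is the kind of saving that excluding keys is meant to give over a slow connection.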
The main highlights of this PR in `experimental.read_backed`:
- `xarray` for all dataframes
- `AwkwardArray`s
- `h5ad` and `zarr` while also being compatible with both on-disk/remote storage (i.e., changing subelements should work if you want them to be e.g., `numpy` arrays although this isn't tested because the focus here is really on reading)
- `to_memory` function on the `AnnDataBacked` class that brings whatever you want locally for use in `scanpy` including optional `exclude` keys for making the download faster by restricting it to exactly what you need

Outside of this PR but included in the overall work:
- `AnnData` class beginning to define a sensible contract for new classes to build upon
- `X` for `zarr`