Begin Outputting AnnData file format #151
Comments
Hi @ilan-gold, indeed, this would be great! It would be trivial to do in pyroe, since that is in Python and has immediate access to all the relevant scverse packages. That said, it would be cool to avoid a conversion program altogether and output AnnData directly (or auto-convert to it from within Rust, if desired). Do you know if there is an AnnData crate for Rust? Otherwise, I know there's an HDF5 crate, though I think that pulls in the (kinda flaky?) C-based dependency. —Rob
Hi @rob-p, I'm back from vacation now. Yes, there is some Rust code: https://github.com/kaizhang/anndata-rs/tree/main. Although it does not provide installation instructions, it appears to be released as a package: https://docs.rs/anndata/latest/anndata/ @kaizhang, do you have any insight here?
@ilan-gold @rob-p Please see here for an example of how to use the anndata Rust package to output a cell-by-gene count matrix: https://github.com/regulatory-genomics/precellar/blob/a8d2fca438baf596a2be7cc6432364c7ef3534fe/precellar/src/transcript/quantification.rs#L80. Note that the latest version uses zstd compression under the hood, so when you open the h5ad file in Python, you may need to import hdf5plugin.
@kaizhang Thanks. I guess with
Thanks @kaizhang & @ilan-gold. I'm thinking an ideal place for such a conversion may be It's interesting because we use Anyway, I think I almost have a function that converts the CSR matrix into an AnnData object with unspliced, spliced, and ambiguous layers, but I am curious about one issue. I'd like the resulting AnnData object to have 3 layers ("unspliced", "spliced" and "ambiguous"), and I have no need for a default "X" layer. However, if I set the layers without X, I get the following error when reading the file into Python AnnData:
which presumably is because
This might be an issue with AnnData. I'll look into it a bit.
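The CSR-to-layers conversion described in the comment above can be sketched as follows, using only NumPy and SciPy. This assumes the USA-mode count matrix is cells × (3 × n_genes), with the three gene blocks stored contiguously; the default block order (spliced, unspliced, ambiguous) is an assumption, so it is exposed as a parameter and should be checked against the column names alevin-fry writes for your run. The function name `split_usa_layers` is hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def split_usa_layers(counts, order=("spliced", "unspliced", "ambiguous")):
    """Hypothetical helper: split a cells x (3 * n_genes) matrix into three CSR layers.

    ASSUMPTION: the three gene blocks are laid out contiguously in `order`;
    verify this against the column names of your alevin-fry output.
    """
    counts = sp.csr_matrix(counts)
    n_cols = counts.shape[1]
    if n_cols % 3 != 0:
        raise ValueError(f"expected 3 * n_genes columns, got {n_cols}")
    n_genes = n_cols // 3
    # Column-slice each contiguous block of n_genes columns into its own layer
    return {
        name: counts[:, i * n_genes:(i + 1) * n_genes].tocsr()
        for i, name in enumerate(order)
    }
```

The returned dict can be passed as `layers=` to `anndata.AnnData`, alongside an `X` of shape (n_cells, n_genes).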
Ok, so here are some (maybe informative) code snippets:

```python
import anndata as ad
import numpy as np

# Passing X=None with only layers errors, probably erroneously
ad.AnnData(X=None, layers={"foo": np.arange(4).reshape((2, 2)), "bar": np.arange(4).reshape((2, 2))})
```

```python
import anndata as ad
import numpy as np

# Creating the object with X, deleting X, then writing round-trips fine
adata = ad.AnnData(X=np.arange(4).reshape((2, 2)), layers={"foo": np.arange(4).reshape((2, 2)), "bar": np.arange(4).reshape((2, 2))})
del adata.X
adata.write_h5ad("foo.h5ad")
ad.read_h5ad("foo.h5ad").X is None  # X comes back as None
```

So if you're having trouble reading the data back in, it's hard to say why without the file. I stand by my original request: please send it.
Hi @ilan-gold, so here's my quick-and-dirty test to convert a USA-mode (unspliced/spliced/ambiguous) matrix market format output from
Nice! Would the goal here be to do this directly from the in-memory representation in alevin-fry? Or would the intermediate
I think both paths would be possible. My thoughts on the migration path, and on how to test it out, are as follows.
Phase 1:
Phase 2:
These are my initial thoughts, but I'm open to discussion or feedback on any of these points!
This would require custom code and handling via zarr/hdf5, but I think it should be doable (if not very performant on zarr). But that's a long way off, and that's just a first impression. Otherwise, I like the plan. It seems like a good way to gradually introduce the functionality.
Awesome! Then I'll get started on this (i.e. grab the
This should be fine as long as it's covered by a CRAN package, e.g.: https://cran.r-project.org/web/packages/anndata/readme/README.html. GitHub-only packages wouldn't work as dependencies, though.
I definitely think putting this behind a flag is a good way to start, but yeah, these
Hi @ilan-gold, I've started implementing this in a library; the repo is [af-anndata]. In the meantime, if you want to take a look at a converted output, you can find one I made and uploaded here. It has layers for the spliced, unspliced and ambiguous counts (and corresponding (1) let me know if you have any suggestion on how to avoid the "empty We can continue the discussion here or, if you'd prefer, we can pick it up in the
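The comment above is truncated, but assuming the "empty" workaround means an all-zero sparse `X` placeholder, here is a minimal SciPy sketch of one. Such a matrix stores no values: `data` and `indices` are empty and `indptr` holds n_obs + 1 zeros, so it stays cheap even for many cells. The helper name `empty_csr_like` is hypothetical, and as the next reply notes, a placeholder like this should not be necessary in the first place.

```python
import numpy as np
import scipy.sparse as sp


def empty_csr_like(n_obs: int, n_vars: int, dtype=np.float32) -> sp.csr_matrix:
    """Hypothetical helper: an all-zero CSR matrix of shape (n_obs, n_vars).

    No nonzero entries are stored; only `indptr` (length n_obs + 1) takes space.
    """
    return sp.csr_matrix((n_obs, n_vars), dtype=dtype)
```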
Hi @rob-p, I'd still be interested in what caused #151 (comment). You shouldn't have to make an empty CSR matrix, so it would be good to understand why the file could not be read back in (possibly, even likely, the same issue I raised in AnnData: scverse/anndata#1816). Or you could wait for me to fix that issue and then try it out later. If we need to go back and redo things, then so be it.
I'm not sure whether this issue belongs here or in `pyroe`, but I think (at least from my limited understanding) it would be cool if the sparse matrix output of `alevin` could be AnnData on disk. It seems like https://docs.rs/anndata/latest/anndata/ would be a good candidate for the I/O. `infer` and `quant` seem like the two commands where this would apply. This issue is mostly about opening a discussion about this feature, not being prescriptive or anything! I'm new to this ecosystem :)