adata.uns dataframe gets converted into numpy.ndarray when saving and loading h5ad #134

Closed
Munfred opened this issue Apr 10, 2019 · 7 comments

Munfred commented Apr 10, 2019

Hello, I am trying to use adata.uns to store a dataframe whose shape differs from that of adata.X. I can put a dataframe there and use it with no problem. However, when I save the object with adata.write('./test.h5ad') and then load it again with loaded_data = anndata.read_h5ad('./test.h5ad'), the dataframe stored in adata.uns comes back as a numpy array. This is a big problem for me because I lose the headers. Is this the intended behavior or a bug? If it is intended, the documentation should be made clearer. See the screenshot below for what I get.

image
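The loss of headers can be illustrated without anndata at all. The following is only a minimal pandas/NumPy sketch of the symptom, not a description of what anndata does internally; the column names are made up for illustration. A plain ndarray carries no column labels, whereas a NumPy record array keeps them as field names.

```python
import numpy as np
import pandas as pd

# Hypothetical table of the kind one might want to keep in adata.uns.
df = pd.DataFrame({"x": [1, 2, 3], "y": [0.1, 0.2, 0.3]})

# A plain ndarray has no place to store column labels.
plain = np.asarray(df)

# A record array keeps the column labels as field names.
records = df.to_records(index=False)

# The labels survive a round trip through the record array.
restored = pd.DataFrame.from_records(records)
```

Until dataframes in .uns are supported, converting to a record array before writing may be a way to keep the column names, although whether it round-trips through a given anndata version is an assumption that would need checking.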

@falexwolf
Member

The problem is that we can't properly deal with dataframes in .uns yet, but we definitely want to support them. @flying-sheep, did you already answer a question about this? Otherwise, it could be something for @Koncopd; it's not terribly much work, one just needs to imitate the way in which .obs and .var are written to the .h5ad.

It would help a lot, also in reworking the results of rank_genes_groups, which could then become dataframes...

@Koncopd Koncopd self-assigned this Apr 10, 2019

LuckyMD commented Apr 15, 2019

As soon as this is implemented I can also allow inplace=True for sc.tl.marker_gene_overlap() to store the results in .uns.

@falexwolf falexwolf pinned this issue Apr 29, 2019
@falexwolf
Member

@Koncopd,

A dataframe should be an h5py group (you can make a class anndata.h5py.DataFrame), with attr "DataFrame" and values stored as a recarray and categories within that. This could be applied to .obs and .var (where it's already done like that, except that the categories go into .uns, which we should stop doing...) and to any dataframe in .uns. A group that represents a DataFrame is not recursed through further (the same way we treat groups that represent sparse matrices).
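A rough sketch of that layout, using plain h5py and pandas, could look like the following. This is not the anndata implementation: the function names, the attribute key "type", and the dataset names "values" and "categories" are all hypothetical, and the sketch assumes numeric or categorical columns with string categories and ignores the index.

```python
import h5py
import numpy as np
import pandas as pd


def write_dataframe(parent, name, df):
    """Store df as an HDF5 group tagged as a DataFrame (hypothetical layout)."""
    group = parent.create_group(name)
    group.attrs["type"] = "DataFrame"  # hypothetical marker attribute
    cats = group.create_group("categories")
    encoded = df.copy()
    for col in df.columns:
        if isinstance(df[col].dtype, pd.CategoricalDtype):
            # Keep the category labels next to the values, inside the group.
            labels = np.array(list(df[col].cat.categories), dtype=object)
            cats.create_dataset(col, data=labels, dtype=h5py.string_dtype())
            # Store integer codes in place of the categorical column.
            encoded[col] = df[col].cat.codes
    # Values go into one record array; field names hold the column labels.
    group.create_dataset("values", data=encoded.to_records(index=False))


def read_dataframe(group):
    """Rebuild a DataFrame from a group written by write_dataframe."""
    if group.attrs.get("type") != "DataFrame":
        raise ValueError("group is not tagged as a DataFrame")
    df = pd.DataFrame.from_records(group["values"][()])
    for col, dataset in group["categories"].items():
        labels = [c.decode() if isinstance(c, bytes) else c for c in dataset[()]]
        df[col] = pd.Categorical.from_codes(df[col].to_numpy(), categories=labels)
    return df


# Usage with an in-memory HDF5 file, so nothing is written to disk.
df = pd.DataFrame(
    {"score": [0.5, 1.5, 2.5], "cluster": pd.Categorical(["a", "b", "a"])}
)
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    write_dataframe(f.create_group("uns"), "table", df)
    loaded = read_dataframe(f["uns/table"])
```

Because the reader checks the marker attribute first, a generic traversal of .uns could stop at such a group and hand it to read_dataframe instead of recursing into its members.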

Optimally, this would also directly translate to the zarr representation. I'd expect that we can abstract most of the formatting away and decide at a very late point whether to channel it to zarr or hdf5. @tomwhite, @ryan-williams: do we have "Groups" with attributes in zarr, too? How are you currently dealing with the SparseDataset we use for HDF5?

@fidelram

Once this is in place, I can use it to improve the dot plots.

@tomwhite
Contributor

@falexwolf the short answer is that we are not loading sparse single cell data from Zarr - it is all dense at the moment. I think there is a good case for storing data in a sparse representation in Zarr though.

@falexwolf
Member

> @falexwolf the short answer is that we are not loading sparse single cell data from Zarr - it is all dense at the moment. I think there is a good case for storing data in a sparse representation in Zarr though.

OK, got it! We achieve much faster loading and writing using the sparse representations. That's something you'd also observe for zarr, I think.

@ivirshup ivirshup mentioned this issue Jun 27, 2019
8 tasks
@ivirshup
Member

Fixed by #167.

@ivirshup ivirshup unpinned this issue Sep 10, 2019