adata.uns dataframe gets converted into numpy.ndarray when saving and loading h5ad #134

Munfred · 2019-04-10T03:50:17Z

Hello, I am trying to use adata.uns to store a dataframe with data of a different shape than adata.X. I am able to put a dataframe there and use it with no problem, however when I save the adata using adata.write('./test.h5ad') and then load it again with loaded_data = anndata.read_h5ad('./test.h5ad') the dataframe stored in adata.uns is loaded as a numpy array. This is a big problem for me because I lose the headers. Is this the intended behavior or a bug? If it's the intended behavior, the documentation should be made clearer. See screenshot below showing what I get.
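One plausible way the headers get lost can be shown with pandas and NumPy alone. The sketch below *assumes* the writer coerces any value it does not specifically recognise with `numpy.asarray`; that mechanism is not confirmed in this thread. The table contents are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical table whose shape has nothing to do with adata.X.
df = pd.DataFrame(
    {"gene": ["a", "b", "c"], "score": [0.5, 1.5, 2.5]},
    index=["r1", "r2", "r3"],
)

# Assumption: an array-only writer coerces unknown values like this.
as_array = np.asarray(df)

# The values survive, but the column names ("gene", "score") and the
# row index ("r1".."r3") do not; mixed column types collapse to object.
print(type(as_array).__name__, as_array.shape, as_array.dtype)
```

Whatever the exact code path, the point is the same: a plain ndarray has no slot for column names or an index, so a round trip through one cannot bring them back.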


falexwolf · 2019-04-10T11:15:07Z

The problem is that we can't properly deal with dataframes in .uns yet. But we definitely want to support it. @flying-sheep, have you already answered a question about this? Otherwise, it could be something for @Koncopd; it's not terribly much work: one just needs to imitate the way in which .obs and .var are written to the .h5ad.

It would also help a lot in reworking the results of rank_genes_groups, which could then become dataframes...
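To make the rank_genes_groups idea concrete: *assuming* the ranked gene names are kept as a NumPy structured array with one field per group (an assumption about scanpy internals, not stated in this thread), pandas can already turn that into a DataFrame whose columns are the groups. Group labels and gene names here are made up.

```python
import numpy as np
import pandas as pd

# Assumption: one structured-array field per group, one row per rank.
groups = ["0", "1", "2"]
names = np.rec.fromarrays(
    [np.array(["g1", "g2"]), np.array(["g3", "g4"]), np.array(["g5", "g6"])],
    dtype=[(g, "U10") for g in groups],
)

# pandas maps each field of the structured array to a column.
names_df = pd.DataFrame(names)
names_df.index.name = "rank"
print(names_df)
```

Once .uns can hold DataFrames, a table like `names_df` could be stored directly instead of the structured array.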

LuckyMD · 2019-04-15T10:03:31Z

As soon as this is implemented I can also allow inplace=True for sc.tl.marker_gene_overlap() to store the results in .uns.

falexwolf · 2019-04-29T10:50:37Z

@Koncopd,

A dataframe should be an h5py group (you could make a class anndata.h5py.DataFrame) with the attribute "DataFrame", with the values stored as a recarray and the categories inside that group. This could be applied to .obs and .var (where it's already done like that, except that the categories go into .uns, which we should stop doing...) and to any dataframe in .uns. A group that represents a DataFrame is not recursed through further (as we already do for groups that represent sparse matrices).
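A minimal sketch of the layout described above, using h5py directly. Everything here is hypothetical: the function names, the `"encoding"` attribute name, the `"table"` and `"categories"` member names, and the restriction to string column names and ASCII text are assumptions for illustration, not anndata's actual format. The key points it shows are (1) one tagged group per DataFrame, (2) values in a single recarray, (3) categories stored inside that group, and (4) the reader stops recursing when it sees the tag.

```python
import h5py
import numpy as np
import pandas as pd


def write_dataframe(parent: h5py.Group, key: str, df: pd.DataFrame) -> None:
    """Write df as one tagged group (hypothetical layout; string columns, ASCII text)."""
    grp = parent.create_group(key)
    grp.attrs["encoding"] = "DataFrame"  # hypothetical attribute name
    table = df.copy()
    table.index = table.index.astype(str)
    cats = grp.create_group("categories")
    for col in table.columns:
        if isinstance(table[col].dtype, pd.CategoricalDtype):
            # Categories live inside the DataFrame's own group, not in .uns.
            labels = np.array([str(c) for c in table[col].cat.categories], dtype="S")
            cats.create_dataset(col, data=labels)
            table[col] = table[col].cat.codes  # store integer codes in the recarray
    records = table.to_records(index=True)
    grp.attrs["index"] = records.dtype.names[0]
    fields = []
    for name in records.dtype.names:
        column = records[name]
        if column.dtype.kind in "OU":
            column = column.astype(str).astype("S")  # fixed-width bytes for HDF5
        fields.append(column)
    rec = np.rec.fromarrays(fields, names=list(records.dtype.names))
    grp.create_dataset("table", data=rec)


def read_dataframe(grp: h5py.Group) -> pd.DataFrame:
    rec = grp["table"][()]
    data = {}
    for name in rec.dtype.names:
        column = rec[name]
        data[name] = column.astype(str) if column.dtype.kind == "S" else column
    index_col = grp.attrs["index"]
    df = pd.DataFrame(data).set_index(index_col)
    df.index.name = None if index_col == "index" else index_col
    for col, labels in grp["categories"].items():
        df[col] = pd.Categorical.from_codes(
            df[col].to_numpy(), categories=labels[()].astype(str)
        )
    return df


def write_uns(parent: h5py.Group, mapping: dict) -> None:
    for key, value in mapping.items():
        if isinstance(value, pd.DataFrame):
            write_dataframe(parent, key, value)
        elif isinstance(value, dict):
            write_uns(parent.create_group(key), value)
        else:
            parent.create_dataset(key, data=np.asarray(value))


def read_uns(group: h5py.Group) -> dict:
    out = {}
    for key, item in group.items():
        if isinstance(item, h5py.Group):
            if item.attrs.get("encoding") == "DataFrame":
                out[key] = read_dataframe(item)  # tagged: do not recurse further
            else:
                out[key] = read_uns(item)
        else:
            out[key] = item[()]
    return out


# In-memory HDF5 file ("core" driver, nothing written to disk).
markers = pd.DataFrame(
    {"gene": ["a", "b"], "score": [1.0, 2.0], "kind": pd.Categorical(["x", "y"])},
    index=["c1", "c2"],
)
with h5py.File("uns_demo.h5", "w", driver="core", backing_store=False) as f:
    write_uns(f, {"markers": markers, "params": {"n": 3}})
    loaded = read_uns(f)
```

Storing the codes in the recarray and the labels next to it keeps the table compact, and because the reader checks the tag before recursing, nested dicts in .uns and DataFrames can coexist.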

Optimally, this would also directly translate to the zarr representation. I'd expect that we can abstract most of the formatting away and decide at a very late point whether to channel it to zarr or hdf5. @tomwhite, @ryan-williams: do we have "Groups" with attributes in zarr, too? How are you currently dealing with the SparseDataset we use for HDF5?
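One way to "decide at a very late point" is to write the encoders against a small duck-typed interface (`create_group`, `create_dataset`, `attrs`, item access) instead of against h5py. `h5py.Group` provides these; zarr v2 groups are *assumed* to offer the same calls (check against your zarr version). The `GroupLike` protocol, the `MemoryGroup` stand-in and the `"encoding"`/`"shape"` attribute names are hypothetical, and sparse matrices are used as the example because that is what the question above asks about.

```python
from typing import Any, Dict, Protocol

import numpy as np
import scipy.sparse as sp


class GroupLike(Protocol):
    """Small surface shared by h5py.Group and (assumed) zarr v2 groups."""

    attrs: Any

    def create_group(self, name: str) -> "GroupLike": ...

    def create_dataset(self, name: str, data: Any) -> Any: ...

    def __getitem__(self, name: str) -> Any: ...


class MemoryGroup:
    """Hypothetical in-memory stand-in so the sketch runs without h5py or zarr."""

    def __init__(self) -> None:
        self.attrs: Dict[str, Any] = {}
        self._children: Dict[str, Any] = {}

    def create_group(self, name: str) -> "MemoryGroup":
        self._children[name] = MemoryGroup()
        return self._children[name]

    def create_dataset(self, name: str, data: Any) -> np.ndarray:
        self._children[name] = np.array(data)
        return self._children[name]

    def __getitem__(self, name: str) -> Any:
        return self._children[name]


def write_csr(parent: GroupLike, key: str, matrix: sp.csr_matrix) -> None:
    """Encode CSR as three 1-D arrays plus attributes; knows nothing about the backend."""
    grp = parent.create_group(key)
    grp.attrs["encoding"] = "csr_matrix"  # hypothetical attribute name
    grp.attrs["shape"] = list(matrix.shape)
    for part in ("data", "indices", "indptr"):
        grp.create_dataset(part, data=getattr(matrix, part))


def read_csr(grp: GroupLike) -> sp.csr_matrix:
    # `[()]` reads a whole h5py dataset and is a full view on a NumPy array.
    data, indices, indptr = (grp[part][()] for part in ("data", "indices", "indptr"))
    shape = tuple(int(n) for n in grp.attrs["shape"])
    return sp.csr_matrix((data, indices, indptr), shape=shape)


# The backend is chosen only here; an h5py.File could be passed instead.
root = MemoryGroup()
X = sp.csr_matrix(np.array([[0, 1, 0], [2, 0, 0], [0, 0, 3]], dtype=np.float32))
write_csr(root, "X", X)
X_back = read_csr(root["X"])
```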

fidelram · 2019-04-30T09:10:49Z

Once this is in place, I can use it to improve the dot plots.

tomwhite · 2019-04-30T09:18:30Z

@falexwolf the short answer is that we are not loading sparse single cell data from Zarr - it is all dense at the moment. I think there is a good case for storing data in a sparse representation in Zarr though.

falexwolf · 2019-04-30T12:24:10Z

> @falexwolf the short answer is that we are not loading sparse single cell data from Zarr - it is all dense at the moment. I think there is a good case for storing data in a sparse representation in Zarr though.

OK, got it! We achieve much faster loading and writing using the sparse representations. That's something you'd also observe for zarr, I think.
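Part of that speed-up is simply that there is less to read and write. A rough sketch with made-up numbers (1,000 cells × 2,000 genes, about 5% non-zero, float32; real datasets will differ) compares the bytes of a dense matrix with the three arrays a CSR encoding stores. It says nothing about compression or chunking, which also affect real I/O.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n_cells, n_genes = 1000, 2000

# Made-up sparsity: roughly 5% of entries are non-zero.
dense = np.zeros((n_cells, n_genes), dtype=np.float32)
mask = rng.random((n_cells, n_genes)) < 0.05
dense[mask] = rng.random(int(mask.sum()), dtype=np.float32) + 1.0  # strictly non-zero

csr = sp.csr_matrix(dense)

dense_bytes = dense.nbytes
sparse_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes
print(f"dense: {dense_bytes} B, csr: {sparse_bytes} B, ratio: {dense_bytes / sparse_bytes:.1f}")
```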

ivirshup · 2019-09-10T07:57:32Z

Fixed by #167.

Koncopd self-assigned this Apr 10, 2019

falexwolf pinned this issue Apr 29, 2019

ivirshup mentioned this issue in v0.7 #171 on Jun 27, 2019

ivirshup closed this as completed Sep 10, 2019

ivirshup unpinned this issue Sep 10, 2019

