Trying to read a h5mu object, a few errors popping up #9

Open
mikelove opened this issue Sep 26, 2024 · 3 comments

@mikelove

I'm working with a scCRISPR dataset from another group. I can open it with muon but not with MuData (1.8.0). Some details below. Thanks for taking a look!

wget -O inference_mudata.h5mu "https://dl.dropboxusercontent.com/scl/fi/u3hyg4gq9pfttpf6amlv5/inference_mudata.h5mu?rlkey=fa908coboty72rgqsg2bcndfu&st=985x9zhh&dl=1"

In python:

import muon as md
crispr_mu = md.read_h5mu('inference_mudata.h5mu')
>>> crispr_mu
MuData object with n_obs × n_vars = 32471 × 25184
  obs:	'cov1', 'batch'
  uns:	'pairs_to_test', 'test_results'
  3 modalities
    gene:	32471 x 24731
      obs:	'batch', 'cov1', 'batch_number', 'n_counts', 'log1p_n_genes_by_counts', 'total_gene_umis', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'percent_mito', 'total_counts_ribo', 'log1p_total_counts_ribo', 'pct_counts_ribo', 'num_expressed_genes', 'doublet_scores', 'predicted_doublets', 'doublet_info'
      var:	'symbol', 'mt', 'ribo', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells', 'gene_chr', 'gene_start', 'gene_end'
    guide:	32471 x 441
      obs:	'batch', 'cov1', 'num_expressed_guides', 'batch_number', 'total_guide_umis'
      var:	'guide_id', 'sgRNA_ID', 'sgRNA_sequences', 'Target_name', 'chr', 'start', 'end', 'Set', 'intended_target_name', 'intended_target_chr', 'intended_target_start', 'intended_target_end', 'sequence', 'targeting'
      uns:	'capture_method', 'moi'
      layers:	'guide_assignment'
    hashing:	32471 x 12
      obs:	'batch', 'cov1', 'cluster_id', 'hto_type', 'hto_type_split'

In R:

> rhdf5::h5ls("~/Downloads/inference_mudata.h5mu", recursive = FALSE)
  group   name     otype dclass dim
0     /    mod H5I_GROUP
1     /    obs H5I_GROUP
2     /   obsm H5I_GROUP
3     / obsmap H5I_GROUP
4     /   obsp H5I_GROUP
5     /    uns H5I_GROUP
6     /    var H5I_GROUP
7     /   varm H5I_GROUP
8     / varmap H5I_GROUP
9     /   varp H5I_GROUP

I then tried to load it with MuData (after renaming the file to test.h5mu). The first error I hit is:

 > dat <- readH5MU("test.h5mu")
 Error: Error in h5checktype(). The provided H5Identifier is not a dataset identifier.

This was coming from roughly here (I'm looking at 1.8.0 code):

https://github.com/ilia-kats/MuData/blob/master/R/read_h5mu.R#L60

This fails on this dataset because H5Dclose(col) only accepts a dataset identifier, and here col is a group, not a dataset:

H5Iget_type(col)
 [1] "H5I_GROUP"

As a quick workaround, I just commented out the H5Dclose call.

The next error (which may not be relevant if my simple step is misguided):

 > dat <- readH5MU("test.h5mu")
 Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
   row names supplied are of the wrong length

This is coming from the first call to read_dataframe. There is a line do.call(data.frame, args = col_list) but we have:

 Browse[1]> str(col_list)
 List of 3
  $ cov1     : NULL
  $ batch    : NULL
  $ row.names: chr [1:32471(1d)] "CAGTAACCACTCTGTC_0" "CCTCTGACATCTATGG_0" "TATCTCAAGTTAACGA_0" "CTTACCGTCAGTGTTG_0" ...
 Browse[1]> columnorder
 [1] "cov1"  "batch"
 Browse[1]> group
 HDF5 GROUP
         name /obs
     filename

     name       otype dclass   dim
 0 _index H5I_DATASET STRING 32471
 1 batch  H5I_GROUP
 2 cov1   H5I_GROUP

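So this cannot be coerced to a data.frame: cov1 and batch are stored as groups rather than datasets, they apparently come back as NULL, and only row.names has the expected length of 32471.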
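To make the dataset-versus-group distinction concrete, here is a minimal sketch in Python. It uses plain lists and dicts as stand-ins for HDF5 datasets and groups, so it is not MuData's or rhdf5's actual code. It assumes that `batch` and `cov1` use the AnnData-style categorical layout (a group holding `codes` and `categories`), which I have not confirmed for this file. The function names and the toy values are hypothetical.

```python
# Sketch only: plain dicts and lists stand in for HDF5 groups and datasets.
# The "codes"/"categories" layout is an ASSUMPTION about this file, based on
# the AnnData on-disk categorical encoding; the issue does not confirm it.

def read_column(node):
    """Return a column as a list of values.

    A list models an HDF5 dataset (plain column); a dict models an HDF5
    group holding a categorical column as integer codes plus categories.
    """
    if isinstance(node, dict):
        categories = node["categories"]
        # A code of -1 marks a missing value in the AnnData encoding.
        return [categories[c] if c >= 0 else None for c in node["codes"]]
    return list(node)


def read_dataframe(group, column_order):
    """Build {column name: values}, checking lengths against the index."""
    index = read_column(group["_index"])
    columns = {name: read_column(group[name]) for name in column_order}
    for name, values in columns.items():
        if len(values) != len(index):
            raise ValueError(
                f"column {name!r} has {len(values)} rows, expected {len(index)}"
            )
    return index, columns


# Hypothetical miniature of the /obs group from the listing above.
obs = {
    "_index": ["CAGTAACCACTCTGTC_0", "CCTCTGACATCTATGG_0", "TATCTCAAGTTAACGA_0"],
    "batch": {"codes": [0, 1, 0], "categories": ["b1", "b2"]},
    "cov1": {"codes": [1, -1, 0], "categories": ["x", "y"]},
}
index, columns = read_dataframe(obs, ["cov1", "batch"])
print(columns)
```

If that assumption holds, a reader has to branch on the node type before reading (and before closing the handle), which would explain why commenting out H5Dclose alone leaves the columns as NULL.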

@ilia-kats
Owner

Thanks for providing the sample file. This works for me with the current GitHub master. We haven't had a Bioc release in a while. @gtca can you make a new release?

@mikelove
Author

Any update on this? We are looking to produce MuData objects in a consortium and cross-language support would be ideal.

Again, I tried to debug above but I just don't know enough about the codebase to figure out the error.

@ilia-kats
Owner

We are trying to add me as a maintainer of the Bioc package so we can get a new release out. Please be patient.
