NaN in categorical column fails reading #6

Closed
votti opened this issue Sep 1, 2023 · 0 comments · Fixed by #7

votti commented Sep 1, 2023

I have a MuData/AnnData dataset that was exported with anndata 0.7.8.

When I try to read it, reading var fails with this error:

Error in factor(as.integer(values), labels = labels_items): invalid 'labels'; length 4732 should be 1 or 4733
Traceback:

1. readH5AD(file)
2. read_modality(h5, backed)
3. read_with_index(h5autoclose(view & "var"))
4. read_dataframe(dataset)
5. lapply(columnorder, function(name) {
 .     col <- group & name
 .     values <- read_attribute(col)
 .     if (H5Aexists(col, "categories")) {
 .         attr <- H5Aopen(col, "categories")
 .         labels <- H5Aread(attr)
 .         if (!is(labels, "H5Ref")) {
 .             warning("found categories attribute for column ", 
 .                 name, ", but it is not a reference")
 .         }
 .         else {
 .             labels <- H5Rdereference(labels, h5loc = col)
 .             labels_items <- H5Dread(labels)
 .             n_labels <- length(unique(values))
 .             if (length(labels_items) > n_labels) {
 .                 labels_items <- labels_items[seq_len(n_labels)]
 .             }
 .             values <- factor(as.integer(values), labels = labels_items)
 .             H5Dclose(labels)
 .         }
 .         H5Aclose(attr)
 .     }
 .     H5Dclose(col)
 .     values
 . })
6. FUN(X[[i]], ...)
7. factor(as.integer(values), labels = labels_items)
8. stop(gettextf("invalid 'labels'; length %d should be 1 or %d", 
 .     nlab, length(levels)), domain = NA)

The cause is that the column contains NA values. They are encoded as -1 in the categorical codes but have no matching entry in the categories, so factor() sees one more distinct value (4733) than there are labels (4732).
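To make the mismatch concrete, here is a minimal sketch on toy data (the vectors below are made up for illustration, not taken from the dataset). It uses the same call shape as frame 7 of the traceback and captures the error instead of stopping:

```r
# Toy data: 0-based codes as stored on disk, with -1 marking a missing value.
codes <- c(0L, 1L, -1L, 2L, 1L)
categories <- c("a", "b", "c")

# -1 counts as a distinct value, so there is one more level than labels.
n_levels <- length(unique(codes))
n_labels <- length(categories)

# Same call shape as in the traceback; the label count does not match the
# number of distinct values, so factor() raises an "invalid 'labels'" error.
result <- tryCatch(
    factor(as.integer(codes), labels = categories),
    error = function(e) conditionMessage(e)
)
print(result)
```

With three categories and one -1 code, factor() is given three labels for four distinct values, which is the same situation as 4732 labels for 4733 values in the report above.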

Would you be interested in a PR with a fix?

votti added a commit to votti/MuData that referenced this issue Sep 1, 2023
anndata 0.7.8 exports NA in categorical columns as the code -1,
which causes reading of mudata/anndata files to fail.

This adds support for NAs.
The convert_categoricals function is added as an internal function
in order to facilitate testing of this logic.

Fixes: ilia-kats#6
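
One possible way to handle the -1 codes, shown as a rough sketch only: the helper name codes_to_factor is hypothetical and this is not the convert_categoricals function from the commit. It assumes the codes are 0-based indices into the categories and that any negative code means NA:

```r
# Hypothetical helper for illustration; not the code from the linked commit.
codes_to_factor <- function(codes, categories) {
    codes <- as.integer(codes)
    # Assumption: negative codes (here -1) mark missing values.
    codes[codes < 0L] <- NA_integer_
    # Pin the levels to every stored category (0-based codes), so the number
    # of labels always matches the number of levels, whatever the data holds.
    factor(codes, levels = seq_along(categories) - 1L, labels = categories)
}

codes <- c(0L, 1L, -1L, 2L, 1L)
f <- codes_to_factor(codes, c("a", "b", "c"))
print(f)
```

Passing levels explicitly also keeps categories that never occur in the column, instead of deriving the levels from the values that happen to be present.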