Add AnnData dense matrix read support #146

bnprks · 2024-10-23T21:44:49Z

This addresses old issues #17 and #36 by adding support for reading the AnnData dense matrix format.

Changes:

Add support for reading dense AnnData matrices
Adjust test files for old AnnData versions to also cover obsm, varm, and dense matrices. Reduce the number of versions covered, since v0.7.8 appeared to be identical to v0.7.0.
Adjust how dimnames are recognized from AnnData files: match based on length rather than always assuming var is rows and obs is cols. This helps support other matrix shapes, as found in obsm, varp, etc.
Split the 10x and AnnData import code into separate files, since a single file was getting a bit long.
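To illustrate the length-based dimname matching described in the list above, here is a minimal sketch. The names `DimNames`, `pickNames`, and `matchDimNames` are hypothetical and not taken from the PR, and the tie-breaking order when obs and var have the same length is an assumption of the sketch rather than the actual behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical container for the names attached to each matrix axis.
struct DimNames {
    std::vector<std::string> row_names;
    std::vector<std::string> col_names;
};

// Return whichever candidate has exactly `len` entries, preferring `first`.
// An empty result means "no names available for this axis".
static std::vector<std::string> pickNames(
    uint32_t len,
    const std::vector<std::string> &first,
    const std::vector<std::string> &second
) {
    if (first.size() == len) return first;
    if (second.size() == len) return second;
    return {};
}

// Assign obs/var names to rows and columns by matching lengths instead of
// assuming a fixed orientation. Assumption: rows prefer obs and cols prefer
// var when both candidates have the matching length.
DimNames matchDimNames(
    uint32_t n_rows,
    uint32_t n_cols,
    const std::vector<std::string> &obs,
    const std::vector<std::string> &var
) {
    DimNames out;
    out.row_names = pickNames(n_rows, obs, var);
    out.col_names = pickNames(n_cols, var, obs);
    assert(out.row_names.empty() || out.row_names.size() == n_rows);
    assert(out.col_names.empty() || out.col_names.size() == n_cols);
    return out;
}
```

With 3 obs names and 2 var names, a 3x2 matrix gets obs on rows and var on columns, a 2x3 matrix gets the reverse, and a 3x5 obsm-style matrix gets obs on rows and no column names.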

The first commit is the main one to look at for review; the others mostly move code around.

As a side-effect, it seems the SingletonNumReader was dead code and got deleted.

immanuelazn · 2024-10-25T22:29:15Z

r/src/bpcells-cpp/matrixIterators/ImportMatrixAnnDataHDF5.cpp

-    }
+    std::vector<uint32_t> dims;
+    bool row_major, is_sparse;
+    readAnnDataDims(h5file, group, dims, row_major, is_sparse);


I like how this is now a much cleaner function. However, using a void function to set row_major makes it a little unclear where that value comes from. For readability, having to look inside readAnnDataDims to see that the arguments are modified through references is an extra step. Maybe we can just make it return the bool, or a tuple of the bools?
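One way the suggestion above could look is sketched below: the function returns a small struct instead of writing through reference parameters. `AnnDataDims` and `describeMatrix` are hypothetical names, and the function takes pre-parsed values rather than an HDF5 file so that the sketch stays self-contained. It is not the interface the PR ended up with.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical result type; the fields mirror the variables in the diff above.
struct AnnDataDims {
    std::vector<uint32_t> dims;
    bool row_major;
    bool is_sparse;
};

// Hypothetical stand-in for the real reader. Everything the caller learns
// about the matrix comes back in the return value, so the call site shows
// where each value originates.
AnnDataDims describeMatrix(std::vector<uint32_t> shape, bool row_major, bool is_sparse) {
    assert(shape.size() == 2); // this sketch only handles 2-D matrices
    return AnnDataDims{std::move(shape), row_major, is_sparse};
}
```

A call site then reads `AnnDataDims info = describeMatrix(...)`, or with C++17 structured bindings `auto [dims, row_major, is_sparse] = describeMatrix(...)`, which makes the origin of `row_major` visible without opening the function.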

bnprks · 2024-10-25T23:29:39Z

As discussed offline, another to-do item is filtering out explicit zeros while reading from a dense matrix

This required also changing openAnnDataMatrix() to return a unique_ptr, and open10xFeatureMatrix() got edited at the same time for consistency.
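As a rough illustration of dropping explicit zeros while reading a dense matrix, the sketch below converts a dense buffer into compressed sparse column (CSC) arrays and skips zero entries along the way. `CscMatrix` and `denseToCsc` are hypothetical names, and the row-major input layout is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CSC container: col_ptr has cols + 1 entries, and the entries
// of column c live at positions col_ptr[c] .. col_ptr[c + 1] - 1.
struct CscMatrix {
    uint32_t rows = 0;
    uint32_t cols = 0;
    std::vector<uint32_t> col_ptr;
    std::vector<uint32_t> row_idx;
    std::vector<double> val;
};

// Convert a row-major dense buffer to CSC, keeping only non-zero values.
CscMatrix denseToCsc(const std::vector<double> &dense, uint32_t rows, uint32_t cols) {
    assert(dense.size() == static_cast<size_t>(rows) * cols);
    CscMatrix m;
    m.rows = rows;
    m.cols = cols;
    m.col_ptr.push_back(0);
    for (uint32_t c = 0; c < cols; c++) {
        for (uint32_t r = 0; r < rows; r++) {
            double v = dense[static_cast<size_t>(r) * cols + c];
            if (v != 0) { // explicit zeros are never stored
                m.row_idx.push_back(r);
                m.val.push_back(v);
            }
        }
        m.col_ptr.push_back(static_cast<uint32_t>(m.row_idx.size()));
    }
    return m;
}
```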

immanuelazn

Overall looks super good! Admittedly I had to read through a lot of highfive APIs, as well as through AnnData, to get a full understanding, but this looks super clear. Runs on my end well!

immanuelazn · 2024-10-30T00:06:45Z

r/src/bpcells-cpp/matrixIterators/ImportMatrixAnnDataHDF5.cpp

+// Read AnnData sparse matrix, with an implicit transpose to CSC format for
+// any data stored in CSR format.
+// Row/col names are handled as follows:
+//   - If row_names or col_names are provided, they are assumed to already have taken into


Would be good to mention this in the header file too

immanuelazn · 2024-10-30T00:36:35Z

r/src/bpcells-cpp/matrixIterators/FilterZeros.h

+// Filter out zero values from a MatrixLoader
+// This is useful when reading dense matrices that have many zero values,
+// or when performing operations that will cause new zeros to be created (e.g. multiplying a row by zero)
+template<typename T>


Very cool and elegant! Can already see this being used in a handful of derived matrixloader types
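For readers unfamiliar with the pattern, a zero-filtering wrapper can be sketched as follows. The `Loader` interface here is a simplified, hypothetical stand-in and not the real MatrixLoader API: it only exposes chunks of (row, value) pairs. `VectorLoader` and `ZeroFilter` are likewise illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified loader interface.
struct Loader {
    virtual ~Loader() = default;
    virtual bool load() = 0;               // false when no more entries remain
    virtual uint32_t capacity() const = 0; // entries in the current chunk
    virtual uint32_t *rowData() = 0;
    virtual double *valData() = 0;
};

// Serves entries from in-memory vectors in fixed-size chunks.
class VectorLoader : public Loader {
    std::vector<uint32_t> rows_;
    std::vector<double> vals_;
    size_t chunk_, pos_ = 0, len_ = 0;
  public:
    VectorLoader(std::vector<uint32_t> rows, std::vector<double> vals, size_t chunk)
        : rows_(std::move(rows)), vals_(std::move(vals)), chunk_(chunk) {
        assert(rows_.size() == vals_.size() && chunk_ > 0);
    }
    bool load() override {
        pos_ += len_;
        if (pos_ >= vals_.size()) { len_ = 0; return false; }
        len_ = std::min(chunk_, vals_.size() - pos_);
        return true;
    }
    uint32_t capacity() const override { return static_cast<uint32_t>(len_); }
    uint32_t *rowData() override { return rows_.data() + pos_; }
    double *valData() override { return vals_.data() + pos_; }
};

// Wraps another loader and compacts each chunk in place, dropping zeros.
class ZeroFilter : public Loader {
    std::unique_ptr<Loader> inner_;
    uint32_t kept_ = 0;
  public:
    explicit ZeroFilter(std::unique_ptr<Loader> &&inner) : inner_(std::move(inner)) {}
    bool load() override {
        // Keep pulling chunks until one has a non-zero entry left in it
        while (inner_->load()) {
            uint32_t *r = inner_->rowData();
            double *v = inner_->valData();
            uint32_t n = inner_->capacity();
            kept_ = 0;
            for (uint32_t i = 0; i < n; i++) {
                if (v[i] != 0) { r[kept_] = r[i]; v[kept_] = v[i]; kept_++; }
            }
            if (kept_ > 0) return true;
        }
        kept_ = 0;
        return false;
    }
    uint32_t capacity() const override { return kept_; }
    uint32_t *rowData() override { return inner_->rowData(); }
    double *valData() override { return inner_->valData(); }
};
```

The loop in `load()` matters: a chunk that consists entirely of zeros is skipped rather than returned empty, so callers never see a successful load with zero entries.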

immanuelazn · 2024-10-30T00:36:56Z

r/src/bpcells-cpp/matrixIterators/FilterZeros.h

+    FilterZeros(std::unique_ptr<MatrixLoader<T>> &&loader) : MatrixLoaderWrapper<T>(std::move(loader)) {}
+
+    // Return false if there are no more entries to load
+    virtual bool load() {


shouldn't load() and capacity() just have the override keyword?

immanuelazn · 2024-10-30T00:48:59Z

r/src/bpcells-cpp/matrixIterators/ImportMatrixAnnDataHDF5.cpp

+    if (dims.size() != 1)
+        throw std::runtime_error(
+            std::string("readAnnDataDimname(): expected ") + axis +
+            " index to be 1-dimensional aray."


Suggested change

-            " index to be 1-dimensional aray."
+            " index to be 1-dimensional array."

immanuelazn · 2024-10-30T00:54:53Z

r/src/bpcells-cpp/matrixIterators/ImportMatrixHDF5.cpp

+        , cols_(d_.getDimensions()[1]) {}
+
+    // Return total number of integers in the reader
+    virtual uint64_t size() const { return rows_ * cols_; }


Using the override keyword instead of virtual would probably make more sense
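A small sketch of why `override` is the safer spelling: the compiler then verifies that the method really overrides a base-class virtual. `Reader` and `DenseReader` below are hypothetical types, not the classes from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical base class with a virtual size() method.
struct Reader {
    virtual ~Reader() = default;
    virtual uint64_t size() const { return 0; }
};

// Hypothetical derived reader for a dense rows x cols dataset.
struct DenseReader : Reader {
    uint64_t rows_, cols_;
    DenseReader(uint64_t rows, uint64_t cols) : rows_(rows), cols_(cols) {}

    // `override` makes the compiler check the signature against the base class.
    uint64_t size() const override { return rows_ * cols_; }

    // Dropping `const` here would be a compile error with `override`:
    //   uint64_t size() override { ... }
    // With plain `virtual`, the same mistake would silently declare a new,
    // unrelated method and leave Reader::size() in effect.
};
```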

immanuelazn · 2024-10-30T02:08:30Z

r/tests/testthat/test-matrix_io.R

@@ -260,14 +264,24 @@ test_that("AnnData read backwards compatibility", {
    open_matrix_anndata_hdf5(file.path(dir, f), group="layers/transpose") %>%
      as("dgCMatrix") %>%
      expect_identical(ans)
+    open_matrix_anndata_hdf5(file.path(dir, f), group="layers/dense") %>%
+      as("dgCMatrix") %>%


I think it would be good to check for writing matrices that were originally read dense. I did a little bit of testing with write_matrix_hdf5(), and write_matrix_memory() and they appear to be working. But it never hurts to have a check in case of a regression!

Since as("dgCMatrix") calls write_matrix_memory() internally, I think this is already covered here

bnprks · 2024-10-30T19:44:06Z

r/src/matrix_io.cpp

+template <class T> List dims_matrix(std::unique_ptr<MatrixLoader<T>> &&mat, bool transpose) {
+    // This is only safe because we know the main `dims_matrix` function doesn't store any references to the object
+    return dims_matrix(std::move(*mat), transpose);
+}


I need to do a bit more research on the safety of this before merging. I was just running some small C++ tests and getting surprised by when destructors got called, so my mental model is not quite right

I think I have figured this out now. The solution I used was:

Make the base dims_matrix take an l-value reference, since that provides more ownership guarantees to the caller (i.e. the object will not get consumed by the function call)

Make a helper overload that takes an r-value reference. This provides fewer guarantees to the caller but is otherwise safe

Dereferencing a std::unique_ptr with * produces an l-value, which now can be passed directly to the l-value reference overload of dims_matrix
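The overload arrangement described above can be sketched with a toy type. `Matrix` and `dimsOf` are hypothetical stand-ins for the loader type and `dims_matrix`. The sketch only demonstrates the reference-binding rules, not the real function bodies.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a matrix loader.
struct Matrix {
    int rows, cols;
};

// Base overload: takes an l-value reference, so the caller keeps the object.
std::pair<int, int> dimsOf(Matrix &mat, bool transpose) {
    if (transpose) return {mat.cols, mat.rows};
    return {mat.rows, mat.cols};
}

// Helper overload for temporaries. Inside the function the named parameter
// `mat` is itself an l-value, so this call resolves to the overload above.
std::pair<int, int> dimsOf(Matrix &&mat, bool transpose) {
    return dimsOf(mat, transpose);
}

// Overload for owning pointers: `*mat` yields an l-value (Matrix &), so no
// std::move is needed and the pointee is not consumed.
std::pair<int, int> dimsOf(std::unique_ptr<Matrix> &&mat, bool transpose) {
    assert(mat != nullptr);
    return dimsOf(*mat, transpose);
}
```

Because the `unique_ptr` overload only binds an r-value reference and never moves from it, the caller's pointer still owns its object after the call returns.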

immanuelazn · 2024-11-25T19:50:40Z

No qualms about the new changes you added! Looks good Ben.

bnprks added 4 commits October 23, 2024 14:00

[cpp] Implement dense AnnData read support

fdf4de9

[cpp] Split 10x and AnnData import code files

fea0a16

As a side-effect, it seems the SingletonNumReader was dead code and got deleted.

[cpp] Move non-interface headers to cpp file

81008f6

[r] Update docs for AnnData dense support

0b00853

immanuelazn reviewed Oct 25, 2024

bnprks added 2 commits October 26, 2024 00:43

[cpp] Add FilterZeros

29b0a26

This required also changing openAnnDataMatrix() to return a unique_ptr, and open10xFeatureMatrix() got edited at the same time for consistency.

[cpp] Clean up readAnnDataDims() interface

6d0675a

immanuelazn approved these changes Oct 30, 2024

bnprks added 2 commits October 30, 2024 12:09

[cpp] Code review changes for dense matrix read

edd311a

[cpp] Remove intermediate code from test-matrixIterators

7bbd8e3

bnprks commented Oct 30, 2024

bnprks and others added 3 commits October 31, 2024 14:05

[r] Adjust dims_matrix parameter typing

330e664

Merge branch 'main' into bp/dense-anndata-read

3ed0bbf

Merge branch 'main' into bp/dense-anndata-read

eae371e

bnprks merged commit 290863d into main Nov 25, 2024
4 checks passed

bnprks deleted the bp/dense-anndata-read branch November 25, 2024 23:47
