writing bpcells matrices to Zarr stores #139

Open
Artur-man opened this issue Oct 11, 2024 · 8 comments

Comments

@Artur-man

Hi,

I was wondering whether you guys are considering adding support for writing matrices to zarr stores. Zarr has a hierarchical structure similar to HDF5, and implementations already exist in several programming languages, including R and C++.

Here are some more zarr/R related links:
https://github.com/keller-mark/pizzarr/
https://github.com/BIMSBbioinfo/ZarrArray

@al2na

al2na commented Oct 11, 2024

Yes, this is a good idea; zarr support is much needed for single-cell analysis.

@frenkiboy

It would be amazing to have this feature!

@alexg9010

Regarding the currently available Zarr implementations for C++, this overview table from the Z5 developer could help in choosing a backend.

@bnprks
Owner

bnprks commented Oct 11, 2024

Hi folks, this is a very nice suggestion and I'm happy to include it on our roadmap!

I have plans for an upcoming internal change that would make it possible to create separate R packages that can interoperate efficiently with BPCells at the C/C++ level. Then it would be very sensible to make a companion package that provides read/write support for zarr.

This would allow us to avoid complicating the BPCells build process -- it is already the source of a lot of user difficulty, so if zarr support is a separate package then only users who want zarr support will have to deal with any increased build complexity (e.g. also requiring cmake to be installed).

Is there anyone with C++ familiarity who might be interested in being mentored through adding a BPCells zarr support package in a few months? (I could ping you once the required changes are made in BPCells to make companion packages possible.)

@Artur-man
Author

Artur-man commented Oct 12, 2024

Dear Ben,

I would like to thank you for the prompt response.

We (specifically @alexg9010 and I) would really like to tackle this and, with help from you guys, implement zarr backends. Both @alexg9010 and I have some (and improving) C++ familiarity. We look forward to your response.

Also, I fully agree with your approach of having companion packages, which would be similar to DelayedArray backends, e.g. HDF5Array.

Note: It appears TensorStore has already been used to provide zarr support for a C++-based image processing toolkit (ITK): https://forum.image.sc/t/c-zarr-library/70159/25

@bnprks
Owner

bnprks commented Oct 15, 2024

Hi @Artur-man and @alexg9010, that sounds great! I think we can get started with some zarr-only prototyping with C++, then once that's set we can integrate with BPCells. (If the prototyping goes quickly, there may be a bit of a gap while I update BPCells to allow proper interoperability)

The initial steps would be starting simple:

  1. Get tensorstore to build with CMake
  2. Read support prototype: given a zarr 1D array, read a contiguous slice of indices into memory
  3. Write support prototype: write a new zarr 1D array piece-by-piece without having to hold the full array contents in memory (e.g. write the numbers 1 to 1e9 without using more than 100MB of RAM)
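
For illustration only, the access pattern behind steps 2 and 3 can be sketched without tensorstore. The Python sketch below is not the proposed C++ prototype. It assumes an uncompressed Zarr v2 directory store holding little-endian int64 values, and the function names `write_zarr_1d` and `read_zarr_1d_slice` are hypothetical. It only shows that writing needs one chunk in memory at a time and that a slice read only has to touch the chunks it overlaps.

```python
import json
import os
import struct
import tempfile


def _write_chunk(path, chunk_idx, buf, chunk_len):
    # Zarr v2 stores edge chunks at full size, so pad with the fill value (0).
    padded = buf + [0] * (chunk_len - len(buf))
    # For a 1D array the chunk key is just the chunk index ("0", "1", ...).
    with open(os.path.join(path, str(chunk_idx)), "wb") as f:
        f.write(struct.pack(f"<{chunk_len}q", *padded))


def write_zarr_1d(path, values, chunk_len):
    """Stream an iterable of ints into a 1D array, holding one chunk at a time."""
    os.makedirs(path, exist_ok=True)
    buf, chunk_idx, length = [], 0, 0
    for v in values:
        buf.append(v)
        length += 1
        if len(buf) == chunk_len:
            _write_chunk(path, chunk_idx, buf, chunk_len)
            chunk_idx += 1
            buf = []
    if buf:
        _write_chunk(path, chunk_idx, buf, chunk_len)
    # Metadata is written last because the total length is only known here.
    meta = {
        "zarr_format": 2,
        "shape": [length],
        "chunks": [chunk_len],
        "dtype": "<i8",
        "compressor": None,  # assumption: no compression, to stay stdlib-only
        "fill_value": 0,
        "order": "C",
        "filters": None,
    }
    with open(os.path.join(path, ".zarray"), "w") as f:
        json.dump(meta, f)
    return length


def read_zarr_1d_slice(path, start, stop):
    """Read elements [start, stop), opening only the chunks that overlap it."""
    with open(os.path.join(path, ".zarray")) as f:
        meta = json.load(f)
    chunk_len = meta["chunks"][0]
    stop = min(stop, meta["shape"][0])
    out = []
    if start >= stop:
        return out
    for chunk_idx in range(start // chunk_len, (stop - 1) // chunk_len + 1):
        # Assumption: every chunk file exists (this sketch always writes them).
        with open(os.path.join(path, str(chunk_idx)), "rb") as f:
            data = struct.unpack(f"<{chunk_len}q", f.read())
        base = chunk_idx * chunk_len
        lo = max(start, base) - base
        hi = min(stop, base + chunk_len) - base
        out.extend(data[lo:hi])
    return out


if __name__ == "__main__":
    store = os.path.join(tempfile.mkdtemp(), "example.zarr")
    write_zarr_1d(store, range(1, 11), chunk_len=4)
    print(read_zarr_1d_slice(store, 2, 9))
```

With a chunk length of about a million int64 values (8 MB per chunk), the same loop would stay well below the 100MB budget mentioned in step 3; a real implementation would also need to handle compression and missing chunks.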

I think it might make sense for me to start up a BPCells Slack workspace for us to have easier back-and-forth. Does that sound good to you?

@Artur-man
Author

I agree with the Slack approach! We have already started putting together a small repo for building tensorstore. Looking forward to it!

@bnprks
Owner

bnprks commented Oct 16, 2024

That sounds great! I just sent @Artur-man and @alexg9010 a Slack invite link via email -- let me know if you have any trouble receiving it.

If anyone else is interested in getting involved in some BPCells-related coding, just ask and I can add you as well.


5 participants