Usage from R #18
Comments
One option might be to use the zarr python package from R via reticulate. It would be good to try this out and find out if there are any interoperability issues. One way of doing this could be to try to run all the code examples from the zarr tutorial but from R via reticulate. Some benchmarking would probably also be useful, to identify any areas where performance is affected by having to move or translate data between R and python. If it is a workable option, it might then be cool to write a version of the zarr tutorial but for R users, which could be based off the current zarr python tutorial but include any specific information that R users might need to be aware of. |
Another option could be to write R bindings for the Z5 C++ library, e.g., via Rcpp. This would be more work but might provide opportunities for better performance by avoiding any unnecessary data transformations or copies required when using reticulate. |
A technical point of interest: in R, arrays use column-major (Fortran) memory layout. Zarr provides the option to use either row-major (C) or column-major (F) memory layout for data within chunks, and the same layout is used when retrieving data for all or part of a zarr array into a numpy array. E.g.:
In [20]: z = zarr.zeros((100, 100), order='F')
In [21]: a = z[:]
In [22]: a.flags
Out[22]:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
So when using zarr from R, using |
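To make the layout point above concrete, here is a minimal sketch in plain Python (no zarr or numpy needed) of how a multi-dimensional index maps to a position in a flat buffer under each order. The function name `flat_offset` is made up for illustration and is not part of zarr or numpy.

```python
def flat_offset(index, shape, order="C"):
    """Return the position of `index` in a flat buffer holding an array of `shape`."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same number of dimensions")
    if order == "C":
        dims = range(len(shape) - 1, -1, -1)  # row-major: last axis varies fastest
    elif order == "F":
        dims = range(len(shape))              # column-major: first axis varies fastest
    else:
        raise ValueError("order must be 'C' or 'F'")
    offset, stride = 0, 1
    for d in dims:
        offset += index[d] * stride
        stride *= shape[d]
    return offset


shape = (2, 3)
c_pos = flat_offset((1, 0), shape, order="C")  # one full row of 3 elements is skipped
f_pos = flat_offset((1, 0), shape, order="F")  # the next element in the same column
```

With `order='F'` the bytes of a chunk are already in the order R expects for its own arrays, which is why the choice of order could matter for avoiding a transpose or copy on the R side.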
I could be wrong, but from my understanding, isn't zarr more of a piece of software that uses a key-value store and chunked, compressed storage to provide an efficient on-disk array solution? That is to say, being able to load zarr data in R is far from having a full-fledged R package that performs equally well and can access the zarr backend as efficiently as the current Python library (even if the R binding for the Z5 library is implemented). |
Hi Mike, yes I imagine that having a native implementation would be more
powerful than using reticulate, although I have not tried it yet and so
don't have a clear view of the limitations.
FWIW there are basically 3 main components in the Zarr internal
architecture, each with a simple API.
The storage module contains classes which expose a key-value interface
where keys are ASCII strings and values are blobs. Minimum would be an
implementation of this interface for the filesystem, allowing you to read
Zarr data stored on disk. Other possible implementations include cloud
object stores etc.
The codecs module contains classes that expose an encode/decode interface,
and includes main compressors like blosc, gzip etc. For the Python
implementation that's in a separate package called numcodecs.
Then the core module provides the translation between an array-like
interface and the underlying management, encoding and storage of chunks.
There is also a hierarchy module which deals with creating and accessing
groups etc.
FWIW there's a bit more info on the architecture in the ESIP talk I gave
here: http://alimanfoo.github.io/2018/04/12/zarr-tech-dive.html
To get a basic working implementation was actually less work than you might
imagine, but as always the devil is in the detail.
…On Mon, 17 Sep 2018, 22:43 Mike Jiang, ***@***.***> wrote:
I could be wrong, but from my understanding, isn't zarr more of a piece of
software that uses a key-value store and chunked, compressed storage to provide
an efficient on-disk array solution? Being able to load zarr data in R is
far from having a full-fledged R package that performs equally well and can
access the zarr backend as efficiently as the current Python library (even if
the R binding for the Z5 library is implemented).
Can you provide more insight into the amount of software
engineering effort required to translate zarr to R without reticulate?
|
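For anyone weighing up a native port, the three components described above (storage, codecs, core) can be sketched in a few lines of standard-library Python. This is only an illustration of how the pieces fit together, not the zarr implementation: the class and function names are invented here, zlib stands in for the real codecs, and the result is not a valid Zarr array because no `.zarray` metadata is written.

```python
import os
import struct
import tempfile
import zlib


class DirectoryStore:
    """Storage layer: keys are ASCII strings, values are byte blobs on disk."""

    def __init__(self, root):
        self.root = root

    def _path(self, key):
        return os.path.join(self.root, *key.split("/"))

    def __setitem__(self, key, value):
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(value)

    def __getitem__(self, key):
        with open(self._path(key), "rb") as f:
            return f.read()


class ZlibCodec:
    """Codec layer: a plain encode/decode pair over bytes."""

    def __init__(self, level=1):
        self.level = level

    def encode(self, buf):
        return zlib.compress(buf, self.level)

    def decode(self, buf):
        return zlib.decompress(buf)


def write_chunk(store, codec, chunk_key, values):
    """Core layer (simplified): serialise int32 values, encode, then store."""
    raw = struct.pack("<%di" % len(values), *values)
    store[chunk_key] = codec.encode(raw)


def read_chunk(store, codec, chunk_key):
    """Core layer (simplified): fetch, decode, then deserialise int32 values."""
    raw = codec.decode(store[chunk_key])
    return list(struct.unpack("<%di" % (len(raw) // 4), raw))


with tempfile.TemporaryDirectory() as root:
    store = DirectoryStore(root)
    codec = ZlibCodec()
    write_chunk(store, codec, "0.0", [1, 2, 3, 4])
    roundtrip = read_chunk(store, codec, "0.0")
```

An R port would presumably need the same three seams: something that maps keys to files, something that compresses and decompresses bytes, and a layer that translates array indexing into chunk keys.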
I took a stab at wrapping z5 from R, it currently compiles, but it is not functional yet and there is still quite a lot of work to do, so don't judge me :-). |
Let me know if you have any questions or need any help from the z5 side. |
Thanks a lot for sharing, great to hear about this!
…On Tue, 12 Nov 2019, 08:15 Guido Kraemer, ***@***.***> wrote:
I took a stab at wrapping z5 from R, it currently compiles, but it is not
functional yet and there is still quite a lot of work to do, so don't judge
me :-).
I am sharing this to avoid duplicated efforts, anyone who wants can join
development
https://github.com/gdkrmr/zarr-R
|
Thanks, what is the ETA for v2.0.0? |
The API redesign is done, I just need to test it a bit more. Anyway, I think I will release 2.0.0 without S3 or other cloud backends and push this to 2.1.0. |
|
Just fyi, I don't support bool right now in z5.
Interesting, for the python bindings the .so is quite a bit smaller, |
I don't think we are doing anything special. Though could imagine one implementing a bit packing codec. Maybe there are some compiler flags that can help? |
Probably yes. @gdkrmr What operating system are you using and which compiler? |
On Mon, 27 Jan 2020, 21:05 Constantin Pape, ***@***.***> wrote:
* I still need to think how to deal with the other data types (e.g. Booleans).
Just fyi, I don't support bool right now in z5.
@alimanfoo <https://github.com/alimanfoo> @jakirkham
<https://github.com/jakirkham> Are there any optimisations when zarr
stores bools or is it storing a bool as one byte?
Same as numpy, bool as one byte.
|
R stores bools as bytes (EDIT: no, they are stored as int32), because there is also a
Ubuntu 16.04 and I have to use the R build system, which uses Makefiles.
I can get the size of the EDIT: R stores rlogicals as int32, not uint8 |
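To illustrate the size difference discussed above, here is a small sketch comparing one byte per value (how numpy and zarr store bools) with four bytes per value (how R stores logicals). It uses plain Python lists as a stand-in for an R logical vector. The helper name is made up, and the NA handling assumes R's convention of encoding a missing logical as the minimum int32 value.

```python
import struct

# Assumption: R encodes a missing logical (NA) as the minimum 32-bit integer.
NA_LOGICAL = -2**31


def logicals_to_bool_bytes(values):
    """Pack int32-style logicals (0 or 1) into one byte per element."""
    if any(v == NA_LOGICAL for v in values):
        raise ValueError("NA has no representation in a one-byte bool array")
    return bytes(1 if v else 0 for v in values)


r_logicals = [1, 0, 1, 1]

# Four bytes per element, as in an int32 buffer.
as_int32 = struct.pack("<%di" % len(r_logicals), *r_logicals)

# One byte per element, as in a numpy bool array.
as_bool = logicals_to_bool_bytes(r_logicals)
```

A binding would have to decide what to do with NA when writing a bool array; raising an error, as here, is only one option.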
What is the state of Zarr support in R? I haven't looked at my package for a while and wonder if someone else has done some work on this in the meanwhile or is planning to work on this? |
I'm late to the party, but Googling most permutations of "zarr for R" gives this thread as the top hit, then @gdkrmr's repo, and Bioconductor's ZarrExperiment (I'll get to this later). So I'd guess your stuff is still the best we've got right now. If you're planning to keep working on your zarr R package, I'd be willing to test it out on some genome-scale data. I've been eyeing some alternatives to HDF5 for a while and would be very interested in building on top of whatever you make. Our current approach in ZarrExperiment just does the simple thing of dispatching to the Python library via reticulate. A native port would be much preferred if it is feasible. If your package gets more mature, we would use it to create a (Maybe you should call the package zarrr, ho ho ho.) |
Same here. I'll be teaching a workshop for R users soon and I was wondering about zarr support. So far I got it via nczarr. See the last cell of this notebook. But it would be nice to add alternatives that don't require a netcdf installation. |
Would it help to get zarrrrrr interested parties together at the next community meeting (May 5th) to discuss a path forward? From my side, I'd love to see one (or more?) R implementation in https://github.com/zarr-developers/zarr_implementations/ |
Count me in. As a regular R user, this is something I've been thinking about recently. I'd favour the C++/Rcpp path over the reticulate approach as I've had issues with reticulate before (in my experience, R doesn't always play well with the various python envs/conda). |
Maybe we could provide R bindings of xtensor-zarr? We already do that for xtensor, and there exists an R package for xtensor already. We could improve this package so that it allows Zarr access, and users could use the same package for array processing. The package would then be equivalent to something like Zarr + NumPy. |
the |
This sounds great! I started an extremely rough pure R function for producing a single Zarr chunk from an R matrix here https://github.com/vitessce/vitessce-r/blob/keller-mark/zarr/R/zarr.R#L215 in case anyone is interested. Unfortunately I cannot attend at 2pm eastern time on May 5th due to a conflict but perhaps @ilan-gold @manzt @th789 @mccalluc are interested |
I will try to attend but cannot make any promises. |
Looks like the time slot didn't work out for R folks. No worries. Note that the 19th is cancelled; we'll be back on the regular zoom on the 2nd though. If a different time slot would be better, feel free to say the word. |
In a way that already works. See the last cell of https://nbviewer.jupyter.org/gist/ocefpaf/4a078b19db4fd5507d2d21691abaa689 But nczarr is not exactly the same as zarr. I'm not well versed in the details but maybe a core zarr (c/c++/rust, whatever) that we can wrap in Python and R is still needed? |
@ocefpaf : I only know what's on the docs and what I've tested on the CLI, but my understanding was that nczarr has a mode to work with pure Zarr that may be of interest. I'd defer to @DennisHeimbigner whether a portion of the library could be used as a core. |
Josh is correct. We support pure zarr read/write, so as long as you are willing to live
|
BTW you could try this experiment with R wrapping netcdf-4.8.0
This should create a directory called "simple.zarr" that contains a pure zarr container. |
I have a beginner's question about opening zarr files with I have built
library(ncdf4)
# open file
ncin <- nc_open(
"file:///Users/me/image.ome.zarr#mode=nczarr,zarr"
)
ncin
# Error in R_nc4_inq: NetCDF: Invalid argument
# Error in nc_get_grp_info(gids[ib], root_group$fqgn, format) :
# nc_get_grp_info: R_nc4_inq returned error on group id 524289 |
I couldn't get it to work either, see Unidata/netcdf-c#1982 |
A couple of things.
|
BTW what operating system are you using? |
It is a multiscale OME-zarr where the image is in the path
> ncdump -h "file:///Users/me/ccidImage.ome.zarr#mode=nczarr,zarr"
ncdump: file:///Users/me/ccidImage.ome.zarr#mode=nczarr,zarr: No such file or directory
> ncdump -h "file:///Users/me/ccidImage.ome.zarr/0/0#mode=nczarr,zarr"
netcdf \0 {
}
nc-config --all
This netCDF 4.8.1-development has been built with the following features:
--cc -> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
--cflags -> -I/usr/local/include
--libs -> -L/usr/local/lib -lnetcdf
--static -> -lhdf5_hl -lhdf5 -lsz -lz -ldl -lm -lsz -lcurl -lzip
--has-c++ -> no
--cxx ->
--has-c++4 -> yes
--cxx4 -> /usr/local/Homebrew/Library/Homebrew/shims/mac/super/clang++
--cxx4flags -> -I/usr/local/Cellar/osgeo-netcdf/4.7.4/include
--cxx4libs -> -L/usr/local/Cellar/osgeo-netcdf/4.7.4/lib -lnetcdf-cxx4 -lnetcdf
--has-fortran -> yes
--fc -> /usr/local/bin/gfortran
--fflags -> /usr/local/Cellar/osgeo-netcdf/4.7.4/include
--flibs -> -L/usr/local/Cellar/osgeo-netcdf/4.7.4/lib
--has-f90 -> TRUE
--has-f03 -> FALSE
--has-dap -> yes
--has-dap2 -> yes
--has-dap4 -> yes
--has-nc2 -> yes
--has-nc4 -> yes
--has-hdf5 -> yes
--has-hdf4 -> no
--has-logging -> no
--has-pnetcdf -> no
--has-szlib -> yes
--has-cdf5 -> yes
--has-parallel4 -> no
--has-parallel -> no
--has-nczarr -> yes
--prefix -> /usr/local
--includedir -> /usr/local/include
--libdir -> /usr/local/lib
--version -> netCDF 4.8.1-development |
Could this be related to the "dimension_separator" metadata, @DennisHeimbigner ? @schienstockd , can you show us the content of |
I think I see the problem. I use a heuristic to break a key into the variable key |
{
"chunks" : [
1,
1,
1,
512,
512
],
"compressor" : {
"clevel" : 5,
"blocksize" : 0,
"shuffle" : 1,
"cname" : "lz4",
"id" : "blosc"
},
"dtype" : ">u2",
"fill_value" : 0,
"filters" : null,
"order" : "C",
"shape" : [
180,
4,
8,
512,
512
],
"zarr_format" : 2,
"dimension_separator" : "/"
}
I am not sure where the '0' variable comes from .. I used |
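As a rough illustration of how the metadata above determines chunk keys, the sketch below computes the chunk grid and the key of one chunk using only the standard library. The helper names are invented for this example. It relies on the Zarr v2 convention that the separator defaults to "." when `dimension_separator` is absent.

```python
import json
from math import ceil, prod

# A trimmed copy of the .zarray metadata shown above.
zarray = json.loads("""
{
    "chunks": [1, 1, 1, 512, 512],
    "dtype": ">u2",
    "shape": [180, 4, 8, 512, 512],
    "zarr_format": 2,
    "dimension_separator": "/"
}
""")


def chunk_grid(meta):
    """Number of chunks along each dimension."""
    return [ceil(s / c) for s, c in zip(meta["shape"], meta["chunks"])]


def chunk_key(meta, chunk_index):
    """Store key of a chunk, relative to the array's own path."""
    sep = meta.get("dimension_separator", ".")
    return sep.join(str(i) for i in chunk_index)


grid = chunk_grid(zarray)
n_chunks = prod(grid)
first_key = chunk_key(zarray, (0, 0, 0, 0, 0))
flat_key = chunk_key({"dimension_separator": "."}, (0, 0, 0, 0, 0))
```

With "/" as the separator, chunk keys look like nested directories named after integers, so a reader that infers names from path segments could plausibly mistake a segment such as "0" for a variable or group. Whether that is what happened here is a guess.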
Looks like there is a very rough Zarr implementation in R https://github.com/keller-mark/pizzarr cc @keller-mark (hopefully I've clarified that correctly; please feel free to correct me if not) |
Yes very rough indeed. Of course open to contributions or more detailed feature requests / issues. |
See discussion post under zarr-developers/zarr-python#1088 cc: @mike-lawrence |
The |
I think stars only provides read access, no write. |
Seems to be solid progress here |
The Rarr package is now on Bioconductor. The repository is here. It's written in C and writing is supported although for now limited to double and string types. |
Cool! I always forget to check Bioconductor for packages 🤦‍♂️ |
Hi all, update on pizzarr: some things are working now!
I have updated the docs a bit, with a simple OME-NGFF demo at https://keller-mark.github.io/pizzarr/articles/ome-ngff.html |
Thanks for working on Pizzarr and updating us, @keller-mark. May I add this to our website (https://zarr.dev/implementations/)? |
@MSanKeys963 Yes feel free to add! Thanks! |
It would be great to be able to use zarr format data from R. This issue is intended for discussing options for enabling/supporting usage from R.