Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C API wrapper for z5 #68

Open
constantinpape opened this issue Aug 1, 2018 · 21 comments
Open

C API wrapper for z5 #68

constantinpape opened this issue Aug 1, 2018 · 21 comments

Comments

@constantinpape
Copy link
Owner

The possibility in wrapping the z5 API for C came up in https://github.com/zarr-developers/zarr/issues/285#issuecomment-409438299 .

As far as I can tell, this would involve two main tasks:
Exposing the C-API of Dataset / DatasetTyped:
https://github.com/constantinpape/z5/blob/master/include/z5/dataset.hxx#L25

And implementing read/writeSubarray for C-arrays, similar to
https://github.com/constantinpape/z5/blob/master/include/z5/multiarray/xtensor_access.hxx

The first task should be straightforward.
Though I am not sure what's the best approach for wrapping all datatypes without templates.
I hope there would be a more clever way than redundant wrapper code for each individual datatype.

The second task will probably involve some pointer arithmetic, but maybe some code I have
written for fast copy in xtensor could be re-used:
https://github.com/constantinpape/z5/blob/master/include/z5/multiarray/xtensor_util.hxx#L45

@jakirkham
Copy link

jakirkham commented Aug 1, 2018

Thanks for opening this.

My naive thinking when suggesting we add C bindings is we coerce C pointers into something C++/xtensor can handle. Perhaps xt::adapt would work? Really whatever xtensor does for NumPy arrays should be pretty insightful.

In terms of handling C arrays, xnd (particularly the C portion as there is a Python portion as well) may be valuable. It is very young and under active development. Not aware of many other N-D options in C. Based on my brief search came across a few C implementations like this that were for 1-D arrays only and generally some combination of the following: unmaintained, GPL'd, or part of a much larger library.

Handling type parameterization in C basically is done one of a few ways (may be more that I'm missing):

  • Use void* (or char*) with length
  • Use macros
  • #define-style templates
  • Some template generator (e.g. m4, AutoGen, Tempita, Jinja2, etc.)
  • C11's __Generic

ref: https://softwareengineering.stackexchange.com/q/262571
ref: https://www.gnu.org/software/autogen/
ref: https://github.com/numpy/numpy/blob/master/tools/npy_tempita/__init__.py
ref: http://jinja.pocoo.org/docs/
ref: https://en.cppreference.com/w/c/language/generic

@constantinpape
Copy link
Owner Author

constantinpape commented Aug 1, 2018

Yes, using xt::adapt would probably work as well.
The solution would probably include passing a 1D C-array as well as the shape (long array + number of dimensons?) to the read/write function, get a xt::view to it with xt::adapt and then use xt::reshape to get a ND-view.

Thanks for the references on parameterization, I will have a closer look soon.

@jakirkham
Copy link

We might also want to pass in an enum for type selection. Then we would get something like this (roughly).

#include <cstddef>
#include <numeric>
#include <functional>
#include <vector>

#include "xtensor/xadapt.hpp"


enum DataType {
    ...
    INT,
    ...
};

void do_something(void* data, enum DataType type, size_t ndim, const size_t* shape)
{
    std::vector<std::size_t> shape_vec(shape, shape + ndim);
    std::size_t size = std::accumulate(
        shape_vec.begin(),
        shape_vec.end(),
        std::size_t(1),
        std::multiplies<std::size_t>()
    );
    
    switch (type) {
        ...
        case INT:
        {
            auto a = xt::adapt((int*)data, size, xt::no_ownership(), shape_vec);
        }
        break;
        ...
    }
}

@jakirkham
Copy link

jakirkham commented Aug 2, 2018

Also raised issues with xnd ( xnd-project/libxnd#31 ) and xtensor ( xtensor-stack/xtensor#1030 ) about being able to interoperate between the two libraries, which would provide another way to go between C and C++. Given they both seem to receive funding from QuantStack, expect this work would be prioritized by both projects.

Edit: Strikethrough due to an inaccuracy.

@davidbrochart
Copy link
Contributor

xtensor is funded by QuantStack but xnd is funded by Quansight.

@jakirkham
Copy link

Thanks @davidbrochart. Must have misread. Have edited the comment above accordingly.

@jakirkham
Copy link

Since you are here, @davidbrochart, do you have any thoughts on how we might modify the example above to use xnd instead? It would be nice to not have to pass through all of the array info piecewise, but am not familiar enough with xnd to say how.

@davidbrochart
Copy link
Contributor

I'm more familiar with the Python bindings than the C library, but the documentation about creating an xnd container is here. You will need to create a context and flags.
But I'm not sure what you want to do exactly.

@constantinpape
Copy link
Owner Author

@jakirkham I think your example is pretty much what we will need for read/write.
However, we will also need to pass a pointer to the Dataset.

@jakirkham
Copy link

As a side note, a C implementation would be a good in road for Fortran as well.

@clbarnes
Copy link
Contributor

I'm sure it's occurred to you already, but with this in mind it would make sense to put all of the attribute access stuff through C++ as well and have python just bind to that.

@constantinpape
Copy link
Owner Author

Yep, would make sense.
I did not write it that way when working on the python bindings, because the arguments in
the glue code need to be properly typed, and I didn't want to write functions for all possible attribute values.
However, it should be possible to bypass this by dumping the attributes to a json formatted string
and feeding this to the glue code.

@rsignell-usgs
Copy link

rsignell-usgs commented Sep 11, 2018

cc @WardF @DennisHeimbigner (Unidata NetCDF developers)

@WardF
Copy link

WardF commented Sep 11, 2018

This looks interesting; we're currently discussing our approach to adding native zarr functionality to the core netcdf C library, and having a pre-existing library we can link against would lighten the technical burden. On the other hand, once it's in the C library it might be a simple matter to expose it for more general usage. This is an interesting discussion, thanks for the tag @rsignell-usgs

@jakirkham
Copy link

Cool, glad to hear this would be useful for NetCDF. 😄

Do you have any thoughts on or suggestions for the proposal thus far, @WardF?

@WardF
Copy link

WardF commented Sep 24, 2018

Let me take a closer look and follow up with thoughts @jakirkham :)

@weilewei
Copy link
Contributor

weilewei commented Jun 25, 2019

Not sure if our work is helpful. We implemented c wrapper for z5 and it is just an experimental project. see here:
https://github.com/kmpaul/cz5test

Feel free to let me know if any improvement is needed.

@constantinpape
Copy link
Owner Author

@weilewei I had a very quick look and it looks promising.
I don't have much time this week, I will try to have a closer look in the next week.

Also, to give some context on the larger ecosystem here, there's currently discussion for a new version of the zarr spec going on, that will at some point unify the zarr and n5 formats, see issues and PRs in https://github.com/zarr-developers/zarr-specs.
Once the spec is there, the plan is to provide a C implementation of this spec, that would then also be used by z5 internally for the basic operations (I/O of chunks, compression).

Unfortunately, as far as I know, there is not a clear timeline for this yet and having a c wrapper for z5 will be useful in any case, just wanted to inform you that potentially there will some other things happen in this domain.

@weilewei
Copy link
Contributor

@constantinpape thanks for pointing me to the zarr specs discussion. Right, if you have a pure c implementation following the zarr spec than we do not need c api wrapper.

The C wrapper I wrote is just for my own experimental project. Its interface is currently not very perfect but most of file io functionalities are there and tested. I will improve it later. Any suggestion is welcome.

@constantinpape
Copy link
Owner Author

@weilewei @halehawk I forgot about your efforts in writing a C wrapper and was just reminded of this. Just wanted to say that this is a very valuable contribution, especially because the timeline for the native zarr c implementation is still unclear.
I would like to link to your effort in the README here. Is https://github.com/kmpaul/cz5test stil the relevant repository for this?

@weilewei
Copy link
Contributor

@constantinpape The link is correct. Thanks for bringing it up and linking our repos!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants