NCZarr does not support reading of many zarr files #2474

dzenanz · 2022-08-08T18:47:21Z

I am trying to read sample zarr files provided here: https://github.com/ome/ome-ngff-prototypes#data-availability

dzenan@corista:~/tester$ ncdump -h file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/tcyx.ome.zarr#mode=nczarr,file
ncdump: file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/tcyx.ome.zarr#mode=nczarr,file: NetCDF: NCZarr error
dzenan@corista:~/tester$ ncdump -h file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/tcyx.ome.zarr#mode=nczarr
ncdump: file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/tcyx.ome.zarr#mode=nczarr: NetCDF: NCZarr error
dzenan@corista:~/tester$ ncdump -h file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/tcyx.ome.zarr#mode=zarr,file
ncdump: file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/tcyx.ome.zarr#mode=zarr,file: NetCDF: NCZarr error
dzenan@corista:~/tester$ ncdump -h file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr#mode=zarr,file
ncdump: file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr#mode=zarr,file: NetCDF: NCZarr error
dzenan@corista:~/tester$ ncdump -h file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/tczyx.ome.zarr#mode=zarr,file
free(): double free detected in tcache 2
Aborted (core dumped)
dzenan@corista:~/tester$ ncdump -h file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/tyx.ome.zarr#mode=zarr,file
ncdump: file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/tyx.ome.zarr#mode=zarr,file: NetCDF: NCZarr error
dzenan@corista:~/tester$ ncdump -h file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/yx.ome.zarr#mode=zarr,file
ncdump: file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/yx.ome.zarr#mode=zarr,file: NetCDF: NCZarr error
dzenan@corista:~/tester$ ncdump -h file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/zyx.ome.zarr#mode=zarr,file
free(): double free detected in tcache 2
Aborted (core dumped)
dzenan@corista:~/tester$ ncdump -h file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/multi-image.ome.zarr#mode=zarr,file
ncdump: file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/multi-image.ome.zarr#mode=zarr,file: NetCDF: NCZarr error
dzenan@corista:~/tester$ ncdump -h https://s3.embl.de/i2k-2020/ngff-example-data/v0.4#mode=zarr,file
ncdump: https://s3.embl.de/i2k-2020/ngff-example-data/v0.4#mode=zarr,file: NetCDF: Malformed URL
dzenan@corista:~/tester$ ncdump -h https://s3.embl.de/i2k-2020/ngff-example-data/v0.4
syntax error, unexpected WORD_WORD, expecting SCAN_ATTR or SCAN_DATASET or SCAN_ERROR
context: <?xml^ version="1.0" encoding="UTF-8"?><Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>ngff-example-data/v0.4.dds</Key><BucketName>i2k-2020</BucketName><Resource>/i2k-2020/ngff-example-data/v0.4.dds</Resource><RequestId>17096CA8F2A3955C</RequestId><HostId>083e6bd6-5f67-4d7a-bf85-13f52b390395</HostId></Error>
ncdump: https://s3.embl.de/i2k-2020/ngff-example-data/v0.4: NetCDF: file not found
dzenan@corista:~/tester$ ncdump -h s3:https://s3.embl.de/i2k-2020/ngff-example-data/v0.4
ncdump: s3:https://s3.embl.de/i2k-2020/ngff-example-data/v0.4: No such file or directory
dzenan@corista:~/tester$ ncdump -h s3://s3.embl.de/i2k-2020/ngff-example-data/v0.4
ncdump: s3://s3.embl.de/i2k-2020/ngff-example-data/v0.4: NetCDF: Attempt to use feature that was not turned on when netCDF was built.

Am I doing something wrong? I was under impression that NCZarr would read most zarr files. What are the offending features in these examples which break NCZarr?

Ubuntu 22.04.1 LTS, GCC 11.2.0, libnetcdf-dev Version: 1:4.8.1-1.

The text was updated successfully, but these errors were encountered:

dzenanz · 2022-08-10T13:39:39Z

@DennisHeimbigner do you have some insight?

DennisHeimbigner · 2022-08-10T17:33:25Z

Sorry for the delay. It appears to be related to the use of '/' as the
dimension separator. I was trying to track down exactly what i happening.

DennisHeimbigner · 2022-08-10T19:11:25Z

I was wrong. The problem is this (its complicated).

array "s0" meta-data specifies this:
- shape is [4,930,1024]
- _ARRAY_DIMENSIONS are ["c","y","x"]
array "s1" meta-data specifies this:
- shape is [4,465,512]
- _ARRAY_DIMENSIONS are ["c","y","x"]

Since both are in the same group (the root group), this is an inconsistency
because one variable says y=930 and the other says y=465.
You might contact the Xarray community to see how this contradiction
should be resolved.

dzenanz · 2022-08-10T19:26:59Z

OME NGFF specifies that "Metadata about an image can be found under the "multiscales" key in the group-level metadata." As far as I can tell, multiple arrays are designed to have different dimensions to represent different levels of the image pyramid. @joshmoore Am I interpreting this correctly?

This might be contrary to NCZarr's constraints? Is there a way to reconcile this? Or is NCZarr a bad choice for Zarr-backend for NGFF implementation?

DennisHeimbigner · 2022-08-10T19:42:38Z

I do not know anything about the OME standard, but why do you think
it is generating legal Zarr Version 2 files, never mind NCZarr.

joshmoore · 2022-08-10T19:42:49Z

Since both are in the same group (the root group), this is an inconsistency
because one variable says y=930 and the other says y=465.
You might contact the Xarray community to see how this contradiction
should be resolved.

@DennisHeimbigner: but that should only be the case for mode=nczarr, no? There's no such restriction in Zarr itself.

As a side note, @dzenanz, ome/ngff#114 is intended to play nicely with xarray and nczarr.

DennisHeimbigner · 2022-08-10T19:47:11Z

It is actually an XArray problem since the _ARRAY_DIMENSIONS attribute
is not recognized in Zarr V2 (or has that changed?).
But it should be the case that the names assigned to shapes
should be consistent across all uses of those dimensions names, correct?

joshmoore · 2022-08-10T19:53:12Z

It is actually an XArray problem since the _ARRAY_DIMENSIONS attribute
is not recognized in Zarr V2 (or has that changed?)

It's definitely true that Zarr v2 is agnostic to _ARRAY_DIMENIONS, but why should that impact mode=zarr filesets?

But it should be the case that the names assigned to shapes
should be consistent across all uses of those dimensions names, correct?

Not in Zarr.

DennisHeimbigner · 2022-08-10T20:00:47Z

I was speaking more about nczarr and Xarray, than Zarr, because Zarr
has no dimension names at all.
Perhaps the interesting question is: what happens when that file
is read by the Xarray package? What values does Xarray assign to
the dimension names?

dzenanz · 2022-08-10T20:02:11Z

I assume that the highest resolution is used by default in Xarray.

DennisHeimbigner · 2022-08-10T20:03:01Z

In any case, try changing the access url to include the following:
".......#mode=zarr,noxarray,ZZZ"
where ZZZ is either file or zip or s3 depending on what storage format
you are reading.

DennisHeimbigner · 2022-08-10T20:04:49Z

I assume that the highest resolution is used by default in Xarray.

I would be surprised if that was so. It would be seen as violating shape constraints,
I would think.

dzenanz · 2022-08-10T20:06:28Z

dzenan@corista:~/tester$ /usr/local/bin/ncdump -h file:///media/xeonatorC/Misc/Tester/CBCT_crop1ms.zarr#mode=zarr,noxarray,file
netcdf CBCT_crop1ms {
dimensions:
	_zdim_56 = 56 ;
	_zdim_63 = 63 ;
	_zdim_67 = 67 ;
	_zdim_28 = 28 ;
	_zdim_31 = 31 ;
	_zdim_33 = 33 ;
	_zdim_14 = 14 ;
	_zdim_15 = 15 ;
	_zdim_16 = 16 ;

group: scale0 {
NetCDF: internal library error; Please contact Unidata support
Location: file ; line 1661

dzenanz · 2022-08-10T20:17:36Z

@DennisHeimbigner you are right. xarray.open_zarr() yields:

<xarray.Dataset>
Dimensions:  ()
Data variables:
    *empty*

DennisHeimbigner · 2022-08-10T20:34:53Z

I would suggest two changes to the OME format:

make the dimension-name <-> shape mapping be consistent
Change the "direction" attribute from this:
"direction": [ [ 1.0, 0.0, 0.0 ], [ 0.0, 1.0, 0.0 ], [ 0.0, 0.0, 1.0 ] ]
to something like this.
"direction": {"x": [1.0,0.0,0.0], "y": [0.0, 1.0, 0.0], "z": [0.0, 0.0, 1.0]}

There may be other attributes that also should be changed.

joshmoore · 2022-08-11T01:19:47Z

I would suggest two changes to the OME format:

Suggestion noted, thanks, @DennisHeimbigner, but I don't think we should open that conversation here.

Leaving nczarr, s3, and http out of the picture for a moment, I'm still struggling to understand whether you think these file-based pure-zarr examples from the original description should be covered by netcdf-c, @DennisHeimbigner:

dzenan@corista:~/tester$ ncdump -h file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/tcyx.ome.zarr#mode=zarr,file
ncdump: file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/tcyx.ome.zarr#mode=zarr,file: NetCDF: NCZarr error

dzenan@corista:~/tester$ ncdump -h file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr#mode=zarr,file
ncdump: file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr#mode=zarr,file: NetCDF: NCZarr error

dzenan@corista:~/tester$ ncdump -h file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/tczyx.ome.zarr#mode=zarr,file
free(): double free detected in tcache 2
Aborted (core dumped)

dzenan@corista:~/tester$ ncdump -h file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/tyx.ome.zarr#mode=zarr,file
ncdump: file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/tyx.ome.zarr#mode=zarr,file: NetCDF: NCZarr error

dzenan@corista:~/tester$ ncdump -h file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/yx.ome.zarr#mode=zarr,file
ncdump: file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/yx.ome.zarr#mode=zarr,file: NetCDF: NCZarr error

dzenan@corista:~/tester$ ncdump -h file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/zyx.ome.zarr#mode=zarr,file
free(): double free detected in tcache 2
Aborted (core dumped)

dzenan@corista:~/tester$ ncdump -h file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/multi-image.ome.zarr#mode=zarr,file
ncdump: file:///media/xeonatorC/Dev/ITKIOOMEZarrNGFF/v0.4/multi-image.ome.zarr#mode=zarr,file: NetCDF: NCZarr error

In my mind, they should especially if we are pointing the community to this as the C implementation.

DennisHeimbigner · 2022-08-11T02:27:35Z

....these file-based pure-zarr examples from the original description should be covered by netcdf-c,

Speaking for myself, I say no. The fact that they cannot be read by Xarray
makes it clear that having an inconsistent _ARRAY_DIMENSION <-> shape
mapping is just wrong, and needs to be fixed.

As for the case of

"direction": [ [ 1.0, 0.0, 0.0 ], [ 0.0, 1.0, 0.0 ], [ 0.0, 0.0, 1.0 ] ]

that is more a matter of opinion and I would really think that using a dictionary
instead of an array of arrays provides for more meaningful metadata.
GDAL (https://gdal.org/drivers/raster/zarr.html), for instance, has non-standard attribute values whose value is a dictionary.
I think that should be the convention for non-standard attributes.

* re: Unidata#2278 * re: Unidata#2485 * re: Unidata#2474 This PR subsumes PR Unidata#2278. Actually is a bit an omnibus covering several issues. ## PR Unidata#2278 Add support for the Zarr string type. Zarr strings are restricted currently to be of fixed size. The primary issue to be addressed is to provide a way for user to specify the size of the fixed length strings. This is handled by providing the following new attributes special: 1. **_nczarr_default_maxstrlen** — This is an attribute of the root group. It specifies the default maximum string length for string types. If not specified, then it has the value of 64 characters. 2. **_nczarr_maxstrlen** — This is a per-variable attribute. It specifies the maximum string length for the string type associated with the variable. If not specified, then it is assigned the value of **_nczarr_default_maxstrlen**. This PR also requires some hacking to handle the existing netcdf-c NC_CHAR type, which does not exist in zarr. The goal was to choose numpy types for both the netcdf-c NC_STRING type and the netcdf-c NC_CHAR type such that if a pure zarr implementation read them, it would still work and an NC_CHAR type would be handled by zarr as a string of length 1. For writing variables and NCZarr attributes, the type mapping is as follows: * "|S1" for NC_CHAR. * ">S1" for NC_STRING && MAXSTRLEN==1 * ">Sn" for NC_STRING && MAXSTRLEN==n Note that it is a bit of a hack to use endianness, but it should be ok since for string/char, the endianness has no meaning. For reading attributes with pure zarr (i.e. with no nczarr atribute types defined), they will always be interpreted as of type NC_CHAR. ## Issue: Unidata#2474 This PR partly fixes this issue because it provided more comprehensive support for Zarr attributes that are JSON valued expressions. This PR still does not address the problem in that issue where the _ARRAY_DIMENSION attribute is incorrectly set. Than can only be fixed by the creator of the datasets. ## Issue: Unidata#2485 This PR also fixes the scalar failure shown in this issue. It generally cleans up scalar handling. It also adds a note to the documentation describing that NCZarr supports scalars while Zarr does not and also how scalar interoperability is achieved. ## Misc. Other Changes 1. Convert the nczarr special attributes and keys to be all lower case. So "_NCZARR_ATTR" now used "_nczarr_attr. Support back compatibility for the upper case names. 2. Cleanup my too-clever-by-half handling of scalars in libnczarr.

DennisHeimbigner · 2022-08-28T04:02:44Z

Partly fixed by #2492

joshmoore · 2022-09-05T14:22:36Z

Speaking for myself, I say no. The fact that they cannot be read by Xarray
makes it clear that having an inconsistent _ARRAY_DIMENSION <-> shape
mapping is just wrong, and needs to be fixed.

@DennisHeimbigner, to clarify: is the intent for netcdf-c to be considered a Zarr library (as opposed to an nczarr library)?

Partly fixed by #2492

@dzenanz: will you have a chance to test if the partial fix works for you?

dzenanz · 2022-09-05T14:47:04Z

I am finishing up a grant proposal today and tomorrow, I plan to test #2492 later in the week.

DennisHeimbigner · 2022-09-05T19:01:16Z

,,,to clarify: is the intent for netcdf-c to be considered a Zarr library (as opposed to an nczarr library)?

The primary goal is to read/write netcdf-4 stored in Zarr format datasets.
But since netcdf-4 does not exactly map to Zarr v2, it is necessary
to add optional extensions to Zarr to support netcdf-4 concepts.

On the other hand, we do not want to cut ourselves off from pure Zarr
datasets. So, a sub-goal is to support mapping a subset of netcdf-4
to pure Zarr. This sub-goal allows us to access pure Zarr datasets thru
the netCDF API.

joshmoore · 2022-09-06T06:24:12Z

Thanks for the explanation, @DennisHeimbigner. I definitely understand the primary goal. But is there also a sub-goal of mapping all pure Zarr datasets to some subset of netcdf-4?

DennisHeimbigner · 2022-09-06T18:11:50Z

...is there also a sub-goal of mapping all pure Zarr datasets to some subset of netcdf-4?

Short answer is yes. For example, I have just added support for Zarr fixed length
string type. There are a couple of other things I will be adding when there
specification is stable: .zmetadata (aggregate metadata) and sharding.
If there is something you would like to see that I have missed, then let me know.

joshmoore · 2022-09-06T19:38:26Z

I guess I defer there to @dzenanz' testing since in #2474 (comment) there appear to be Zarr datasets that aren't openable. If it would help to copy them somewhere for testing, happy to do so.

dzenanz · 2022-09-08T16:03:40Z

Trying to read C:\Dev\ITKIOOMEZarrNGFF\v0.4\cyx.ome.zarr with updated version of netCDF still produces the same error: NetCDF: NCZarr error when invoking nc_open. Same happens for the other files there (yx.ome.zarr and multi-image.ome.zarr). I am using a slightly modified netCDF: https://github.com/dzenanz/netcdf-c/tree/itkSpecific.

DennisHeimbigner · 2022-09-08T16:14:19Z

That is because the OME use of _ARRAY_DIMENSION is still wrong and I
cannot fix that particular problem.

dzenanz · 2022-09-08T20:08:30Z

Removing either _ARRAY_DIMENSION attributes or leaving just one scale gets me further: NetCDF: Filter error: undefined filter encountered. That is understandable, since the array has:

"compressor": {
    "blocksize": 0,
    "clevel": 5,
    "cname": "lz4",
    "id": "blosc",
    "shuffle": 1
}

DennisHeimbigner · 2022-09-08T23:02:13Z

How did you install netcdf-c? Are you building it yourself?

dzenanz · 2022-09-09T12:50:03Z

To have the latest version, I have to build it myself. My build setup is here: InsightSoftwareConsortium/ITKIOOMEZarrNGFF#6
Here, I am trying to add zip support to the build: dzenanz/ITKIOOMEZarrNGFF@0bbea25
the current stumbling block is here: dzenanz/ITKIOOMEZarrNGFF@0bbea25#diff-1e7de1ae2d059d21e1dd75d5812d5a34b0222cef273b7c3a2af62eb747f9d20aR135-R137

DennisHeimbigner · 2022-09-09T17:09:26Z

So it looks like you are building with your own CMakeLists.txt and not
using the one provided by netcdf, correct?

dzenanz · 2022-09-09T19:00:28Z

I modified netCDF's CMakeLists.txt, in order to point to HDF5 bundled with ITK, and to avoid some duplicate targets with other libraries I use together.

DennisHeimbigner mentioned this issue Aug 10, 2022

NCZarr does not read images produced by multiscale-spatial-image #2476

Closed

WardF assigned WardF and DennisHeimbigner Aug 19, 2022

WardF added this to the 4.9.1 milestone Aug 19, 2022

WardF added this to netCDF-C v4.9.1 Aug 19, 2022

WardF moved this to Todo in netCDF-C v4.9.1 Aug 19, 2022

DennisHeimbigner mentioned this issue Aug 28, 2022

Add support for Zarr string type to NCZarr #2492

Merged

joshmoore mentioned this issue Sep 9, 2022

Propose resolution _groups_ for xarray support ome/ngff#114

Draft

dzenanz mentioned this issue Dec 20, 2022

Clarifying questions regarding building xtensor-zarr xtensor-stack/xtensor-zarr#65

Open

WardF modified the milestones: 4.9.1, 4.9.2 Feb 13, 2023

WardF modified the milestones: 4.9.2, 4.9.3 May 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NCZarr does not support reading of many zarr files #2474

NCZarr does not support reading of many zarr files #2474

dzenanz commented Aug 8, 2022

dzenanz commented Aug 10, 2022

DennisHeimbigner commented Aug 10, 2022

DennisHeimbigner commented Aug 10, 2022

dzenanz commented Aug 10, 2022

DennisHeimbigner commented Aug 10, 2022

joshmoore commented Aug 10, 2022

DennisHeimbigner commented Aug 10, 2022

joshmoore commented Aug 10, 2022

DennisHeimbigner commented Aug 10, 2022

dzenanz commented Aug 10, 2022

DennisHeimbigner commented Aug 10, 2022

DennisHeimbigner commented Aug 10, 2022

dzenanz commented Aug 10, 2022

dzenanz commented Aug 10, 2022

DennisHeimbigner commented Aug 10, 2022

joshmoore commented Aug 11, 2022

DennisHeimbigner commented Aug 11, 2022 •

edited

Loading

DennisHeimbigner commented Aug 28, 2022

joshmoore commented Sep 5, 2022

dzenanz commented Sep 5, 2022

DennisHeimbigner commented Sep 5, 2022

joshmoore commented Sep 6, 2022

DennisHeimbigner commented Sep 6, 2022

joshmoore commented Sep 6, 2022 •

edited

Loading

dzenanz commented Sep 8, 2022

DennisHeimbigner commented Sep 8, 2022

dzenanz commented Sep 8, 2022

DennisHeimbigner commented Sep 8, 2022

dzenanz commented Sep 9, 2022

DennisHeimbigner commented Sep 9, 2022

dzenanz commented Sep 9, 2022

NCZarr does not support reading of many zarr files #2474

NCZarr does not support reading of many zarr files #2474

Comments

dzenanz commented Aug 8, 2022

dzenanz commented Aug 10, 2022

DennisHeimbigner commented Aug 10, 2022

DennisHeimbigner commented Aug 10, 2022

dzenanz commented Aug 10, 2022

DennisHeimbigner commented Aug 10, 2022

joshmoore commented Aug 10, 2022

DennisHeimbigner commented Aug 10, 2022

joshmoore commented Aug 10, 2022

DennisHeimbigner commented Aug 10, 2022

dzenanz commented Aug 10, 2022

DennisHeimbigner commented Aug 10, 2022

DennisHeimbigner commented Aug 10, 2022

dzenanz commented Aug 10, 2022

dzenanz commented Aug 10, 2022

DennisHeimbigner commented Aug 10, 2022

joshmoore commented Aug 11, 2022

DennisHeimbigner commented Aug 11, 2022 • edited Loading

DennisHeimbigner commented Aug 28, 2022

joshmoore commented Sep 5, 2022

dzenanz commented Sep 5, 2022

DennisHeimbigner commented Sep 5, 2022

joshmoore commented Sep 6, 2022

DennisHeimbigner commented Sep 6, 2022

joshmoore commented Sep 6, 2022 • edited Loading

dzenanz commented Sep 8, 2022

DennisHeimbigner commented Sep 8, 2022

dzenanz commented Sep 8, 2022

DennisHeimbigner commented Sep 8, 2022

dzenanz commented Sep 9, 2022

DennisHeimbigner commented Sep 9, 2022

dzenanz commented Sep 9, 2022

DennisHeimbigner commented Aug 11, 2022 •

edited

Loading

joshmoore commented Sep 6, 2022 •

edited

Loading