Opening Zarr files with R #1982
Do you know, by chance, if the file is an xarray-created file? |
Try to send me just the metadata from the file by doing something like this.
|
The dataset is created here [1]. I think they are using xarray; it is definitely not nczarr. Metadata in [2] [1] https://github.com/esa-esdl/cube-generator [2] Edit: sorry, took me a second to understand — you get it as a zip, since GitHub doesn't allow tar files. |
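The metadata-only archive attached above can be produced with a short script. This is a hedged sketch, not the command actually used in the thread; it assumes a local `.zarr` directory and collects only the Zarr JSON metadata files (GitHub accepts `.zip` attachments but not `.tar`):

```python
import os
import zipfile

# Zarr/xarray metadata file names; everything else in the tree is chunk data.
METADATA_NAMES = {".zattrs", ".zarray", ".zgroup", ".zmetadata"}

def zip_zarr_metadata(zarr_dir, out_zip):
    """Walk a .zarr directory and copy only the JSON metadata files
    into a zip archive, preserving relative paths."""
    with zipfile.ZipFile(out_zip, "w") as zf:
        for root, _dirs, files in os.walk(zarr_dir):
            for name in files:
                if name in METADATA_NAMES:
                    path = os.path.join(root, name)
                    zf.write(path, os.path.relpath(path, zarr_dir))
```

The resulting archive is small enough to attach to an issue while still letting a maintainer inspect every `.zattrs`/`.zarray` in the hierarchy.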
Did not expect nczarr :-) |
So, a couple of questions. Second: in that file, the object esdc-8d-0.083deg-184x270x270-2.1.0.zarr/aerosol_optical_thickness_1600/.zattrs has this entry. When you print out this attribute value in the original software, what does it look like? The reason I ask is that I have no way to deal with an attribute whose value is a JSON dictionary, and I am curious how it should look. |
The datasets are generated with this package; I think they use xarray internally: https://github.com/esa-esdl/esdl-core/ You can find the actual providers of the data here, where there is also the metadata you are asking about: https://github.com/esa-esdl/esdl-core/tree/master/esdl/providers
I have never used Python to read these datasets in; I use the Julia package ESDL.jl [1] and have never seen these attributes printed. You can open the dataset using Python:
In [1]: import xarray as xr
In [2]: c = xr.open_zarr("/path/to/cube.zarr")
In [13]: c.aerosol_optical_thickness_1600
Out[13]:
<xarray.DataArray 'aerosol_optical_thickness_1600' (time: 1840, lat: 2160, lon: 4320)>
[17169408000 values with dtype=float64]
Coordinates:
* lat (lat) float64 89.96 89.88 89.79 89.71 ... -89.79 -89.87 -89.96
* lon (lon) float64 -180.0 -179.9 -179.8 -179.7 ... 179.8 179.9 180.0
* time (time) datetime64[ns] 1979-01-05 1979-01-13 ... 2018-12-31
Attributes: (12/13)
Conventions: CF-1.6
easting: -180.0 degrees
esa_cci_path: /neodc/esacci/aerosol/data/AATSR_SU/L3/v4.3/DAILY/
history: Thu May 7 16:43:23 2020 - ESDL data cube generation
institution: Brockmann Consult GmbH, Germany
northing: 90.0 degrees
... ...
source: ESDL data cube generation, version 0.3.0.dev1
source_attributes: {'comment': 'Aerosol optical thickness derived from...
time_coverage_end: 2012-04-10
time_coverage_start: 2002-05-21
units: 1
url: http://www.esa-aerosol-cci.org/
In [14]: c.aerosol_optical_thickness_1600.source_attributes
Out[14]: "{'comment': 'Aerosol optical thickness derived from the dataset produced by the Aerosol CCI project.', 'long_name': 'Aerosol Optical Thickness at 1600 nm', 'project_name': 'ESA Aerosol CCI', 'references': 'Holzer-Popp, T., de Leeuw, G., Griesfeller, J., Martynenko, D., Klueser, L., Bevan, S., et al. (2013). Aerosol retrieval experiments in the ESA Aerosol_cci project. Atmospheric Measurement Techniques, 6, 1919-1957. doi:10.5194/amt-6-1919-2013. ', 'source_name': 'AOD1600_mean', 'standard_name': 'atmosphere_optical_thickness_due_to_aerosol_at_1600nm', 'units': '1', 'url': 'http://www.esa-aerosol-cci.org/'}" it seems they just keep it as a string. [1] https://github.com/esa-esdl/esdl-core/tree/master/esdl/providers |
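Since the attribute value arrives as the `repr()` of a Python dict (single-quoted, so not valid JSON), it can be recovered into a real dict with `ast.literal_eval`. This is an illustrative sketch, not part of the original thread; the `raw` value is a shortened stand-in for the full attribute string shown above:

```python
import ast

# The attribute is stored as the string repr of a Python dict,
# e.g. "{'source_name': 'AOD1600_mean', ...}". json.loads would reject
# the single quotes, but ast.literal_eval parses Python literals safely.
raw = "{'source_name': 'AOD1600_mean', 'units': '1'}"
attrs = ast.literal_eval(raw)
print(attrs["source_name"])  # -> AOD1600_mean
```

`ast.literal_eval` only evaluates literal expressions, so it is safe to apply to untrusted attribute strings, unlike `eval`.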
Is there a way you can get Julia to explicitly print that attribute? |
For whatever reason that particular variable does not show up if I read in the entire data set (https://github.com/esa-esdl/ESDL.jl/issues/248):
|
Ok, the current netcdf-c github master should solve this problem. |
OK, I think I have this solved. It turns out there was a bug in my JSON parser's handling of strings with embedded double quotes. |
If you want to fix the bug yourself:
|
re: github issue Unidata#1982 The problem was that the libnczarr/zsjson.c handling of strings with embedded double quotes was wrong; a one-line fix. Also added a test case. Misc. other changes: 1. Discovered, en passant, that the handling of 64-bit constants had an error, which was fixed. 2. Cleaned up the constant conversion code to recurse on arrays of values.
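The bug class described here (mishandling embedded double quotes) typically comes down to a string scanner that stops at the first `"` without checking for a preceding backslash. A minimal illustrative scanner in Python, not the actual netcdf-c code, which is written in C:

```python
def scan_json_string(text, start):
    """Scan a JSON string literal beginning at text[start] == '"'.
    Returns (value, index_after_closing_quote). A naive scanner that
    terminates at the first '"' would truncate values containing
    escaped quotes -- the bug class fixed in libnczarr."""
    assert text[start] == '"'
    out = []
    i = start + 1
    while text[i] != '"':
        if text[i] == "\\":   # escape: take the next char literally
            i += 1            # (full JSON also decodes \n, \t, \uXXXX, etc.)
        out.append(text[i])
        i += 1
    return "".join(out), i + 1
```

For example, the input `"a\"b"` must yield the three-character value `a"b`, not terminate after `a`.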
Fixed by PR #1993 |
Thanks for working on this. Now, on the latest master, I get a segfault when opening the dataset. Using ncdump on these files I get only the message "No such file or directory":
> library(ncdf4)
> nc_open("file:///home/gkraemer/data/DataCube/v2.1.0/esdc-8d-0.083deg-184x270x270-2.1.0.zarr#mode=nczarr,file")
Error in R_nc4_open: NetCDF: Attempt to read empty NCZarr map key
Error in nc_open("file:///home/gkraemer/data/DataCube/v2.1.0/esdc-8d-0.083deg-184x270x270-2.1.0.zarr#mode=nczarr,file") :
Error in nc_open trying to open file file:///home/gkraemer/data/DataCube/v2.1.0/esdc-8d-0.083deg-184x270x270-2.1.0.zarr#mode=nczarr,file
> nc_open("file:///home/gkraemer/data/DataCube/v2.1.0/esdc-8d-0.083deg-184x270x270-2.1.0.zarr#mode=nczarr,s3")
Error in R_nc4_open: NetCDF: Attempt to use feature that was not turned on when netCDF was built.
Error in nc_open("file:///home/gkraemer/data/DataCube/v2.1.0/esdc-8d-0.083deg-184x270x270-2.1.0.zarr#mode=nczarr,s3") :
Error in nc_open trying to open file file:///home/gkraemer/data/DataCube/v2.1.0/esdc-8d-0.083deg-184x270x270-2.1.0.zarr#mode=nczarr,s3
> nc_open("file:///home/gkraemer/data/DataCube/v2.1.0/esdc-8d-0.083deg-184x270x270-2.1.0.zarr#mode=nczarr,zarr")
*** caught segfault ***
address 0xc49109f38, cause 'memory not mapped'
Traceback:
1: nc_open("file:///home/gkraemer/data/DataCube/v2.1.0/esdc-8d-0.083deg-184x270x270-2.1.0.zarr#mode=nczarr,zarr")
$ ncdump "file:///home/gkraemer/data/DataCube/v2.1.0/esdc-8d-0.083deg-184x270x270-2.1.0.zarr#mode=nczarr,zarr"
ncdump: file:///home/gkraemer/data/DataCube/v2.1.0/esdc-8d-0.083deg-184x270x270-2.1.0.zarr#mode=nczarr,zarr: file:///home/gkraemer/data/DataCube/v2.1.0/esdc-8d-0.083deg-184x270x270-2.1.0.zarr#mode=nczarr,zarr: No such file or directory
$ ncdump "file:///home/gkraemer/data/DataCube/v2.1.0/esdc-8d-0.083deg-184x270x270-2.1.0.zarr#mode=nczarr,file"
ncdump: file:///home/gkraemer/data/DataCube/v2.1.0/esdc-8d-0.083deg-184x270x270-2.1.0.zarr#mode=nczarr,file: file:///home/gkraemer/data/DataCube/v2.1.0/esdc-8d-0.083deg-184x270x270-2.1.0.zarr#mode=nczarr,file: No such file or directory
$ ncdump "file:///home/gkraemer/data/DataCube/v2.1.0/esdc-8d-0.083deg-184x270x270-2.1.0.zarr#mode=nczarr,s3"
ncdump: file:///home/gkraemer/data/DataCube/v2.1.0/esdc-8d-0.083deg-184x270x270-2.1.0.zarr#mode=nczarr,s3: file:///home/gkraemer/data/DataCube/v2.1.0/esdc-8d-0.083deg-184x270x270-2.1.0.zarr#mode=nczarr,s3: No such file or directory |
Unfortunately, I cannot duplicate this failure. I will have to think about how to resolve it. |
I have built netcdf-c v4.8.0 on Manjaro and then the ncdf4 R package, and I get the errors above. Sorry that this is not more reproducible; the dataset is so large that I cannot put it online, and so far I could not build netcdf-c with S3 support. |