scale_factor, add_offset and _FillValue #39

Alexander-Barth · 2016-12-10T21:32:35Z

In Python and Matlab/Octave, the NetCDF variables scale_factor and add_offset are applied when these attributes are defined, according to this simple rule:

variable2 = scale_factor * variable + add_offset

As far as I can see, this is also the default for the R package RNetCDF. Is it possible to add this behaviour to ncread and to getindex/setindex! functions? (I actually prefer the later API because indexing is much more natural).

In a similar vein, is it also possible to set values to NaN which are equal to _FillValue or maybe use DataArrays?

meggart · 2016-12-16T13:29:11Z

Yes, this should have been done a while ago. The scale_factor part should be quite simple. The other part would need a good API, maybe a keyword argument would be sufficient like: ncread(...,missingas=:dataarray) or ncread(...,missingas=:NaN)

I hope to find some time during the Christmas break to do this.

Alexander-Barth · 2016-12-16T20:40:06Z

Yes, this would be very useful.
For the lower-level API, might one could have something like this:

nc = NetCDF.open("some_file.nc")
regular_array = nc.vars["variable_name"][:,:]; # as implemented now
instance_of_data_array = nc.data["variable_name"][:,:]; # data array with scaling

If this API looks fine, I can try to implement it and make a pull request (if I succeed).

visr · 2016-12-18T19:03:05Z

Just wanted to note that rather than using DataArray, we probably should use NullableArray instead. DataFrames master has already switched over, NullableArray should work better with Julia's type system.

acrosby · 2017-03-09T23:28:10Z

Has there been any progress on this @Alexander-Barth ? Is it something that I could easily submit a PR for, where would you want it to go? @meggart

meggart · 2017-03-19T20:26:45Z

I started doing this and then it turned out to be more difficult than I thought to do properly. The main problem is that currently NcVars have type parameters like Julia arrays, which means that e.g. reading from an NcVar{Float32,3} returns an Array{Float32,3} and methods like readvar! and ncread! can work in-place.

When we have a scale factor and offset, it is common that the return type is not the same as the NetCDF storage type, for example a variable might be NC_UBYTE on disk with a scale_factor of 3.f0, so the data has to be converted to floating-point when reading.

This all got very confusing and I was not really happy with the resulting interface, so I stopped working on it until I have some time to think about it. Maybe you have some good ideas how to do this, here are my attempts so far: 1713963

Another question would be how to deprecate the old behaviour. I think a change like this will inevitably break lots of existing code.

Alexander-Barth · 2017-05-22T11:55:20Z

Indeed, it is not easy to implement this efficiently. Maybe one can implement two sets of types (for NetCDF variable (NcVarRaw, NcVarScaled) and NetCDF files (NcFileRaw, NcFileScaled). Indexing a NcFileRaw type would produce a NcVarRaw and indexing a NcVarRaw would return the same type as in the NetCDF files (ignoring all scaling and _FillValues).

However, indexing NcFileScaled (or a better name) and NcVarScaled would return always a DataArray/NullableArray of Float64 for numbers (with scaling and masking _FillValues) and a DataArray/NullableArray of chars for characters (ignoring the scaling but masking _FillValues).

squaregoldfish · 2017-12-03T19:39:03Z

I'm completely new to Julia, but I work with netCDFs a lot so I need this functionality if I'm going to continue. Has there been any more progress on this? Otherwise I'm willing to have an attempt at implementing something.

Alexander-Barth · 2017-12-03T20:22:01Z

In case, you are interested. Here is my attempt: https://github.com/Alexander-Barth/NCDatasets.jl

meggart · 2017-12-04T09:20:03Z

@Alexander-Barth I have looked at your new interface, it looks good, and I think I will try it soon. Do you think it would make sense to factor out the netcdf_c.jl and build.jl into its own package (NetCDF-C) so that you could depend on this one? Then it would be easier to experiment with new interfaces for reading and writing NetCDF files, while minimizing code duplication.

Alexander-Barth · 2017-12-05T13:40:37Z

In fact, I started to change netcdf_c.jl quite bit and I am not sure if you are interested in these changes. I think that this file is an automatically generated file (right?). The changes I made are mainly returning e.g. scalar values instead of changing the elements of a one element vector. I also changed some parameter types from Cint to Csize_t if in the corresponding C API size_t is used.

acrosby · 2017-12-06T19:25:09Z

@Alexander-Barth Interested in trying this out.

meggart · 2017-12-06T19:29:19Z

Ok, I think then it is better to leave the auto-generated file as it is. In general, I think it would be good if you keep posting progress on your new interface, especially once you register it as a package, then we could mention it as an alternative in the README.

Alexander-Barth · 2017-12-07T09:21:01Z

Ok, I agree. The other aspect (similar to scaling) that I have implemented is returning time variables (if they have a unit attribute of the form "days/hours/.. since start date) as arrays of DateTime:
https://github.com/Alexander-Barth/NCDatasets.jl/blob/master/src/NCDatasets.jl#L146

In fact, I recently registered NCDatasets. I added the following to its README file:

Alexander-Barth/NCDatasets.jl@daeda94
I hope that this is OK for you.

visr · 2017-12-07T10:30:30Z

@Alexander-Barth thanks for showing NCDatasets, it wasn't on my radar before. We should indeed link to it in our readme as well.

Regarding netcdf_c.jl, it is currently auto generated, followed by several manual steps, which is is briefly documented in wrap.jl. Ideally this becomes fully automatic, such that updating to a newer NetCDF version is easy as possible, but it requires some work on both wrap.jl and Clang.jl. Manual edits are much quicker to get something that works though.

In my opinion we should aim towards either:

Having NCDatasets.jl depend on NetCDF.jl, to reduce double effort, and NCDatasets can use the NetCDF low level API to build any more convenient API. This is more or less what is the idea behind GDAL.jl and ArchGDAL.jl, where GDAL.jl is pretty much a 1 to 1 translation of the GDAL API to Julia, on which ArchGDAL.jl can build to make a friendlier API. The advantage is that if someone has a need for a different API, they can relatively easily build a new API on top of GDAL.jl, without having to do the (not very interesting) plumbing.

Try to merge the packages together? I understand that starting a separate package can be a great way to quickly work out new ideas, so I'm happy that you did that, and we can learn from each other. That said, I'm not sure there are any fundamental differences in where we want to go.

What are your thoughts? @meggart @Alexander-Barth

yeesian · 2017-12-07T13:40:47Z

Another motivating factor for having one of the packages depend on the other is to have a single binary (and build file) shared across both packages

Alexander-Barth · 2017-12-08T09:22:45Z

Initially, I considered indeed to depend NCDatasets.jl on NetCDF.jl but I did not choose to follow to road:

the interface in netcdf_c.jl is a bit inconveniet to use e.g.

ncidp = Vector{Cint}(1)
nc_create(path,cmode,ncidp)
# do something with ncidp[1]

While I wanted to use simply ncid = nc_create(path,cmode).

Your intermediate interface (which is indeed defines nc_create the way I want to call it) makes extensive use of global variables. I try to avoid global variables (essentially to make the code easier to maintain and to avoid potential thread-safety issues).
Way too often I am in the situation where I want to install software A, but which requires software B, but which require software C, ... (not just for julia package but in general). And all of them have different requirements and which makes it very hard for the user to install them sometimes. The benefit of the dependency must be quite high compared to cost of the potential maintenance headaches in the future. Given the simplicity of the function like nc_create I do not think that the benefit is large enough.
The library libnetcdf.so is installed for both packages with apt-get or conda and it is thus shared between both packages.

About the merging option:

NetCDF.jl has already 3 interfaces (high-level, medium-level and the low-level wrapper). Adding a 4th interface would be quite confusing.
There is the risk that the user combines the different interfaces which be probably not work.

Of course I see also the overall benefit of combining all forces in a single NetCDF package, but I did not see how I could change NetCDF.jl the way I wanted it to be without massively breaking the user's current code.

meggart · 2017-12-18T08:49:27Z

Thanks for the clarification and sorry for the long response time. A general comment first: Writing this package was my first attempt at learning Julia in 2012, so there are indeed many ways to improve the code, and maybe a complete re-write as you started would make sense.

Regarding the globals, I see the problem when it comes to threading, the main reason they were introduced was to avoid unnecessary memory allocations.
I don't have any objections to reverting this, I think this was over-optimizing and we should, now that Julia has threads put thread safety over saving allocations.

I am still not completely convinced of your dependency argument, since the NetCDF-C.jl package would be a very small package without any other Julia dependencies, containing just a single netcdf-c.jl source file.

I support the idea of having separate packages for imperative reading and writing data vs a Object-Dataset like approach, and definitely think you should go on and register your package as an alternative.

meggart mentioned this issue Jul 3, 2018

WIP: Julia 0.7, remove globals, missing value, scale_offset #61

Closed

6 tasks

visr mentioned this issue Nov 18, 2018

Julia NetCDF getting negative values from nc file #75

Open

This was referenced Feb 28, 2019

read_output function for NetCDF. CliMA/Oceananigans.jl#93

Merged

Better way of writing data to disk? CliMA/Oceananigans.jl#31

Closed

ali-ramadhan mentioned this issue Mar 21, 2019

Faster and more flexible NetCDF IO. CliMA/Oceananigans.jl#145

Closed

4 tasks

visr mentioned this issue Nov 2, 2019

different result between ncread and matlab‘s ncread #103

Closed

visr mentioned this issue Nov 29, 2019

NetCDF reads values from file wrongly. #107

Closed

visr mentioned this issue Dec 4, 2020

Scale & offset factor not considered #131

Closed

visr mentioned this issue Oct 7, 2022

Variable's scale_factor and add_offset are not taken into account during reading #165

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scale_factor, add_offset and _FillValue #39

scale_factor, add_offset and _FillValue #39

Alexander-Barth commented Dec 10, 2016

meggart commented Dec 16, 2016

Alexander-Barth commented Dec 16, 2016

visr commented Dec 18, 2016

acrosby commented Mar 9, 2017 •

edited

Loading

meggart commented Mar 19, 2017

Alexander-Barth commented May 22, 2017

squaregoldfish commented Dec 3, 2017

Alexander-Barth commented Dec 3, 2017

meggart commented Dec 4, 2017

Alexander-Barth commented Dec 5, 2017

acrosby commented Dec 6, 2017

meggart commented Dec 6, 2017

Alexander-Barth commented Dec 7, 2017

visr commented Dec 7, 2017

yeesian commented Dec 7, 2017

Alexander-Barth commented Dec 8, 2017

meggart commented Dec 18, 2017

scale_factor, add_offset and _FillValue #39

scale_factor, add_offset and _FillValue #39

Comments

Alexander-Barth commented Dec 10, 2016

meggart commented Dec 16, 2016

Alexander-Barth commented Dec 16, 2016

visr commented Dec 18, 2016

acrosby commented Mar 9, 2017 • edited Loading

meggart commented Mar 19, 2017

Alexander-Barth commented May 22, 2017

squaregoldfish commented Dec 3, 2017

Alexander-Barth commented Dec 3, 2017

meggart commented Dec 4, 2017

Alexander-Barth commented Dec 5, 2017

acrosby commented Dec 6, 2017

meggart commented Dec 6, 2017

Alexander-Barth commented Dec 7, 2017

visr commented Dec 7, 2017

yeesian commented Dec 7, 2017

Alexander-Barth commented Dec 8, 2017

meggart commented Dec 18, 2017

acrosby commented Mar 9, 2017 •

edited

Loading