-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using xr.to_netcdf after xr.concat introduces max/min constraints based on first file #8135
Comments
Thanks for opening your first issue here at xarray! Be sure to follow the issue template! |
@MRibberink Thanks for the report. Most likely issue is different cf-coding attributes on that variable (maybe packed-data, You can check this with: print(Lor1.encoding, Lor1["msl"].encoding)
print(Lor2.encoding, Lor2["msl"].encoding) During concatenating only the encoding-dict of the first object survives (as you already assessed), leading to a wrong/different encoding into netcdf. If different cf-coding attributes is the issue, you have several options:
|
It looks like my first guess was correct, we have packed data for the short msl(time, latitude, longitude) ;
msl:scale_factor = 0.0755888254772405 ;
msl:add_offset = 100995.993455587 ;
msl:_FillValue = -32767s ;
msl:missing_value = -32767s ;
msl:units = "Pa" ;
msl:long_name = "Mean sea level pressure" ;
msl:standard_name = "air_pressure_at_mean_sea_level" ; |
To answer your question: Adding Please see also the discussion on future of encoding-dict here: #6323 |
Wonderful. Thank you so much! |
@MRibberink You are very welcome. So assuming you've got the answers you need, we can close this. Please do not hesitate to open follow-up issues whenever necessary. |
What happened?
I'm using xarray to concatenate several netcdf files in time (ERA5 mean sea level data that needs to be downloaded in separate months), but because I have a lot of them I save the concatenated files to netcdf after using them so I can later do my data analysis in a loop rather than each by hand. Doing so unfortunately seems to set a min/max on the whole timeseries based on the first file, though this doesn't exist when the data is plotted before being processed by to_netcdf.
Data can be found here:
https://www.dropbox.com/sh/xb1c2018ey83poc/AABDpQnjbWcGdtq_MwJeEEc-a?dl=0
The code below generates several diagnostic images, primary among them showing the problem is this one:
Where the blue line is just the concatenated original datasets, and the orange line is that data saved by to_netcdf and then called up again. The first file ends at the end of August.
The same problem causes weird phenomena when looking at the sea level pressure fields, as shown in the other created plots. The first is how it should look (just concatted together), but the second is what it actually looks like (to_netcdf). Here's how they look on my end:
I also included a cross-section through the phenomenon off the coast of Canada to show what it's doing when it encounters values below the minimum - it brings them so the first value below is at the level of the maximum defined earlier:
I found a weird workaround, the problem is fixed by adding 0 to the dataset before sending it to to_netcdf, but this still shouldn't be happening (and I have NO idea why that fixes it).
What did you expect to happen?
I expected to_netcdf to accurately save the dataset the way it is, rather than setting a max/min based on the first file. In essence, to reproduce the minimum pressure line that was plotted before saving and reloading the data.
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
No response
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:14:58) [MSC v.1929 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 11, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_Canada', '1252')
libhdf5: 1.14.1
libnetcdf: 4.9.2
xarray: 2023.6.0
pandas: 1.5.3
numpy: 1.24.2
scipy: 1.10.0
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.13.3
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.6.0
distributed: 2023.6.0
matplotlib: 3.7.0
cartopy: 0.21.1
seaborn: None
numbagg: None
fsspec: 2023.4.0
cupy: None
pint: 0.22
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.0.0
pip: 23.0
conda: None
pytest: 7.4.0
mypy: None
IPython: 8.12.2
sphinx: 4.4.0
The text was updated successfully, but these errors were encountered: