Using xr.to_netcdf after xr.concat introduces max/min constraints based on first file #8135

MRibberink · 2023-09-01T07:48:19Z

What happened?

I'm using xarray to concatenate several netcdf files in time (ERA5 mean sea level data that needs to be downloaded in separate months), but because I have a lot of them I save the concatenated files to netcdf after using them so I can later do my data analysis in a loop rather than each by hand. Doing so unfortunately seems to set a min/max on the whole timeseries based on the first file, though this doesn't exist when the data is plotted before being processed by to_netcdf.

Data can be found here:
https://www.dropbox.com/sh/xb1c2018ey83poc/AABDpQnjbWcGdtq_MwJeEEc-a?dl=0

The code below generates several diagnostic images, primary among them showing the problem is this one:

Where the blue line is just the concatenated original datasets, and the orange line is that data saved by to_netcdf and then called up again. The first file ends at the end of August.

The same problem causes weird phenomena when looking at the sea level pressure fields, as shown in the other created plots. The first is how it should look (just concatted together), but the second is what it actually looks like (to_netcdf). Here's how they look on my end:

I also included a cross-section through the phenomenon off the coast of Canada to show what it's doing when it encounters values below the minimum - it brings them so the first value below is at the level of the maximum defined earlier:

I found a weird workaround, the problem is fixed by adding 0 to the dataset before sending it to to_netcdf, but this still shouldn't be happening (and I have NO idea why that fixes it).

What did you expect to happen?

I expected to_netcdf to accurately save the dataset the way it is, rather than setting a max/min based on the first file. In essence, to reproduce the minimum pressure line that was plotted before saving and reloading the data.

Minimal Complete Verifiable Example

import xarray as xr
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
crs = ccrs.PlateCarree()

#Import and concat data
Lor1=xr.open_dataset('data_62_2022.nc')
Lor2=xr.open_dataset('data_63_2022.nc')

Lor3=xr.concat([Lor1,Lor2],dim='time')

#adding this line is a workaround and makes it function for some reason
#Lor3['msl']=Lor3['msl']+0.0

#Save and reimport data
Lor3.to_netcdf("test/lor_4.nc")
Lor4=xr.open_dataset('test/lor_4.nc')

#Diagnostic plots

#minimum pressure
fig,ax=plt.subplots()
Lor3.msl.min("longitude").min('latitude').plot(label='concat')
Lor4.msl.min("longitude").min('latitude').plot(label='to_netcdf')
plt.legend()
plt.title("Minimum pressure")

#mslp field concat (what it should look like)
fig1=plt.figure(dpi=150)
ax1 = plt.axes(projection=crs, frameon=True)
ax1.coastlines(resolution='50m')
Lor3.msl.isel(time=250).plot(ax=ax1)
ax1.set_title("Concat")

#mslp field to_netcdf (what it actually looks like)
fig2=plt.figure(dpi=150)
ax2 = plt.axes(projection=crs, frameon=True)
ax2.coastlines(resolution='50m')
Lor4.msl.isel(time=250).plot(ax=ax2)
ax2.set_title("to_netcdf")

#cross section mslp field
fig3,ax3=plt.subplots()
Lor3.msl.isel(time=250).sel(latitude=42).plot(label='concat')
Lor4.msl.isel(time=250).sel(latitude=42).plot(ls='--',label='to_netcdf')
ax3.axhline(Lor1.msl.max(),color='k',ls='--',label='max file 1')
ax3.axhline(Lor1.msl.min(),color='k',ls=':',label='min file 1')
plt.legend()

MVCE confirmation

Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
Complete example — the example is self-contained, including all data and the text of any traceback.
Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:14:58) [MSC v.1929 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 11, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_Canada', '1252')
libhdf5: 1.14.1
libnetcdf: 4.9.2

xarray: 2023.6.0
pandas: 1.5.3
numpy: 1.24.2
scipy: 1.10.0
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.13.3
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.6.0
distributed: 2023.6.0
matplotlib: 3.7.0
cartopy: 0.21.1
seaborn: None
numbagg: None
fsspec: 2023.4.0
cupy: None
pint: 0.22
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.0.0
pip: 23.0
conda: None
pytest: 7.4.0
mypy: None
IPython: 8.12.2
sphinx: 4.4.0

welcome · 2023-09-01T07:48:23Z

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

kmuehlbauer · 2023-09-01T08:04:24Z

@MRibberink Thanks for the report.

Most likely issue is different cf-coding attributes on that variable (maybe packed-data, scale_factor/add_offset).

You can check this with:

print(Lor1.encoding, Lor1["msl"].encoding)
print(Lor2.encoding, Lor2["msl"].encoding)

During concatenating only the encoding-dict of the first object survives (as you already assessed), leading to a wrong/different encoding into netcdf.

If different cf-coding attributes is the issue, you have several options:

adapt encoding to fit your data
remove encoding entirely and write float to disk (https://docs.xarray.dev/en/stable/generated/xarray.DataArray.reset_encoding.html)
override encoding-kwarg in to_netcdf for the variable in question.

kmuehlbauer · 2023-09-01T08:13:13Z

It looks like my first guess was correct, we have packed data for the msl variable. Thanks for providing the data @MRibberink.

short msl(time, latitude, longitude) ;
    msl:scale_factor = 0.0755888254772405 ;
    msl:add_offset = 100995.993455587 ;
    msl:_FillValue = -32767s ;
    msl:missing_value = -32767s ;
    msl:units = "Pa" ;
    msl:long_name = "Mean sea level pressure" ;
    msl:standard_name = "air_pressure_at_mean_sea_level" ;

kmuehlbauer · 2023-09-01T08:23:22Z

To answer your question: Adding 0.0 to the variable turns it into float and the encoding-dict will be cleared. This will in turn write floating point data to netcdf, which might result in a somewhat larger file.

Please see also the discussion on future of encoding-dict here: #6323

MRibberink · 2023-09-01T08:24:19Z

Wonderful. Thank you so much!

kmuehlbauer · 2023-09-01T08:31:26Z

@MRibberink You are very welcome. So assuming you've got the answers you need, we can close this. Please do not hesitate to open follow-up issues whenever necessary.

MRibberink added bug needs triage Issue that has not been reviewed by xarray team member labels Sep 1, 2023

kmuehlbauer removed the needs triage Issue that has not been reviewed by xarray team member label Sep 1, 2023

kmuehlbauer added topic-CF conventions and removed bug labels Sep 1, 2023

kmuehlbauer closed this as completed Sep 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using xr.to_netcdf after xr.concat introduces max/min constraints based on first file #8135

Using xr.to_netcdf after xr.concat introduces max/min constraints based on first file #8135

MRibberink commented Sep 1, 2023

INSTALLED VERSIONS

welcome bot commented Sep 1, 2023

kmuehlbauer commented Sep 1, 2023

kmuehlbauer commented Sep 1, 2023

kmuehlbauer commented Sep 1, 2023

MRibberink commented Sep 1, 2023

kmuehlbauer commented Sep 1, 2023

Using xr.to_netcdf after xr.concat introduces max/min constraints based on first file #8135

Using xr.to_netcdf after xr.concat introduces max/min constraints based on first file #8135

Comments

MRibberink commented Sep 1, 2023

What happened?

What did you expect to happen?

Minimal Complete Verifiable Example

MVCE confirmation

Relevant log output

Anything else we need to know?

Environment

INSTALLED VERSIONS

welcome bot commented Sep 1, 2023

kmuehlbauer commented Sep 1, 2023

kmuehlbauer commented Sep 1, 2023

kmuehlbauer commented Sep 1, 2023

MRibberink commented Sep 1, 2023

kmuehlbauer commented Sep 1, 2023