
open_mfdataset exits while sending a "Segmentation fault" error #8120

Closed · kasra-keshavarz opened this issue Aug 28, 2023 · 4 comments
Labels: needs triage (Issue that has not been reviewed by xarray team member)

Comments

@kasra-keshavarz

What is your issue?

I am trying to open about 10 files, each roughly 5 MB, as a test case using xarray's open_mfdataset with the parallel=True option; however, it throws a "Segmentation fault" error as follows:

$ ipython
Python 3.10.2 (main, Feb  4 2022, 19:10:35) [GCC 9.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.10.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import xarray as xr

In [2]: ds = xr.open_mfdataset('./ab_models_198001*.nc', chunks={'time':10})

In [3]: ds
Out[3]: 
<xarray.Dataset>
Dimensions:              (time: 744, rlat: 140, rlon: 105)
Coordinates:
  * time                 (time) datetime64[ns] 1980-01-01T13:00:00 ... 1980-0...
    lon                  (rlat, rlon) float32 dask.array<chunksize=(140, 105), meta=np.ndarray>
    lat                  (rlat, rlon) float32 dask.array<chunksize=(140, 105), meta=np.ndarray>
  * rlon                 (rlon) float64 342.1 342.2 342.2 ... 351.2 351.3 351.4
  * rlat                 (rlat) float64 -7.83 -7.74 -7.65 ... 4.5 4.59 4.68
Data variables:
    rotated_pole         (time) int32 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1
    RDRS_v2.1_P_UVC_10m  (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_FI_SFC   (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_FB_SFC   (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_A_PR0_SFC  (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_P0_SFC   (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_TT_1.5m  (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
    RDRS_v2.1_P_HU_1.5m  (time, rlat, rlon) float32 dask.array<chunksize=(10, 140, 105), meta=np.ndarray>
Attributes:
    CDI:          Climate Data Interface version 2.0.4 (https://mpimet.mpg.de...
    Conventions:  CF-1.6
    product:      RDRS_v2.1
    Remarks:      Variable names are following the convention <Product>_<Type...
    License:      These data are provided by the Canadian Surface Prediction ...
    history:      Mon Aug 28 13:44:02 2023: cdo -z zip -s -L -sellonlatbox,-1...
    NCO:          netCDF Operators version 5.0.6 (Homepage = http://nco.sf.ne...
    CDO:          Climate Data Operators version 2.0.4 (https://mpimet.mpg.de...

In [4]: type(ds)
Out[4]: xarray.core.dataset.Dataset

In [5]: ds = xr.open_mfdataset('./ab_models_198001*.nc', chunks={'time':10}, parallel=True)
[gra-login3:25527:0:6913] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x8)
[gra-login3:25527] *** Process received signal ***
[gra-login3:25527] Signal: Segmentation fault (11)
[gra-login3:25527] Signal code:  (128)
[gra-login3:25527] Failing at address: (nil)
Segmentation fault

Here is the output of xr.show_versions():

In [5]: xr.show_versions()
/home/user/virtual-envs/scienv/lib/python3.10/site-packages/_distutils_hack/__init__.py:36: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.2 (main, Feb  4 2022, 19:10:35) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.88.1.el7.x86_64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: ('en_CA', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.9.0

xarray: 2023.7.0
pandas: 1.4.0
numpy: 1.21.2
scipy: 1.8.0
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.8.0
distributed: 2023.8.0
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 60.2.0
pip: 23.2.1
conda: None
pytest: 7.4.0
mypy: None
IPython: 8.10.0
sphinx: None

I'm working on an HPC cluster, so in case the list of modules I have loaded helps, here it is:

$ module list

Currently Loaded Modules:
  1) CCconfig          5) gcccore/.9.3.0  (H)      9) libfabric/1.10.1      13) ipykernel/2023a           17) sqlite/3.38.5                   21) postgresql/12.4             (t)  25) gdal/3.5.1  (geo)  29) udunits/2.2.28 (t)    33) cdo/2.2.1             (geo)
  2) gentoo/2020 (S)   6) imkl/2020.1.217 (math)  10) openmpi/4.0.3    (m)  14) scipy-stack/2023a (math)  18) jasper/2.0.16            (vis)  22) freexl/1.0.5                (t)  26) geos/3.10.2 (geo)  30) libaec/1.0.6          34) mpi4py/3.1.3          (t)
  3) StdEnv/2020 (S)   7) gcc/9.3.0       (t)     11) libffi/3.3            15) hdf5/1.10.6       (io)    19) libgeotiff-proj901/1.7.1        23) librttopo-proj9/1.1.0            27) proj/9.0.1  (geo)  31) eccodes/2.25.0 (geo)  35) netcdf-fortran/4.5.2  (io)
  4) mii/1.1.2         8) ucx/1.8.0               12) python/3.10.2    (t)  16) netcdf/4.7.4      (io)    20) cfitsio/4.1.0            (vis)  24) libspatialite-proj901/5.0.1      28) expat/2.4.1 (t)    32) yaxt/0.9.0     (t)    36) libspatialindex/1.8.5 (phys)

  Where:
   S:     Module is Sticky, requires --force to unload or purge
   m:     MPI implementations / Implémentations MPI
   math:  Mathematical libraries / Bibliothèques mathématiques
   io:    Input/output software / Logiciel d'écriture/lecture
   t:     Tools for development / Outils de développement
   vis:   Visualisation software / Logiciels de visualisation
   geo:   Geography libraries/apps / Logiciels de géographie
   phys:  Physics libraries/apps / Logiciels de physique
   H:                Hidden Module

Thanks.

@welcome (bot) commented Aug 28, 2023

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

@meteoDaniel

Hey @kasra-keshavarz, I think your issue has to do with the fact that you are running parallel mode on an HPC, so it could be that the underlying dask.delayed is unable to utilize the HPC cores.

Can you check whether dask.delayed works on your machine?
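A minimal sanity check for that suggestion (a hypothetical toy task, not from this report) would be to confirm that dask.delayed can schedule and run work on the node at all:

import dask

@dask.delayed
def add(a, b):
    return a + b

# If the scheduler can run delayed tasks on this node, this prints 3.
print(add(1, 2).compute())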

@keewis (Collaborator) commented Sep 1, 2023

I think you're running into the issues described in #7079. You have two options for now: pin netcdf4<1.6.0 or don't use a threading scheduler (see the dask documentation for how to do that). It is truly unfortunate that we haven't been able to fix this yet, but it is also very tricky to debug since it involves race conditions (which I can't reliably reproduce locally, but ...)
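For the second workaround, here is a minimal sketch of steering dask away from its default threading scheduler; the file pattern and chunks are taken from the report, and the scheduler names are standard dask ones:

import dask
import xarray as xr

# Select a non-threaded scheduler globally for this session ...
dask.config.set(scheduler="synchronous")  # or "processes"

# ... or limit the change to the open_mfdataset call itself.
with dask.config.set(scheduler="synchronous"):
    ds = xr.open_mfdataset("./ab_models_198001*.nc",
                           chunks={"time": 10}, parallel=True)

# The first workaround (pinning the netCDF library) would instead be:
#   pip install "netcdf4<1.6.0"

With the synchronous scheduler, parallel=True no longer speeds anything up, but the idea is that without concurrent threads touching the netCDF/HDF5 stack the race condition should not trigger.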

@kasra-keshavarz (Author)

Thank you both, @meteoDaniel and @keewis, much appreciated.

I was wondering whether the issue was related to Dask, and it seems it is. Just a quick suggestion: I understand this is a completely separate issue from xarray and belongs to Dask, but since no meaningful error or warning is issued to the user, it was difficult for me to dig into the problem. If this could be mentioned anywhere in your documentation, it would be a great help to the next person facing the same issue. I'll close the issue here. Thank you again.
