Closed
Description
Generating a CMEMS inifile is very slow if the source folder contains 1000+ netcdf files. This is due to xr.open_mfdataset().
Reproducible code (the code is much faster when the year is added to the pattern, which results in only 25% of the files being opened):
import os
import dfm_tools as dfmt
from dfm_tools.modelbuilder import get_ncvarname
import xarray as xr
# user input
model_name = 'DCSM-FM' # the name cannot contain a space
date_min = '2012-01-01'
dir_output_data_cmems = r'p:/11211535-004-dcsm-fm/data/CMEMS/'
# convert downloaded CMEMS data to initial fields
# dir_pattern = os.path.join(dir_output_data_cmems,'cmems_{ncvarname}_' f'{date_min[0:4]}*.nc')
dir_pattern = os.path.join(dir_output_data_cmems,'cmems_{ncvarname}_*.nc')
xr_kwargs = {"join":"exact", "data_vars":"minimal"}
conversion_dict = dfmt.get_conversion_dict()
quan_bnd = "salinitybnd"
ncvarname = get_ncvarname(
    quantity=quan_bnd,
    conversion_dict=conversion_dict,
)
dir_pattern_one = dir_pattern.format(ncvarname=ncvarname)
data_xr = xr.open_mfdataset(dir_pattern_one, **xr_kwargs)
Todo:
- check if the performance can be improved with additional arguments: parallel=True, coords="minimal", compat="equals" >> not much effect
- Related xarray issue (7 years old but still active): "slow performance with open_mfdataset" pydata/xarray#1385 >> fixed in "New defaults for concat, merge, combine_*" pydata/xarray#10062, but it will take some time for the new defaults to be set automatically for everyone (without explicitly opting in) >> this does not increase the speed either; there are simply too many files.
- alternatively, allow the user to provide a hardcoded list of (a selection of) files; this will improve results dramatically
- update whatsnew
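
A minimal sketch of the file-list alternative mentioned above, not the dfm_tools implementation. It assumes the file names follow cmems_{ncvarname}_{YYYY}*.nc, as the commented-out pattern in the reproducible code suggests. The helpers select_files and open_selection are hypothetical names used for illustration only; xr.open_mfdataset() itself accepts a list of paths as well as a glob pattern.

```python
import glob
import os


def select_files(dir_data, ncvarname, years):
    """Return a sorted list of files for the requested years only.

    Hypothetical helper: assumes file names of the form
    cmems_{ncvarname}_{YYYY}*.nc, which is an assumption based on the
    commented-out pattern in the reproducible code.
    """
    file_list = []
    for year in years:
        pattern = os.path.join(dir_data, f"cmems_{ncvarname}_{year}*.nc")
        file_list.extend(glob.glob(pattern))
    # sort by name and drop duplicates; the order is chronological only if
    # the date part of the file name sorts that way (e.g. YYYY-MM-DD)
    return sorted(set(file_list))


def open_selection(file_list, **xr_kwargs):
    """Open an explicit list of files instead of a glob pattern."""
    # imported here so that the file selection also works without xarray
    import xarray as xr

    return xr.open_mfdataset(file_list, **xr_kwargs)
```

With the variables from the reproducible code this could be used as file_list = select_files(dir_output_data_cmems, ncvarname, years=[2012]) followed by data_xr = open_selection(file_list, **xr_kwargs), so only the files that are actually needed are opened.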