
Improve speed of CMEMS ini file generation #1207

@veenstrajelmer

Description


Generating a CMEMS inifile is very slow if the source folder contains 1000+ netCDF files. This is caused by xr.open_mfdataset(), which has to open every file that matches the pattern.

Reproducible code (it runs much faster when the year is added to the pattern, which then matches only 25% of the files):

```python
import os
import dfm_tools as dfmt
from dfm_tools.modelbuilder import get_ncvarname
import xarray as xr

# user input
model_name = 'DCSM-FM' # the name cannot contain a space
date_min = '2012-01-01'
dir_output_data_cmems = r'p:/11211535-004-dcsm-fm/data/CMEMS/'

# convert downloaded CMEMS data to initial fields
# faster alternative: add the year of date_min to the pattern (matches only ~25% of the files)
# dir_pattern = os.path.join(dir_output_data_cmems,'cmems_{ncvarname}_' f'{date_min[0:4]}*.nc')
dir_pattern = os.path.join(dir_output_data_cmems,'cmems_{ncvarname}_*.nc')

# keyword arguments passed on to xr.open_mfdataset
xr_kwargs = {"join":"exact", "data_vars":"minimal"}

conversion_dict = dfmt.get_conversion_dict()

# look up the CMEMS variable name that corresponds to this boundary quantity
quan_bnd = "salinitybnd"
ncvarname = get_ncvarname(
    quantity=quan_bnd,
    conversion_dict=conversion_dict,
    )
dir_pattern_one = dir_pattern.format(ncvarname=ncvarname)

# this call is very slow when the pattern matches 1000+ files
data_xr = xr.open_mfdataset(dir_pattern_one, **xr_kwargs)
```
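The faster variant in the commented-out line restricts the glob to the year of date_min. A minimal sketch of building both patterns, checked with fnmatch against a hypothetical list of monthly file names, shows how much smaller the match becomes. The cmems_so_YYYY-MM-DD.nc naming, the variable name so and the four-year range are assumptions for illustration only, not taken from the issue; build_pattern is a hypothetical helper.

```python
import fnmatch
import os


def build_pattern(dir_output, ncvarname, year=None):
    """Return a glob pattern for the CMEMS files of one variable (hypothetical helper).

    Assumes file names like cmems_so_2012-01-01.nc, i.e. the year directly
    follows the variable name, as in the commented-out pattern in the issue.
    """
    suffix = f"{year}*.nc" if year is not None else "*.nc"
    return os.path.join(dir_output, f"cmems_{ncvarname}_{suffix}")


# hypothetical file names: monthly files for 2012-2015 (48 files in total)
names = [f"cmems_so_{y}-{m:02d}-01.nc" for y in range(2012, 2016) for m in range(1, 13)]

# match only on file names, so use an empty directory
all_files = fnmatch.filter(names, build_pattern("", "so"))
year_files = fnmatch.filter(names, build_pattern("", "so", year=2012))
print(len(all_files), len(year_files))  # 48 12
```

In this hypothetical four-year set, one year is a quarter of the files, so xr.open_mfdataset would only have to open 25% of them.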

Todo:

  • Check whether performance improves with the additional arguments parallel=True, coords="minimal" and compat="equals" >> not much effect.
  • Related xarray issue (7 years old but still active): "slow performance with open_mfdataset" pydata/xarray#1385 >> fixed in "New defaults for concat, merge, combine_*" pydata/xarray#10062, but it will take some time before the new defaults apply to everyone automatically (without explicitly opting in) >> opting in does not improve the speed either; there are simply too many files.
  • Alternatively, allow the user to provide a hardcoded list of (a selection of) files; this will improve performance dramatically.
  • Update whatsnew.
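The suggestion to let the user provide a hardcoded list of (a selection of) files could look roughly like the sketch below. select_files is a hypothetical helper, not an existing dfm_tools function. It accepts either a glob pattern or an explicit list of paths and can optionally keep only files whose names contain a YYYY-MM-DD date inside a given range (an assumed naming convention). The result is a plain list, which xr.open_mfdataset accepts in place of a pattern.

```python
import glob
import os
import re
from datetime import date


def select_files(files, date_min=None, date_max=None):
    """Resolve a glob pattern or an explicit list of paths to a sorted list.

    Hypothetical helper, not part of dfm_tools. If date_min and/or date_max
    (ISO strings) are given, keep only files whose name contains a
    YYYY-MM-DD date within that range (assumed naming convention).
    """
    if isinstance(files, str):
        paths = glob.glob(files)  # a pattern: expand it on disk
    else:
        paths = list(files)  # an explicit list: use it as-is
    if date_min is None and date_max is None:
        return sorted(paths)
    lo = date.fromisoformat(date_min) if date_min else date.min
    hi = date.fromisoformat(date_max) if date_max else date.max
    selected = []
    for path in paths:
        match = re.search(r"(\d{4})-(\d{2})-(\d{2})", os.path.basename(path))
        if match is None:
            continue  # no date in the file name: skip this file
        file_date = date(*map(int, match.groups()))
        if lo <= file_date <= hi:
            selected.append(path)
    return sorted(selected)


# usage with hypothetical monthly file names
names = [f"cmems_so_2012-{m:02d}-01.nc" for m in range(1, 13)]
subset = select_files(names, date_min="2012-03-01", date_max="2012-05-31")
print(subset)
# the resulting list could then be passed on as xr.open_mfdataset(subset, **xr_kwargs)
```

Returning a sorted list keeps the file order deterministic whether the user passed a pattern or a list.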
