Xarray operations (e.g., preprocess) running locally (post open_mfdataset) instead of on Dask distributed cluster #8913

Answered by lbesnard
lbesnard asked this question in General

I got some help from Coiled, thanks @phofl!

For reference, it turns out that the issue is related to s3fs and has already been reported here:
fsspec/filesystem_spec#1747

The solution is to use this obscure option in s3fs:
default_file_cache=None

It sounds a bit insane that not many people are running into this issue, because it means a Dask cluster is of little use with remote NetCDF files: the bottleneck becomes the machine that launches the code.
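
For anyone landing here later, a minimal sketch of how cache-related options can be passed to s3fs before handing the files to xarray. The option is quoted above as written in my notes; the constructor parameters I know s3fs documents in this area are `default_fill_cache` and `default_cache_type`, so the sketch uses those, and whether they match the option above exactly is an assumption. The helper names, the `h5netcdf` engine and the `anon` default are also just illustrative choices.

```python
def s3_options_without_read_cache(anon=False):
    """Keyword arguments for s3fs.S3FileSystem that switch off read caching.

    Assumption: `default_fill_cache` and `default_cache_type` are the
    documented s3fs settings closest to the option quoted above.
    """
    return {
        "anon": anon,
        "default_fill_cache": False,   # disable cache filling on opened files
        "default_cache_type": "none",  # fsspec cache type that stores nothing
    }


def open_remote_netcdf(urls, anon=False):
    """Open NetCDF objects on S3 lazily (hypothetical helper)."""
    # Imported here so the options helper above works without these packages.
    import s3fs
    import xarray as xr

    fs = s3fs.S3FileSystem(**s3_options_without_read_cache(anon=anon))
    files = [fs.open(url, mode="rb") for url in urls]
    # parallel=True opens (and preprocesses) the files via dask.delayed.
    return xr.open_mfdataset(files, engine="h5netcdf", parallel=True)
```

With a `dask.distributed` client already connected, `open_remote_netcdf([...])` would be called with the list of `s3://` URLs. Worth checking on the dashboard that the work really lands on the workers.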

Replies: 5 comments 3 replies

Answer selected by lbesnard
phofl Nov 11, 2024
Collaborator
