autoclose with distributed doesn't seem to work #1394

Closed
rabernat opened this issue May 2, 2017 · 9 comments

rabernat (Contributor) commented May 2, 2017

I am trying to analyze a very large netCDF dataset using xarray and distributed.

I open my dataset with the new autoclose option:

ds = xr.open_mfdataset(ddir + '*.nc', decode_cf=False, autoclose=True)

However, when I run a reduction operation (e.g. ds['Salt'].mean()), I can see my open file count rise monotonically. Eventually the dask worker dies with OSError: [Errno 24] Too many open files: '/proc/65644/sta once it hits the system ulimit.
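
For context, a hypothetical way to watch the handle count (psutil and the count_open_files helper are illustrative additions, not part of the original report); with distributed the leak shows up on the workers, so the check runs there via Client.run. This assumes an existing dask.distributed Client named client and the ds opened above:

import psutil

def count_open_files():
    # number of files currently held open by the calling worker process
    return len(psutil.Process().open_files())

print(client.run(count_open_files))   # per-worker counts before the reduction
result = ds['Salt'].mean().compute()
print(client.run(count_open_files))   # a steady climb here suggests the leak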

Am I doing something wrong here? Why are the files not being closed? cc: @pwolfram

shoyer (Member) commented May 2, 2017

Just to make sure, which version of xarray are you using?

rabernat (Contributor, Author) commented May 2, 2017

0.9.3

shoyer (Member) commented May 2, 2017

> 0.9.3

OK, so that shouldn't be a problem. Hmm.

My only suggestion is that we should think about writing a fuller test suite for the auto-close functionality, using a mock or fake of some sort that we can interrogate to verify it works properly. One simple approach would be to refactor the autoclose functionality into a single, separate adaptor datastore (which wraps an underlying datastore) that we can test more easily, rather than putting it onto each of the underlying datastore classes. I'm not sure why I didn't think of that when @pwolfram was writing this before.
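
A minimal sketch of that adaptor idea (the class and helper names here are hypothetical, not xarray's actual API): the wrapper holds an opener callable, reopens the underlying store for each access, and closes it again afterwards, so all autoclose logic lives in one place that a test can instrument:

class AutocloseDataStore:
    # Hypothetical adaptor: wraps any datastore behind a zero-argument
    # `opener` callable that returns a fresh, open instance of it.
    def __init__(self, opener):
        self._opener = opener

    def _call_with_open_store(self, method_name):
        store = self._opener()
        try:
            return getattr(store, method_name)()
        finally:
            store.close()  # always release the file handle

    def get_variables(self):
        return self._call_with_open_store('get_variables')

    def get_attrs(self):
        return self._call_with_open_store('get_attrs')

A test could then pass a fake opener that counts open/close calls and assert that every open is matched by a close.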

rabernat (Contributor, Author) commented May 2, 2017

A fuller test suite is a good idea.

One problem is that many of these applications involve very large datasets, so it is hard to share reproducible examples.

pwolfram (Contributor) commented May 2, 2017

@rabernat, I would say that this is a bug. Is this with the scipy backend or netCDF4? Presumably if you are hitting this problem, we could run into it too. For the record, we are using netCDF4.

pwolfram (Contributor) commented May 2, 2017

Note that we don't use decode_cf=False. Does it still crash without that option, i.e., using the default?
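
For instance (the same call as above, with decode_cf left at its default):

ds = xr.open_mfdataset(ddir + '*.nc', autoclose=True)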

rabernat (Contributor, Author) commented May 2, 2017

netCDF4. decode_cf doesn't seem to affect anything important.

rabernat (Contributor, Author) commented May 2, 2017

I think that there is an underlying problem with the way that open_mfdataset is building the dask graph for this dataset (see #1396). Operations seem overly eager to read all the data and load it into memory. So it might not be a problem with autoclose after all.
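
One illustrative way to check that (a sketch reusing ds from above, not from the original comment): confirm the reduction stays lazy and see roughly how large the task graph is before anything is computed:

# The reduction should return a lazy, dask-backed DataArray; if data were
# being loaded eagerly, .data would be a plain numpy array instead.
salt_mean = ds['Salt'].mean()
print(type(salt_mean.data))                    # expect a dask array
print(len(salt_mean.data.__dask_graph__()))    # rough count of tasks in the graph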

I do notice that autoclose does work in certain cases. For example, after I open the dataset, it doesn't leave the files open. That's good.

jhamman (Member) commented Jan 13, 2019

Closing this old issue. I'm assuming this behavior no longer exists following the backend refactors in 2018. @rabernat (or others) please reopen if you feel there is more to do here.

jhamman closed this as completed Jan 13, 2019