Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xarray.backends refactor #2261

Merged
merged 45 commits into from
Oct 9, 2018
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
Show all changes
45 commits
Select commit Hold shift + click to select a range
4faaf3a
WIP: xarray.backends.file_manager for managing file objects.
shoyer Jul 1, 2018
c82a38c
Switch rasterio to use FileManager
shoyer Jul 1, 2018
7a55a30
lint fixes
shoyer Jul 4, 2018
51463dd
WIP: rewrite FileManager to always use an LRUCache
shoyer Jul 9, 2018
23e132f
Test coverage
shoyer Jul 10, 2018
8fc8183
Don't use move_to_end
shoyer Jul 10, 2018
422944b
minor clarification
shoyer Jul 10, 2018
aea0a1a
Switch FileManager.acquire() to a method
shoyer Jul 11, 2018
4366c0b
Python 2 compat
shoyer Jul 11, 2018
f35b7e7
Update xarray.set_options() to add file_cache_maxsize and validation
shoyer Jul 11, 2018
057cad2
Add assert for FILE_CACHE.maxsize
shoyer Jul 11, 2018
0f3e656
More docstring for FileManager
shoyer Jul 11, 2018
1a0cc10
Add accidentally omited tests for LRUCache
shoyer Jul 11, 2018
8784e6b
Merge branch 'master' into file-manager
shoyer Jul 28, 2018
83d9b10
Adapt scipy backend to use FileManager
shoyer Jul 28, 2018
a0074ff
Stickler fix
shoyer Jul 28, 2018
062ba96
Fix failure on Python 2.7
shoyer Jul 29, 2018
2d41b29
Finish adjusting backends to use FileManager
shoyer Jul 29, 2018
2adf486
Fix bad import
shoyer Jul 30, 2018
76f151c
WIP on distributed
shoyer Aug 1, 2018
769f079
More WIP
shoyer Aug 6, 2018
3e97264
Merge branch 'master' into file-manager
shoyer Aug 17, 2018
5e67efe
Fix distributed write tests
shoyer Aug 19, 2018
8dc77c4
Merge branch 'master' into file-manager
shoyer Aug 19, 2018
1d38335
Fixes
shoyer Aug 19, 2018
6350ca6
Minor fixup
shoyer Aug 20, 2018
4aa0df7
whats new
shoyer Aug 30, 2018
67377c7
More refactoring: remove state from backends entirely
shoyer Aug 31, 2018
8c00f44
Merge branch 'master' into file-manager
shoyer Sep 6, 2018
2a5d1f0
Cleanup
shoyer Sep 6, 2018
a6c170b
Fix failing in-memory datastore tests
shoyer Sep 6, 2018
009e30d
Fix inaccessible datastore
shoyer Sep 6, 2018
14118ea
fix autoclose warnings
shoyer Sep 6, 2018
c778488
Fix PyNIO failures
shoyer Sep 6, 2018
fe14ebf
No longer disable HDF5 file locking
shoyer Sep 7, 2018
f1026ce
whats new and default file cache size
shoyer Sep 7, 2018
e13406b
Whats new tweak
shoyer Sep 7, 2018
465dfae
Refactor default lock logic to backend classes
shoyer Sep 10, 2018
55d35c8
Rename get_resource_lock -> get_write_lock
shoyer Sep 10, 2018
c8fbadc
Don't acquire unnecessary locks in __getitem__
shoyer Sep 10, 2018
ede8ef0
Merge branch 'master' into file-manager
shoyer Sep 26, 2018
220c302
Merge branch 'master' into file-manager
shoyer Oct 8, 2018
36f1156
Fix bad merge
shoyer Oct 9, 2018
c6f43dd
Fix import
shoyer Oct 9, 2018
8916bc7
Remove unreachable code
shoyer Oct 9, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 0 additions & 27 deletions xarray/backends/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -508,30 +508,3 @@ def assert_open(self):
if not self._isopen:
raise AssertionError('internal failure: file must be open '
'if `autoclose=True` is used.')


class PickleByReconstructionWrapper(object):

def __init__(self, opener, file, mode='r', **kwargs):
self.opener = partial(opener, file, mode=mode, **kwargs)
self.mode = mode
self._ds = None

@property
def value(self):
self._ds = self.opener()
return self._ds

def __getstate__(self):
state = self.__dict__.copy()
del state['_ds']
if self.mode == 'w':
# file has already been created, don't override when restoring
state['mode'] = 'a'
return state

def __setstate__(self, state):
self.__dict__.update(state)

def close(self):
self._ds.close()
155 changes: 155 additions & 0 deletions xarray/backends/file_manager.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,155 @@
import contextlib
import threading


class FileManager(object):
"""Base class for context managers for managing file objects.

Unlike files, FileManager objects should be safely. They must be explicitly
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this description is missing a few words.

closed.

Example usage:

import functools

manager = FileManager(functools.partial(open, filename), mode='w')
with manager.acquire() as f:
f.write(...)
manager.close()
"""

def __init__(self, opener, mode=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also support **kwargs here. Or maybe that's all we should support here. Or, perhaps you are thinking opener would be a partial function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of opener as a partial, but I agree that it would probably be easier to understand if args and kwargs are passed directly.

"""Initialize a FileManager.

Parameters
----------
opener : callable
Callable that opens a given file when called, returning a file
object.
mode : str, optional
If provided, passed to opener as a keyword argument.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

W291 trailing whitespace

"""
raise NotImplementedError

@contextlib.contextmanager
def acquire(self):
"""Context manager for acquiring a file object.

This method must be thread-safe: it should be safe to simultaneously
acquire a file in multiple threads at the same time (assuming that
the underlying file object is thread-safe).

Yields
------
Open file object, as returned by opener().
"""
raise NotImplementedError

def close(self):
"""Explicitly close any associated file object (if necessary)."""
raise NotImplementedError


_DEFAULT_MODE = object()


def _open(opener, mode):
return opener() if mode is _DEFAULT_MODE else opener(mode=mode)


class ExplicitFileManager(FileManager):
"""A file manager that holds a file open until explicitly closed.

This is mostly a reference implementation: must real use cases should use
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

must->most

ExplicitLazyFileContext for better performance.
"""

def __init__(self, opener, mode=_DEFAULT_MODE):
self._opener = opener
# file has already been created, don't override when restoring
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you expand on this a bit? How do we KNOW that the file has already been created? I'm wondering if the mode switch should go after the file open line.

self._mode = 'a' if mode == 'w' else mode
self._file = _open(opener, mode)

@contextlib.contextmanager
def acquire(self):
yield self._file

def close(self):
self._file.close()

def __getstate__(self):
return {'opener': self._opener, 'mode': self._mode}

def __setstate__(self, state):
self.__init__(**state)


class LazyFileManager(FileManager):
"""An explicit file manager that lazily opens files."""

def __init__(self, opener, mode=_DEFAULT_MODE):
self._opener = opener
self._mode = mode
self._lock = threading.Lock()
self._file = None

@contextlib.contextmanager
def acquire(self):
with self._lock:
if self._file is None:
self._file = _open(self._opener, self._mode)
# file has already been created, don't override when restoring
if self._mode == 'w':
self._mode = 'a'
yield self._file

def close(self):
if self._file is not None:
self._file.close()

def __getstate__(self):
return {'opener': self._opener, 'mode': self._mode}

def __setstate__(self, state):
self.__init__(**state)


class AutoclosingFileManager(FileManager):
"""A FileManager that automatically opens/closes files when used."""

def __init__(self, opener, mode=_DEFAULT_MODE):
self._opener = opener
self._mode = mode
self._lock = threading.Lock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thoughts on allowing other locks to be passed in here? Do we need to support the CombinedLock concept as well?

self._file = None
self._references = 0

@contextlib.contextmanager
def acquire(self):
with self._lock:
if self._file is None:
self._file = _open(self._opener, self._mode)
# file has already been created, don't override when restoring
if self._mode == 'w':
self._mode = 'a'
self._references += 1

yield self._file

with self._lock:
self._references -= 1
if not self._references:
self._file.close()
self._file = None

def close(self):
pass

def __getstate__(self):
return {'opener': self._opener, 'mode': self._mode}

def __setstate__(self, state):
self.__init__(**state)


# TODO: write a FileManager that makes use of an LRU cache.
Loading