-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a GRIB backend via ECMWF cfgrib / ecCodes #2476
Add a GRIB backend via ECMWF cfgrib / ecCodes #2476
Conversation
Hello @alexamici! Thanks for updating the PR.
Comment last updated on October 15, 2018 at 13:59 Hours UTC |
@alexamici great to see this! I'm about to merge a refactor of xarray's backends for v0.11 in #2261. You'll need to refactor a little bit to accommodate this, but hopefully that should be straightforward. The new interface should be a little easier to use, especially for handling many files or dask-distributed support. |
xarray/backends/cfgrib_.py
Outdated
return array | ||
|
||
|
||
FLAVOURS = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this configuration dict something that could live in cfgrid rather than xarray?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will do.
xarray/backends/cfgrib_.py
Outdated
""" | ||
def __init__(self, ds, variable_map={}, autoclose=False): | ||
self.ds = ds | ||
self.variable_map = variable_map.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason for including variable_map
in the interface on the DataStore class, rather than letting users change variable names later with Dataset.rename
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the same issue as above. The whole coordinate rename doesn't belong here and will be moved in cfgrib.
c69b1e8
to
72606f7
Compare
@shoyer at long last! :) I quickly sync'ed with the new backend API. I did some light testing. Note that I didn't test with dask at all and I'm not using the passed |
The appropriate lock to use depends on pynio is probably the simplest example of how to write a new backend: The main difference is that you should make use of |
The Furthermore I expect dask performance to be abysmal until I implement ecmwf/cfgrib#20. |
@shoyer BTW what timeframe do you expect for the v0.11 release? And would you consider merging this Pull Request before the release, assuming that we do a cfgrib release with read support declared beta? |
xarray/backends/cfgrib_.py
Outdated
if lock is None: | ||
lock = ECCODES_LOCK | ||
self.lock = ensure_lock(lock) | ||
backend_kwargs['filter_by_keys'] = tuple(backend_kwargs.get('filter_by_keys', {}).items()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a little surprising to me -- can we simply pass on backend_kwargs
directly to cfgrib?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a ugly hack around the fact that filter_by_keys
is a dict but CachingFileManager
accepts only hashable backend_kwargs
because they are passed to _HashedSequence
.
filter_by_keys
has a very nice dict semantics, so I'd prefer not to change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, that makes please. I think the need to hash arguments used to open files with CachingFileManager is unavoidable, so this is a reasonable workaround. But please add a comment explaining this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I added a comment and made the code more explicit.
xarray/backends/cfgrib_.py
Outdated
self.backend_array = backend_array | ||
|
||
def __getattr__(self, item): | ||
return getattr(self.backend_array, item) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sort of forwarding is really error prone. Let's avoid if it in favor of a explicit solution if at all possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
The v0.11 release still need more work (to complete deprecations), so it's
at least a week or two out (probably more). I suspect we could get this PR
ready together in time.
…On Tue, Oct 9, 2018 at 3:59 PM Alessandro Amici ***@***.***> wrote:
@shoyer <https://github.com/shoyer> BTW what timeframe do you expect for
the v0.11 release? And would you consider merging this Pull Request before
the release, assuming that we do a *cfgrib* release with read support
declared *beta*?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2476 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABKS1nCJOGlFSIEtL28sQ80GZqhzJ_Q2ks5ujSpVgaJpZM4XOXA8>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty reasonable but it needs tests, ideally including an integration test that verifies we can actually read data from a grib file. I will see if I can dig up a good example from xarray/tests/test_backends.py, but the pynio or pseudonetcdf tests are likely a good start.
xarray/backends/cfgrib_.py
Outdated
if lock is None: | ||
lock = ECCODES_LOCK | ||
self.lock = ensure_lock(lock) | ||
backend_kwargs['filter_by_keys'] = tuple(backend_kwargs.get('filter_by_keys', {}).items()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, that makes please. I think the need to hash arguments used to open files with CachingFileManager is unavoidable, so this is a reasonable workaround. But please add a comment explaining this.
xarray/backends/cfgrib_.py
Outdated
|
||
def __getitem__(self, key): | ||
return indexing.explicit_indexing_adapter( | ||
key, self.shape, indexing.IndexingSupport.BASIC, self._getitem) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you verify which forms of indexing cfgrib actually supports? In your previous commit this was OUTER_1VECTOR. See the docstring on IndexingSupport for details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I checked, I can say that cfgrib supports indexing.IndexingSupport.OUTER
. I don't remember how I ended up declaring OUTER_1VECTOR, and since it sounded wrong, in the last commit intended to play it as safe as possible.
Code updated.
@shoyer thank you very much for the guidance, it is very appreciated! I stared working on the tests, but I've been blocked immediately by something that looks trivial. I cannot get the test class
How do you make pytest run test classes that do not start with |
@alexamici oops, we accidentally disabled most of our backend unit tests -- see #2479 for the fix. |
Tests should be fixed if you merge in master now. |
@shoyer I added the Questions:
It looks like we are very close, what do you think? Shall I move to the documentation? |
Tests added and passing with 100% coverage. BTW, I did a 0.9.0 beta release of cfgrib and I plan to give the public announcement tomorrow. :) |
@shoyer I'm ready to integrate more feedback (especially on the documentation), but I removed the Do you usually keep the history of PRs as it is or do you prefer me to rebase? |
We squash commits upon merging, so please leave things as is :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with the way this looks! If you want to verify that things work with dask-distributed (which is probably a good idea!), I would suggest adding an integration test in test_distributed.py
. The rasterio test in there is probably a good example.
xarray/backends/cfgrib_.py
Outdated
@@ -0,0 +1,97 @@ | |||
# | |||
# Copyright 2017-2018 European Centre for Medium-Range Weather Forecasts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really object to this notice, but we don't include it in other source code files for xarray so it looks a little out of place (perhaps we should?). Everything is Apache 2 licensed already, and owned by contributors (or whoever they assign it to, such as an employer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how to handle this one. ECMWF is quite sensitive to license and IPR matters so I added the licence boilerplate to all cfgrib files and later I simply copied the existing backend code as part of the PR.
I'm the material author of the code and my name will appear in the contributors, but I've been working fully funded by ECMWF as an external contractor, so it looks like proper attribution would be lost in this case.
I need to ask @StephanSiemen if they object to removing the copyright notice or what else they propose. To my knowledge this is the first contribution to an external Open Source project funded by ECMWF this way and we are learning how to handle these kind of details as we go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It says "Copyright 2014-2018, xarray Developers" in our README.rst file, which was modeled off projects like NumPy: https://github.com/numpy/numpy/blob/master/LICENSE.txt
I see now that our LICENSE file just has the original Apache license text. Perhaps we should add in the more specific "Copyright xarray developers" line, like a project like TensorFlow: https://github.com/tensorflow/tensorflow/blob/master/LICENSE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexamici (and ECMWF by proxy I suppose) - first, I want to make sure you understand that I'm very encouraged by your recent developments of cfgrib as an open source package and your contributions to xarray. They are much appreciated and they stand to benefit a wide swath of the geoscience community.
That said, at this point, I'm somewhat against adding this copyright/license header for these reasons:
- Xarray already has a clear copyright statement that is permissive enough to allow you (and ECMWF) to maintain copyrights over your contributions.
- We don't do this anywhere else in the xarray code base. Though many of the contributors that have developed xarray have done so as employees/contractors of various organizations, we've not adopted this level of documentation with regard to the original author or subsequent editors.
Now, I understand that it can be important for organizations to make visible their open source contributions so we want to make sure this sort of engagement can continue to happen. We've recently joined NUMFOCUS, in part to give us access to some proper legal advice when necessary. We can certainly solicit their advice here if we have technical questions.
It's probably worth pinging the rest of @pydata/xarray to get their thoughts here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexamici, @shoyer, @jhamman,
We at ECMWF are very happy for the copyright to be adjusted according to other contributions in xarray. Any acknowledgement of the contribution is much appreciated. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like I was being overzealous :)
I removed the copyright notice and licence boilerplate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alexamici and @StephanSiemen!
@shoyer I added a test for dask.distributed and it passes but please check that it is meaningful, as I'm not completely sure what to test. |
doc/io.rst
Outdated
to :py:func:`~xarray.open_dataset`: | ||
|
||
.. ipython:: python | ||
|
||
ds_grib = xr.open_dataset('example.grib', engine='cfgrib') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If example.grib
doesn't actually exist when we build the docs, this will give a nasty error message. It would be better to use a :verbatim:
directive here -- scroll open to the opendap examples to see what that looks like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ouch! I noticed that there was no docs testing in Travis-CI and wondered if you where testing the docs at all. I should have built the docs myself. What is the intended way do you build the docs? python sertup.py build_sphinx
fails on my setup due to missing numpydoc
.
I'm fixing it as you suggest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the :verbatim:
directive, but I was not able to test the build of the documentation due to several errors when building the gallery (a number of core dump in GEOS, maybe due to the setup on my MacOS).
The fix looks trivial enough that it may work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would work:
.. ipython::
:verbatim:
In [3]: ds_grib = xr.open_dataset('example.grib', engine='cfgrib')
We actually do test the doc build on Travis-CI, but unfortunately there's no easy way to see generated docs and also errors in ipython
directive blocks don't stop the build (there's a bug we filed about this somewhere).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. I mis-read the example.
Thanks @alexamici and ECMWF! |
Congratulations! |
This is currently a WIP PR for review.
whats-new.rst
for all changes andapi.rst
for new API.The implementation depends on the python module cfgrib and the C-library ecCodes to be installed.
Work in progress items:
CachingFileManager
interfacecc @StephanSiemen @iainrussell