Accept int value in head, thin and tail #3298

Merged: 5 commits into pydata:master on Sep 14, 2019

griverat
Contributor

@griverat griverat commented Sep 9, 2019

Related #3278
This PR makes the methods head, thin and tail for both DataArray and Dataset accept a single integer value as a parameter. If no parameter is given, then it defaults to 5.

  • Tests added
  • Passes black . && mypy . && flake8
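
A minimal sketch of the behaviour described above, using only the standard library. The helper name `normalize_indexers` and its signature are hypothetical and are not taken from xarray; the sketch only illustrates how a single integer (or no argument at all) could be expanded to one entry per dimension.

```python
from typing import Dict, Hashable, Mapping, Optional, Sequence, Union


def normalize_indexers(
    indexers: Optional[Union[Mapping[Hashable, int], int]],
    dims: Sequence[Hashable],
    default: int = 5,
) -> Dict[Hashable, int]:
    """Hypothetical helper: expand None or a single int to one entry per dimension."""
    if indexers is None:
        # No argument given: fall back to the default of 5 mentioned in the PR
        indexers = default
    if isinstance(indexers, int):
        # A single integer applies to every dimension
        return {dim: indexers for dim in dims}
    # A mapping is passed through unchanged
    return dict(indexers)


dims = ("x", "y")
print(normalize_indexers(None, dims))      # {'x': 5, 'y': 5}
print(normalize_indexers(3, dims))         # {'x': 3, 'y': 3}
print(normalize_indexers({"x": 2}, dims))  # {'x': 2}
```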

Collaborator

@max-sixty max-sixty left a comment

This is great! Thanks @DangoMelon

I'll wait for a day in case of other comments

xarray/core/dataset.py (outdated)
xarray/core/dataset.py (outdated)
xarray/core/dataset.py
xarray/core/dataarray.py (outdated)
) -> "Dataset":
"""Returns a new dataset with each array indexed along every `n`th
value for the specified dimension(s)

Parameters
----------
indexers : dict, optional
A dict with keys matching dimensions and integer values `n`.
indexers : dict or int, default: 5
Member

Does it make sense to thin by a factor of five by default? Or should we not have a default value? The use cases for a default thinning value are less clear to me than those for head/tail defaults.

Collaborator

Agree!

Contributor

I think picking a default makes it very convenient to use. And this is a convenience method...

Contributor Author

I set a default value for thin to keep it consistent with the other two methods (I really didn't think much about its usage), but I agree with @dcherian that it might be OK to have it for convenience.

Collaborator

I don't have a strong enough view. I agree it's a convenience method, but a convenient value depends significantly on the size of the array, unlike with head & tail.

So whatever you think. I'm probably a -0.2
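
To make the size argument concrete, here is a small sketch on plain Python lists rather than xarray objects. The functions `head`, `tail` and `thin` are stand-ins written for this illustration; the `tail` branch mirrors the `val != 0` slice expression from the patch in this thread. With a default of 5, head and tail always return at most five items, while the number of items thin returns grows with the input.

```python
def head(values, n=5):
    # First n items
    return values[slice(n)]


def tail(values, n=5):
    # slice(-0, None) would select everything, so n == 0 needs its own branch
    return values[slice(-n, None)] if n != 0 else values[slice(n)]


def thin(values, n=5):
    # Every n-th item
    return values[slice(None, None, n)]


for size in (10, 1000):
    data = list(range(size))
    print(size, len(head(data)), len(tail(data)), len(thin(data)))
# 10 5 5 2
# 1000 5 5 200
```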

with raises_regex(TypeError, "must be an int"):
    self.dv.tail(x=3.1)
with raises_regex(ValueError, "must be positive"):
    self.dv.tail(-3)
Collaborator

Very thorough tests! Thank you!

@max-sixty
Collaborator

Any other feedback before we merge? (Errors are unrelated)

if not isinstance(v, int):
    raise TypeError("indexer value must be an integer")
elif v < 0:
    raise ValueError("indexer value must be positive")
Member

It is helpful to include a little more context in error messages if possible. In this case, you could include the offending name and value.

Contributor Author

Hmmm, something along these lines maybe?

"expected integer as indexer value, found type %r for dim %r" % (type(v), k)

and

"expected positive integer as indexer value for dim %r" % k

The k and v come from iterating over indexers.items()
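
For illustration, the two proposed messages could be wired into the validation loop roughly as follows. This is a sketch under assumptions: the function name `check_indexers` is hypothetical, and the trailing `found %s` in the second message is an addition made here to report the offending value, not part of the wording proposed above. It is not the code that was merged.

```python
from typing import Hashable, Mapping


def check_indexers(indexers: Mapping[Hashable, object]) -> None:
    """Hypothetical validator that names the offending dimension in its errors."""
    for k, v in indexers.items():
        if not isinstance(v, int):
            raise TypeError(
                "expected integer as indexer value, found type %r for dim %r"
                % (type(v), k)
            )
        elif v < 0:
            # ", found %s" is an assumption added for this sketch
            raise ValueError(
                "expected positive integer as indexer value for dim %r, found %s"
                % (k, v)
            )


try:
    check_indexers({"x": 3.1})
except TypeError as err:
    print(err)

try:
    check_indexers({"y": -3})
except ValueError as err:
    print(err)
```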

Member

@shoyer shoyer left a comment

Looks good to me, though it would be nice to add more context to the error messages.

@griverat
Contributor Author

Any other feedback before we merge? (Errors are unrelated)

@max-sixty I think the typing errors come from changing the annotation from `Any` to `int`: mypy then finds a `slice` where it expects an `int`.

@shoyer
Member

shoyer commented Sep 14, 2019 via email

@max-sixty
Collaborator

Here's a solution to the mypy issue:

diff --git a/xarray/core/dataset.py b/xarray/core/dataset.py
index 1d762114..ee4ebf4d 100644
--- a/xarray/core/dataset.py
+++ b/xarray/core/dataset.py
@@ -2046,8 +2046,8 @@ class Dataset(Mapping, ImplementsDatasetReduce, DataWithCoords):
                 raise TypeError("indexer value must be an integer")
             elif v < 0:
                 raise ValueError("indexer value must be positive")
-        indexers = {k: slice(val) for k, val in indexers.items()}
-        return self.isel(indexers)
+        indexers_slices = {k: slice(val) for k, val in indexers.items()}
+        return self.isel(indexers_slices)
 
     def tail(
         self,
@@ -2087,11 +2087,11 @@ class Dataset(Mapping, ImplementsDatasetReduce, DataWithCoords):
                 raise TypeError("indexer value must be an integer")
             elif v < 0:
                 raise ValueError("indexer value must be positive")
-        indexers = {
+        indexers_slices = {
             k: slice(-val, None) if val != 0 else slice(val)
             for k, val in indexers.items()
         }
-        return self.isel(indexers)
+        return self.isel(indexers_slices)
 
     def thin(
         self,
@@ -2134,8 +2134,8 @@ class Dataset(Mapping, ImplementsDatasetReduce, DataWithCoords):
                 raise ValueError("indexer value must be positive")
             elif v == 0:
                 raise ValueError("step cannot be zero")
-        indexers = {k: slice(None, None, val) for k, val in indexers.items()}
-        return self.isel(indexers)
+        indexers_slices = {k: slice(None, None, val) for k, val in indexers.items()}
+        return self.isel(indexers_slices)
 
     def broadcast_like(
         self, other: Union["Dataset", "DataArray"], exclude: Iterable[Hashable] = None
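
The renaming in this patch can be illustrated in isolation. In the sketch below, the function name `head_slices` and its annotations are assumptions made for the example. The point is only that a parameter declared as a mapping to `int` should not be rebound to a mapping of `slice` objects, which is presumably what mypy objected to; giving the result its own name keeps each variable at a single type.

```python
from typing import Dict, Hashable, Mapping


def head_slices(indexers: Mapping[Hashable, int]) -> Dict[Hashable, slice]:
    """Hypothetical stand-in for the slice-building step of head()."""
    # Rebinding `indexers` to a dict of slices would conflict with its declared
    # Mapping[Hashable, int] type, so the result gets its own name.
    indexers_slices = {k: slice(val) for k, val in indexers.items()}
    return indexers_slices


print(head_slices({"x": 3}))  # {'x': slice(None, 3, None)}
```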

@griverat
Contributor Author

@shoyer I hope it's much clearer now; I tried to phrase it the way you suggested.
@max-sixty That did the trick!
Thank you both for your suggestions.

@shoyer shoyer merged commit 7fb3b19 into pydata:master Sep 14, 2019
@shoyer
Member

shoyer commented Sep 14, 2019

thanks @DangoMelon !

@griverat griverat deleted the defvals-head-thin-tail branch September 15, 2019 04:52
@jhamman
Member

jhamman commented Sep 15, 2019

@DangoMelon - thanks for your contribution. This will get used a lot!

dcherian added a commit to dcherian/xarray that referenced this pull request Sep 19, 2019
* master:
  Fix whats-new date :/
  Revert to dev version
  Release v0.13.0
  auto_combine deprecation to 0.14 (pydata#3314)
  Deprecation: groupby, resample default dim. (pydata#3313)
  Raise error if cmap is list of colors (pydata#3310)
  Refactor concat to use merge for non-concatenated variables (pydata#3239)
  Honor `keep_attrs` in DataArray.quantile (pydata#3305)
  Fix DataArray api doc (pydata#3309)
  Accept int value in head, thin and tail (pydata#3298)
  ignore h5py 2.10.0 warnings and fix invalid_netcdf warning test. (pydata#3301)
  Update why-xarray.rst with clearer expression (pydata#3307)
  Compat and encoding deprecation to 0.14 (pydata#3294)
  Remove deprecated concat kwargs. (pydata#3288)
  allow np-array levels and colors in 2D plots (pydata#3295)
  Remove some deprecations (pydata#3292)
  Make argmin/max work lazy with dask (pydata#3244)
  Add head, tail and thin methods (pydata#3278)
  Updater to testing environment name (pydata#3253)
dcherian added a commit that referenced this pull request Sep 24, 2019
* upstream/master: (43 commits)
  Add hypothesis support to related projects (#3335)
  More doc fixes (#3333)
  Improve the documentation of swap_dims (#3331)
  fix the doc names of the return value of swap_dims (#3329)
  Fix isel performance regression (#3319)
  Allow weakref (#3318)
  Clarify that "scatter" is a plotting method in what's new. (#3316)
  Fix whats-new date :/
  Revert to dev version
  Release v0.13.0
  auto_combine deprecation to 0.14 (#3314)
  Deprecation: groupby, resample default dim. (#3313)
  Raise error if cmap is list of colors (#3310)
  Refactor concat to use merge for non-concatenated variables (#3239)
  Honor `keep_attrs` in DataArray.quantile (#3305)
  Fix DataArray api doc (#3309)
  Accept int value in head, thin and tail (#3298)
  ignore h5py 2.10.0 warnings and fix invalid_netcdf warning test. (#3301)
  Update why-xarray.rst with clearer expression (#3307)
  Compat and encoding deprecation to 0.14 (#3294)
  ...