Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quarter offset implemented (base is now latest pydata-master). #2721

Merged
merged 14 commits into from
Mar 2, 2019

Conversation

jwenfai
Copy link
Contributor

@jwenfai jwenfai commented Jan 27, 2019

Waiting for reviews before editing whats-new.rst.

@spencerkclark
Copy link
Member

Perfect, @jwenfai, thanks -- I have a busy start to this week, but hopefully I'll get some time to review this by next weekend.

Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good to me @jwenfai. I only have a few mostly minor comments/suggestions. Please go ahead and add a what's new entry.

Would you mind also adding a single test case to the test for cftime_range with a quarterly offset? I'm pretty sure things should work fine based on your thorough testing of the offset operations, but it would be good to have at least one test verifying that they work in cftime_range, since that is essentially the only place they are used in practice in xarray.

+----------+--------------------------------------------------------------------+
| Q(S)-FEB | Quarter frequency, anchored at the end (or beginning) of February |
+----------+--------------------------------------------------------------------+
| Q(S)-MAR | Quarter frequency, anchored at the end (or beginning) of January |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and below, fix the full month names at the end of each entry (e.g. this should be March instead of January).



def _get_day_of_month(other, day_option):
"""Find the day in `other`'s month that satisfies a DateOffset's onOffset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: update this docstring to use the appropriate names for cftime offsets (e.g. DateOffset->BaseCFTimeOffset, day_opt->day_option).

self.month = self._default_month
else:
self.month = month
if not isinstance(self.month, int):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe create a helper function for this checking logic (e.g. def _validate_month(month)), since the same is used in YearOffset below?

elif day_option == 'end':
days_in_month = _days_in_month(other)
return days_in_month
elif isinstance(day_option, np.integer):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is not currently used, so let's refrain from adding it for now.

other : cftime.datetime
n : number of periods to increment, before adjusting for rolling
month : int reference month giving the first month of the year
day_option : 'start', 'end', 'business_start', 'business_end', or int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the options that are currently not supported (e.g .'business_start' etc.).


See Also
--------
get_day_of_month : Find the day in a month provided an offset.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: in your definition this function has a leading underscore, i.e. _get_day_of_month.

@@ -562,19 +707,22 @@ def test_onOffset(calendar, date_args, offset, expected):


@pytest.mark.parametrize(
('year_month_args', 'sub_day_args', 'offset'),
('year_quarter_month_args', 'sub_day_args', 'offset'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this argument name was meant to denote that it is a length-two tuple with the "year and month" of the date tested, so there is no need to change the name.

month = 2 corresponds to dates like 2/28/2007, 5/31/2007, ...
month = 3 corresponds to dates like 3/31/2007, 6/30/2007, ...
"""
# In pandas, _from_name_startingMonth = 1 used when freq='QS'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...this is indeed confusing behavior in pandas. Reluctantly I'm inclined to match it for consistency, so maybe let's go with _default_month = 3 here? In the _FREQUENCIES dictionary you can then just use partial(QuarterBegin, month=1) for the 'QS' entry, so that it behaves properly in to_offset.

"""
# In pandas, QuarterOffset._from_name suffix == 'DEC'
# See _lite_rule_alias in pandas._libs.tslibs.frequencies
_default_month = 12
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my comment in QuarterBegin above, I'm inclined to match pandas' behavior here for consistency, so maybe let's go with _default_month = 3 here. In the _FREQUENCIES dictionary you can then just use partial(QuarterEnd, month=12) for the 'Q' entry, so that it behaves properly in to_offset.

@jwenfai
Copy link
Contributor Author

jwenfai commented Feb 2, 2019

Fixes implemented and tests are all passing. I had to modify the tests in test_cftime_offsets.py now that _default_month = 3 for QuarterBegin and QuarterEnd.

Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jwenfai. Once #2593 is merged, would you mind adding back in your changes to fully support resample for these offsets?

doc/whats-new.rst Outdated Show resolved Hide resolved
Co-Authored-By: jwenfai <jwenfai@gmail.com>
@jwenfai
Copy link
Contributor Author

jwenfai commented Feb 2, 2019

Sure, I'll add resample support back in once #2593 is merged.

Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding back in the resample support -- just a few more minor comments.

@@ -47,6 +47,8 @@ Enhancements
report showing what exactly differs between the two objects (dimensions /
coordinates / variables / attributes) (:issue:`1507`).
By `Benoit Bovy <https://github.com/benbovy>`_.
- CFTimeIndex now supports QuarterBegin and QuarterEnd offsets. (:issue:`2663`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete this duplicate entry.


def __init__(self, n=1, normalize=False, month=None):
BaseCFTimeOffset.__init__(self, n)
self.normalize = normalize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally put off adding support for the normalize option in the other offsets, because it was not necessary for cftime_range (and by extension resample). Even for these quarterly offsets, you would need to do some more work to fully support/test it (e.g. you would need some modifications to __apply__ as well -- note the use of the @apply_wraps decorator in pandas). Let's maybe hold off on adding it until there is a need.

return _get_day_of_month(other, self._day_option)


def _is_normalized(datetime):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment related to the normalize option; as a result I think we can hold off on adding this function for now.

…ffsets.py. Removed redundant lines in whats-new.rst.
@jwenfai
Copy link
Contributor Author

jwenfai commented Feb 3, 2019

I didn't realize support for normalization was more involved; normalization-related code should all be removed now. Duplicate entry on whats-new.rst is gone as well.

Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last couple small things; otherwise this looks good to me.

day_option : 'start', 'end'
'start': returns 1
'end': returns last day of the month
int: returns the day in the month indicated by `other`, or the last of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've since removed this option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

'8D', '8001D',
'2MS', '3MS',
'2QS-AUG', '3QS-SEP',
'3AS-MAR', '4A-MAY'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems in refactoring these tests (which in general I'm fine with), we lost test cases where pandas resampling would fail. Do you know of one you could add back in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added them back in. For reference, pandas resampling is failing when resampling to small daily frequencies (probably sub-monthly) such as 3D or 8D and the value for base is 24. The frequency of the original time range does not matter.

…back in that raises ValueError when resampling (base=24 when resampling to daily freq, e.g., '8D').
@spencerkclark spencerkclark mentioned this pull request Feb 19, 2019
@pep8speaks
Copy link

pep8speaks commented Feb 26, 2019

Hello @jwenfai! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on February 26, 2019 at 00:48 Hours UTC

@spencerkclark
Copy link
Member

@jwenfai I made a couple minor edits to the docstrings/comments to bring them up to date (let me know if those look ok). If there are no more comments from others in the next few days, I'll go ahead and merge this.

@jwenfai
Copy link
Contributor Author

jwenfai commented Feb 26, 2019

@spencerkclark Thanks for improving the docstring and cleaning up the extra whitespaces. They look ok to me.

@spencerkclark
Copy link
Member

Going to go ahead and merge this now; thanks @jwenfai!

@spencerkclark spencerkclark merged commit 627a881 into pydata:master Mar 2, 2019
dcherian added a commit to yohai/xarray that referenced this pull request Mar 18, 2019
* upstream/master:
  Rework whats-new for 0.12
  Add whats-new for 0.12.1
  Release 0.12.0
  enable loading remote hdf5 files (pydata#2782)
  Push back finalizing deprecations for 0.12 (pydata#2809)
  Drop failing tests writing multi-dimensional arrays as attributes (pydata#2810)
  some docs updates (pydata#2746)
  Add support for cftime.datetime coordinates with coarsen (pydata#2778)
  Don't use deprecated np.asscalar() (pydata#2800)
  Improve name concat (pydata#2792)
  Add `Dataset.drop_dims` (pydata#2767)
  Quarter offset implemented (base is now latest pydata-master). (pydata#2721)
  Add use_cftime option to open_dataset (pydata#2759)
  Bugfix/reduce no axis (pydata#2769)
  'standard' now refers to 'gregorian' in cftime_range (pydata#2771)
pletchm pushed a commit to pletchm/xarray that referenced this pull request Mar 21, 2019
…a#2721)

* Quarter offset implemented (base is now latest pydata-master).

* Fixed issues raised in review (pydata#2721 (review))

* Updated whats-new.rst with info on quarter offset support.

* Updated whats-new.rst with info on quarter offset support.

* Update doc/whats-new.rst

Co-Authored-By: jwenfai <jwenfai@gmail.com>

* Added support for quarter frequencies when resampling CFTimeIndex. Less redundancy in CFTimeIndex resampling tests.

* Removed normalization code (unnecessary for cftime_range) in cftime_offsets.py. Removed redundant lines in whats-new.rst.

* Removed invalid option from _get_day_of_month docstring. Added tests back in that raises ValueError when resampling (base=24 when resampling to daily freq, e.g., '8D').

* Minor edits to docstrings/comments

* lint
pletchm pushed a commit to pletchm/xarray that referenced this pull request Mar 21, 2019
…a#2721)

* Quarter offset implemented (base is now latest pydata-master).

* Fixed issues raised in review (pydata#2721 (review))

* Updated whats-new.rst with info on quarter offset support.

* Updated whats-new.rst with info on quarter offset support.

* Update doc/whats-new.rst

Co-Authored-By: jwenfai <jwenfai@gmail.com>

* Added support for quarter frequencies when resampling CFTimeIndex. Less redundancy in CFTimeIndex resampling tests.

* Removed normalization code (unnecessary for cftime_range) in cftime_offsets.py. Removed redundant lines in whats-new.rst.

* Removed invalid option from _get_day_of_month docstring. Added tests back in that raises ValueError when resampling (base=24 when resampling to daily freq, e.g., '8D').

* Minor edits to docstrings/comments

* lint
shoyer pushed a commit that referenced this pull request Mar 26, 2019
…ns with size>1 (#2757)

* Quarter offset implemented (base is now latest pydata-master). (#2721)

* Quarter offset implemented (base is now latest pydata-master).

* Fixed issues raised in review (#2721 (review))

* Updated whats-new.rst with info on quarter offset support.

* Updated whats-new.rst with info on quarter offset support.

* Update doc/whats-new.rst

Co-Authored-By: jwenfai <jwenfai@gmail.com>

* Added support for quarter frequencies when resampling CFTimeIndex. Less redundancy in CFTimeIndex resampling tests.

* Removed normalization code (unnecessary for cftime_range) in cftime_offsets.py. Removed redundant lines in whats-new.rst.

* Removed invalid option from _get_day_of_month docstring. Added tests back in that raises ValueError when resampling (base=24 when resampling to daily freq, e.g., '8D').

* Minor edits to docstrings/comments

* lint

* Add `Dataset.drop_dims` (#2767)

* ENH: Add Dataset.drop_dims()

* Drops full dimensions and any corresponding variables in a
  Dataset
* Fixes GH1949

* DOC: Add Dataset.drop_dims() documentation

* Improve name concat (#2792)

* Added tests of desired name inferring behaviour

* Infers names

* updated what's new

* Don't use deprecated np.asscalar() (#2800)

It got deprecated in numpy 1.16 and throws a ton of warnings due to
that.
All the function does is returning .item() anyway, which is why it got
deprecated.

* Add support for cftime.datetime coordinates with coarsen (#2778)

* some docs updates (#2746)

* Friendlier io title.

* Fix lists.

* Fix *args, **kwargs

"inline emphasis..."

* misc

* Reference xarray_extras for csv writing. Closes #2289

* Add metpy accessor. Closes #461

* fix transpose docstring. Closes #2576

* Revert "Fix lists."

This reverts commit 39983a5.

* Revert "Fix *args, **kwargs"

This reverts commit 1b9da35.

* Add MetPy to related projects.

* Add Weather and Climate specific page.

* Add hvplot.

* Note open_dataset, mfdataset open files as read-only (closes #2345).

* Update metpy 1

Co-Authored-By: dcherian <dcherian@users.noreply.github.com>

* Update doc/weather-climate.rst

Co-Authored-By: dcherian <dcherian@users.noreply.github.com>

* Drop failing tests writing multi-dimensional arrays as attributes (#2810)

These aren't valid for netCDF files.

Fixes GH2803

* Push back finalizing deprecations for 0.12 (#2809)

0.12 will already have a big change in dropping Python 2.7 support. I'd rather
wait a bit longer to finalize these deprecations to minimize the impact on
users.

* enable loading remote hdf5 files (#2782)

* attempt at loading remote hdf5

* added a couple tests

* rewind bytes after reading header

* addressed comments for tests and error message

* fixed pep8 formatting

* created _get_engine_from_magic_number function, new tests

* added description in whats-new

* fixed test failure on windows

* same error on windows and nix

* Release 0.12.0

* Add whats-new for 0.12.1

* Rework whats-new for 0.12

* DOC: Update donation links

* DOC: remove outdated warning (#2818)

* Allow expand_dims() method to support inserting/broadcasting dimensions with size>1 (#2757)

 * Make using dim_kwargs for python 3.5 illegal -- a ValueError is thrown

 * dataset.expand_dims() method take dict like object where values represent length of dimensions or coordinates of dimesnsions

 * dataarray.expand_dims() method take dict like object where values represent length of dimensions or coordinates of dimesnsions

 * Add alternative option to passing a dict to the dim argument, which is now an optional kwarg, passing in each new dimension as its own kwarg

 * Add expand_dims enhancement from issue 2710 to whats-new.rst

 * Fix test_dataarray.TestDataArray.test_expand_dims_with_greater_dim_size tests to pass in python 3.5 using ordered dicts instead of regular dicts. This was needed because python 3.5 and earlier did not maintain insertion order for dicts

 * Restrict core logic to use 'dim' as a dict--it will be converted into a dict on entry if it is a str or a sequence of str

 * Don't cast dim values (coords) as a list since IndexVariable/Variable will internally convert it into a numpy.ndarray. So just use IndexVariable((k,), v)

 * TypeErrors should be raised for invalid input types, rather than ValueErrors.

 * Force 'dim' to be OrderedDict for python 3.5

* Allow expand_dims() method to support inserting/broadcasting dimensions with size>1 (#2757)

 * use .size attribute to determine the size of a dimension, rather than converting to a list, which can be slow for large iterables

 * Make using dim_kwargs for python 3.5 illegal -- a ValueError is thrown

 * dataset.expand_dims() method take dict like object where values represent length of dimensions or coordinates of dimesnsions

 * dataarray.expand_dims() method take dict like object where values represent length of dimensions or coordinates of dimesnsions

 * Add alternative option to passing a dict to the dim argument, which is now an optional kwarg, passing in each new dimension as its own kwarg

 * Add expand_dims enhancement from issue 2710 to whats-new.rst

 * Fix test_dataarray.TestDataArray.test_expand_dims_with_greater_dim_size tests to pass in python 3.5 using ordered dicts instead of regular dicts. This was needed because python 3.5 and earlier did not maintain insertion order for dicts

 * Restrict core logic to use 'dim' as a dict--it will be converted into a dict on entry if it is a str or a sequence of str

 * Don't cast dim values (coords) as a list since IndexVariable/Variable will internally convert it into a numpy.ndarray. So just use IndexVariable((k,), v)

 * TypeErrors should be raised for invalid input types, rather than ValueErrors.

 * Force 'dim' to be OrderedDict for python 3.5

* Allow expand_dims() method to support inserting/broadcasting dimensions with size>1 (#2757)

 * Move enhancement description up to 0.12.1

 * use .size attribute to determine the size of a dimension, rather than converting to a list, which can be slow for large iterables

 * Make using dim_kwargs for python 3.5 illegal -- a ValueError is thrown

 * dataset.expand_dims() method take dict like object where values represent length of dimensions or coordinates of dimesnsions

 * dataarray.expand_dims() method take dict like object where values represent length of dimensions or coordinates of dimesnsions

 * Add alternative option to passing a dict to the dim argument, which is now an optional kwarg, passing in each new dimension as its own kwarg

 * Add expand_dims enhancement from issue 2710 to whats-new.rst

 * Fix test_dataarray.TestDataArray.test_expand_dims_with_greater_dim_size tests to pass in python 3.5 using ordered dicts instead of regular dicts. This was needed because python 3.5 and earlier did not maintain insertion order for dicts

 * Restrict core logic to use 'dim' as a dict--it will be converted into a dict on entry if it is a str or a sequence of str

 * Don't cast dim values (coords) as a list since IndexVariable/Variable will internally convert it into a numpy.ndarray. So just use IndexVariable((k,), v)

 * TypeErrors should be raised for invalid input types, rather than ValueErrors.

 * Force 'dim' to be OrderedDict for python 3.5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants