Operator tick_calendar #255

Merged 31 commits on Oct 4, 2023
15b3399
New op: tick_calendar structure
DonBraulio Aug 29, 2023
59f171b
tick_calendar arguments, initial doc & examples
DonBraulio Sep 5, 2023
8fa5bc9
Changed args and attributes in tick_calendar
DonBraulio Sep 6, 2023
68a9492
Initial C++ code for tick_calendar
DonBraulio Sep 6, 2023
2ce5d0e
tick_calendar C++ implementation
DonBraulio Sep 7, 2023
5a004c1
Test & bugfix for tick_calendar (end of month)
DonBraulio Sep 7, 2023
5f88dfe
Added tests for weekdays and end of year (tick calendar)
DonBraulio Sep 8, 2023
6eb1ea3
Auto setup None args in tick_calendar
DonBraulio Sep 8, 2023
83c33cc
Many docstring examples for tick_calendar
DonBraulio Sep 8, 2023
24855ee
Fixed tests, added .md files
DonBraulio Sep 8, 2023
b390263
Addressed some comments after PR
DonBraulio Sep 11, 2023
8b9ba7c
Handle Literal types in typecheck
DonBraulio Sep 12, 2023
f418eae
tick_calendar args serialized (type ANY), get ranges in implementation
DonBraulio Sep 12, 2023
da7a61f
Updated tests for new tick_calendar args
DonBraulio Sep 12, 2023
ec4ab1d
Fixes and test for tick_calendar core function
DonBraulio Sep 12, 2023
1cb7078
Small bugfix in docstring example
DonBraulio Sep 12, 2023
9c9c4d6
Merge branch 'main' into calendar-ticks
DonBraulio Sep 12, 2023
c8690d7
Added invalid args tests
DonBraulio Sep 12, 2023
25f476f
Fix in docstring
DonBraulio Sep 12, 2023
c9693ec
Changes after PR comments
DonBraulio Sep 13, 2023
0742987
Update CHANGELOG
DonBraulio Sep 21, 2023
13952dc
Fix bazel deps
DonBraulio Sep 21, 2023
3f9535a
Minor update
DonBraulio Sep 25, 2023
fa2fe79
Merge main into calendar-ticks
DonBraulio Sep 26, 2023
75b795e
Bugfix & more tests in tick_calendar
DonBraulio Sep 27, 2023
06f4ce6
Bugfix in tick_calendar cpp: set UTC explicitly
DonBraulio Sep 27, 2023
5674141
Add SetTimezone context and test TZ!=UTC
DonBraulio Sep 28, 2023
3885bdd
Bugfix in tick_calendar.cc for timezone support
DonBraulio Sep 28, 2023
bab9b6a
Merge branch 'main' into calendar-ticks
DonBraulio Sep 28, 2023
2bf52ec
Merge branch 'main' into calendar-ticks
DonBraulio Oct 4, 2023
662380c
Fixes after merge
DonBraulio Oct 4, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -4,6 +4,7 @@

### Features

- Added `EventSet.tick_calendar()` operator.
- Added `EventSet.where()` operator.
- Add `filter_moving_count` operator.

1 change: 1 addition & 0 deletions docs/src/reference/index.md
@@ -63,6 +63,7 @@ Check the index on the left for a more detailed description of any symbol.
| [`EventSet.set_index()`][temporian.EventSet.set_index] | Replaces the indexes in an [`EventSet`][temporian.EventSet]. |
| [`EventSet.since_last()`][temporian.EventSet.since_last] | Computes the amount of time since the last distinct timestamp. |
| [`EventSet.tick()`][temporian.EventSet.tick] | Generates timestamps at regular intervals in the range of a guide. |
| [`EventSet.tick_calendar()`][temporian.EventSet.tick_calendar] | Generates timestamps at the specified calendar date-time events. |
| [`EventSet.timestamps()`][temporian.EventSet.timestamps] | Creates a feature from the events timestamps (`float64`). |
| [`EventSet.unique_timestamps()`][temporian.EventSet.unique_timestamps] | Removes events with duplicated timestamps from an [`EventSet`][temporian.EventSet]. |
| [`EventSet.until_next()`][temporian.EventSet.until_next] | Duration until the next sampling event. |
1 change: 1 addition & 0 deletions docs/src/reference/temporian/operators/tick_calendar.md
@@ -0,0 +1 @@
::: temporian.EventSet.tick_calendar
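
The event counts quoted in the `tick_calendar` docstring examples (365 daily ticks in 2021, 12 ticks for `mday=5`, and so on) can be cross-checked with a brute-force enumeration that uses only the standard library. This is a minimal sketch, not the operator's C++ implementation: `enumerate_ticks` is a hypothetical helper, it expects every field to be already resolved to an integer or `"*"`, and it assumes that both ends of the range are inclusive, which is consistent with the counts in the docstring.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Union

Field = Union[int, str]  # an integer, or "*" for "any value"


def _matches(value: int, spec: Field) -> bool:
    """True if a calendar value satisfies a field specification."""
    return spec == "*" or value == spec


def _values(spec: Field, lo: int, hi: int):
    """All candidate values for a field: the full range for "*", else one."""
    return range(lo, hi + 1) if spec == "*" else [spec]


def enumerate_ticks(
    begin: datetime,
    end: datetime,
    second: Field,
    minute: Field,
    hour: Field,
    mday: Field,
    month: Field,
    wday: Field = "*",
) -> List[datetime]:
    """Hypothetical reference: lists all ticks in [begin, end] (inclusive)."""
    ticks = []
    day = begin.replace(hour=0, minute=0, second=0, microsecond=0)
    while day <= end:
        # Python's weekday() is Mon=0..Sun=6; the PR documents Sun=0..Sat=6.
        wday_sun0 = (day.weekday() + 1) % 7
        if (
            _matches(day.day, mday)
            and _matches(day.month, month)
            and _matches(wday_sun0, wday)
        ):
            for h in _values(hour, 0, 23):
                for m in _values(minute, 0, 59):
                    for s in _values(second, 0, 59):
                        tick = day.replace(hour=h, minute=m, second=s)
                        if begin <= tick <= end:
                            ticks.append(tick)
        day += timedelta(days=1)
    return ticks


utc = timezone.utc
year_2021 = (
    datetime(2021, 1, 1, tzinfo=utc),
    datetime(2021, 12, 31, 23, 59, 59, tzinfo=utc),
)
daily = enumerate_ticks(*year_2021, second=0, minute=0, hour=0, mday="*", month="*")
print(len(daily))  # 365, matching the first docstring example
```

Months that lack the requested day (for example `mday=31` in February) simply never match, so they contribute no tick.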
142 changes: 141 additions & 1 deletion temporian/core/event_set_ops.py
@@ -15,7 +15,7 @@
# pylint: disable=import-outside-toplevel

from __future__ import annotations
from typing import Any, Dict, List, Optional, Union, TYPE_CHECKING
from typing import Any, Dict, List, Literal, Optional, Union, TYPE_CHECKING

from temporian.core.data.duration import Duration

@@ -2423,6 +2423,146 @@ def tick(

        return tick(self, interval=interval, align=align)

    def tick_calendar(
        self: EventSetOrNode,
        second: Optional[Union[int, Literal["*"]]] = None,
        minute: Optional[Union[int, Literal["*"]]] = None,
        hour: Optional[Union[int, Literal["*"]]] = None,
        mday: Optional[Union[int, Literal["*"]]] = None,
        month: Optional[Union[int, Literal["*"]]] = None,
        wday: Optional[Union[int, Literal["*"]]] = None,
    ) -> EventSetOrNode:
        """Generates events periodically at fixed times or dates e.g. each month.

        Events are generated in the range of the input
        [`EventSet`][temporian.EventSet] independently for each index.

        The usage is inspired by the crontab format, where arguments can
        take a value of `'*'` to tick at all values, or a fixed integer to
        tick only at that precise value.

        Non-specified values (`None`) are set to `'*'` if a finer
        resolution argument is specified, or fixed to the first valid value if
        a coarser resolution argument is specified. For example, setting only
        `tick_calendar(hour='*')`
        is equivalent to:
        `tick_calendar(second=0, minute=0, hour='*', mday='*', month='*')`,
        resulting in one tick at every exact hour of every day/month/year in
        the input guide range.

        The datetime timezone is always assumed to be UTC.

        Examples:
            ```python
            >>> # Every day (at 00:00:00) in the period (exactly one year)
            >>> a = tp.event_set(timestamps=["2021-01-01", "2021-12-31 23:59:59"])
            >>> b = a.tick_calendar(hour=0)
            >>> b
            indexes: ...
            events:
                (365 events):
                    timestamps: [...]
Collaborator

Thought for later: it would be interesting for dates to be printed as datetimes when "is_unix_time" is set.
Printing the dates directly could make the example simpler (instead of using "calendar_hour" and other similar functions).

Collaborator Author

Totally agree.
This may be implemented in a future PR because it might require changing further doctests.

            ...


            >>> # Every day at 2:30am
            >>> b = a.tick_calendar(hour=2, minute=30)
            >>> tp.glue(b.calendar_hour(), b.calendar_minute())
            indexes: ...
            events:
                (365 events):
                    timestamps: [...]
                    'calendar_hour': [2 2 2 ... 2 2 2]
                    'calendar_minute': [30 30 30 ... 30 30 30]
            ...


            >>> # Day 5 of every month (at 00:00)
            >>> b = a.tick_calendar(mday=5)
            >>> b.calendar_day_of_month()
            indexes: ...
            events:
                (12 events):
                    timestamps: [...]
                    'calendar_day_of_month': [5 5 5 ... 5 5 5]
            ...


            >>> # 1st of February of every year
            >>> a = tp.event_set(timestamps=["2020-01-01", "2021-12-31"])
            >>> b = a.tick_calendar(month=2)
            >>> tp.glue(b.calendar_day_of_month(), b.calendar_month())
            indexes: ...
            events:
                (2 events):
                    timestamps: [...]
                    'calendar_day_of_month': [1 1]
                    'calendar_month': [2 2]
            ...

            >>> # Every second in the period (2 hours -> 7200 seconds)
            >>> a = tp.event_set(timestamps=["2020-01-01 00:00:00",
            ...                              "2020-01-01 01:59:59"])
            >>> b = a.tick_calendar(second='*')
            >>> b
            indexes: ...
            events:
                (7200 events):
                    timestamps: [...]
            ...

            >>> # Every second of the minute 30 of every hour (00:30 and 01:30)
            >>> a = tp.event_set(timestamps=["2020-01-01 00:00",
            ...                              "2020-01-01 02:00"])
            >>> b = a.tick_calendar(second='*', minute=30)
            >>> b
            indexes: ...
            events:
                (120 events):
                    timestamps: [...]
            ...

            >>> # Not allowed: intermediate arguments (minute, hour) not specified
            >>> b = a.tick_calendar(second=1, mday=1)  # ambiguous meaning
            Traceback (most recent call last):
                ...
            ValueError: Can't set argument to None because previous and
            following arguments were specified. Set to '*' or an integer ...

            ```

        Args:
            second: '*' (any second), None (auto) or number in range `[0-59]`
                to tick at specific second of each minute.
            minute: '*' (any minute), None (auto) or number in range `[0-59]`
                to tick at specific minute of each hour.
            hour: '*' (any hour), None (auto), or number in range `[0-23]` to
                tick at specific hour of each day.
            mday: '*' (any day), None (auto) or number in range `[1-31]`
                to tick at specific day of each month. Note that months
                without that particular day do not get any tick
                (e.g. day 31 in February).
            month: '*' (any month), None (auto) or number in range `[1-12]` to
                tick at one particular month of each year.
            wday: '*' (any day), None (auto) or number in range `[0-6]`
                (Sun-Sat) to tick at particular day of week. Can only be
                specified if `mday` is `None`.

        Returns:
            A feature-less EventSet with timestamps at the specified calendar
            times.
        """
        from temporian.core.operators.tick_calendar import tick_calendar

        return tick_calendar(
            self,
            second=second,
            minute=minute,
            hour=hour,
            mday=mday,
            month=month,
            wday=wday,
        )

    def timestamps(self: EventSetOrNode) -> EventSetOrNode:
        """Converts an [`EventSet`][temporian.EventSet]'s timestamps into a
        `float64` feature.
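
The rule that the `tick_calendar` docstring gives for `None` arguments can be written down in a few lines of plain Python. The sketch below is an illustration inferred from the docstring and from the expectations in `tick_calendar_test.py`, not the code in `temporian/core/operators/tick_calendar.py`. `resolve_calendar_args` and `FIRST_VALID` are hypothetical names, the handling of a call with no arguments at all is an assumption, and the sketch does not enforce the documented restriction that `wday` requires `mday` to be `None`.

```python
from typing import Dict, Literal, Optional, Union

CalendarArg = Optional[Union[int, Literal["*"]]]

# Fields ordered from finest to coarsest resolution, each with the first
# valid value that is used when only a coarser field is specified.
FIRST_VALID = {"second": 0, "minute": 0, "hour": 0, "mday": 1, "month": 1}


def resolve_calendar_args(
    second: CalendarArg = None,
    minute: CalendarArg = None,
    hour: CalendarArg = None,
    mday: CalendarArg = None,
    month: CalendarArg = None,
    wday: CalendarArg = None,
) -> Dict[str, Union[int, str]]:
    """Hypothetical helper: fills in the `None` arguments of tick_calendar."""
    args = {
        "second": second,
        "minute": minute,
        "hour": hour,
        "mday": mday,
        "month": month,
    }

    # A day of week implies "any day of the month" (see the weekday tests).
    if wday is not None and args["mday"] is None:
        args["mday"] = "*"

    names = list(FIRST_VALID)
    specified = [i for i, name in enumerate(names) if args[name] is not None]
    if not specified:
        # Assumption: the PR does not show what happens with no arguments.
        raise ValueError("At least one argument must be specified")
    finest, coarsest = specified[0], specified[-1]

    for i, name in enumerate(names):
        if args[name] is not None:
            continue
        if i < finest:
            # Finer than every specified field: pin to the first valid value.
            args[name] = FIRST_VALID[name]
        elif i > coarsest:
            # Coarser than every specified field: tick at all values.
            args[name] = "*"
        else:
            raise ValueError(
                "Can't set argument to None because previous and following "
                "arguments were specified. Set to '*' or an integer."
            )

    args["wday"] = "*" if wday is None else wday
    return args


print(resolve_calendar_args(hour="*"))
```

With this rule, `hour="*"` resolves to `second=0, minute=0, hour="*", mday="*", month="*"`, which is the equivalence stated in the docstring, while `second=1, mday=1` is rejected because `minute` and `hour` sit between two specified fields.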
16 changes: 16 additions & 0 deletions temporian/core/operators/BUILD
@@ -362,6 +362,22 @@ py_library(
],
)

py_library(
    name = "tick_calendar",
    srcs = ["tick_calendar.py"],
    srcs_version = "PY3",
    deps = [
        ":base",
        "//temporian/core:compilation",
        "//temporian/core:operator_lib",
        "//temporian/core:typing",
        "//temporian/core/data:dtype",
        "//temporian/core/data:node",
        "//temporian/proto:core_py_proto",
        "//temporian/utils:typecheck",
    ],
)

py_library(
name = "select_index_values",
srcs = ["select_index_values.py"],
12 changes: 12 additions & 0 deletions temporian/core/operators/test/BUILD
@@ -53,3 +53,15 @@ py_test(
"//temporian/core/operators:until_next",
],
)

py_test(
    name = "tick_calendar_test",
    srcs = ["tick_calendar_test.py"],
    srcs_version = "PY3",
    deps = [
        # already_there/absl/testing:absltest
        "//temporian/core/data:dtype",
        "//temporian/core/data:node",
        "//temporian/core/operators:tick_calendar",
    ],
)
154 changes: 154 additions & 0 deletions temporian/core/operators/test/tick_calendar_test.py
@@ -0,0 +1,154 @@
# Copyright 2021 Google LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from absl.testing import absltest

from temporian.core.data.node import input_node
from temporian.core.operators.tick_calendar import tick_calendar, TickCalendar


class TickCalendarOperatorTest(absltest.TestCase):
    def setUp(self):
        self._in = input_node([], is_unix_timestamp=True)

    def test_free_seconds_month(self):
        output = tick_calendar(self._in, second="*", minute=1, hour=1, mday=31)
        op = output.creator
        assert isinstance(op, TickCalendar)
        self.assertEqual(op.second, "*")
        self.assertEqual(op.minute, 1)
        self.assertEqual(op.hour, 1)
        self.assertEqual(op.mday, 31)
        self.assertEqual(op.month, "*")
        self.assertEqual(op.wday, "*")

    def test_free_minutes(self):
        output = tick_calendar(self._in, minute="*")
        op = output.creator
        assert isinstance(op, TickCalendar)
        self.assertEqual(op.second, 0)
        self.assertEqual(op.minute, "*")
        self.assertEqual(op.hour, "*")
        self.assertEqual(op.mday, "*")
        self.assertEqual(op.month, "*")
        self.assertEqual(op.wday, "*")

    def test_month_day(self):
        output = tick_calendar(self._in, mday=5)
        op = output.creator
        assert isinstance(op, TickCalendar)
        self.assertEqual(op.second, 0)
        self.assertEqual(op.minute, 0)
        self.assertEqual(op.hour, 0)
        self.assertEqual(op.mday, 5)
        self.assertEqual(op.month, "*")
        self.assertEqual(op.wday, "*")

    def test_month(self):
        output = tick_calendar(self._in, month=8)
        op = output.creator
        assert isinstance(op, TickCalendar)
        self.assertEqual(op.second, 0)
        self.assertEqual(op.minute, 0)
        self.assertEqual(op.hour, 0)
        self.assertEqual(op.mday, 1)
        self.assertEqual(op.month, 8)
        self.assertEqual(op.wday, "*")

    def test_weekdays(self):
        output = tick_calendar(self._in, wday=6)
        op = output.creator
        assert isinstance(op, TickCalendar)
        self.assertEqual(op.second, 0)
        self.assertEqual(op.minute, 0)
        self.assertEqual(op.hour, 0)
        self.assertEqual(op.mday, "*")
        self.assertEqual(op.month, "*")
        self.assertEqual(op.wday, 6)

    def test_weekdays_month(self):
        output = tick_calendar(self._in, wday=6, month=3)
        op = output.creator
        assert isinstance(op, TickCalendar)
        self.assertEqual(op.second, 0)
        self.assertEqual(op.minute, 0)
        self.assertEqual(op.hour, 0)
        self.assertEqual(op.mday, "*")
        self.assertEqual(op.month, 3)
        self.assertEqual(op.wday, 6)

    def test_weekdays_all_hours(self):
        output = tick_calendar(self._in, wday=6, hour="*")
        op = output.creator
        assert isinstance(op, TickCalendar)
        self.assertEqual(op.second, 0)
        self.assertEqual(op.minute, 0)
        self.assertEqual(op.hour, "*")
        self.assertEqual(op.mday, "*")
        self.assertEqual(op.month, "*")
        self.assertEqual(op.wday, 6)

    def test_invalid_ranges(self):
        for kwargs in (
            {"second": -1},
            {"second": 60},
            {"minute": -1},
            {"minute": 60},
            {"hour": -1},
            {"hour": 24},
            {"mday": 0},
            {"mday": 32},
            {"mday": -1},  # may be supported in the future
            {"month": -1},
            {"month": 13},
            {"wday": -1},
            {"wday": 7},
        ):
            # Raw string: "\*" is a regex escape, not a Python string escape.
            with self.assertRaisesRegex(
                ValueError, r"Value should be '\*' or integer in range"
            ):
                _ = tick_calendar(self._in, **kwargs)  # type: ignore

    def test_invalid_types(self):
        for kwargs in (
            {"second": "1"},
            {"minute": "00"},
            {"hour": "00:00"},
            {"month": "January"},
            {"wday": "Sat"},
        ):
            with self.assertRaisesRegex(ValueError, "Non matching type"):
                _ = tick_calendar(self._in, **kwargs)  # type: ignore

    def test_undefined_args(self):
        with self.assertRaisesRegex(
            ValueError,
            "Can't set argument to None because previous and following",
        ):
            _ = tick_calendar(self._in, second=1, hour=1)  # undefined min

        with self.assertRaisesRegex(
            ValueError,
            "Can't set argument to None because previous and following",
        ):
            _ = tick_calendar(self._in, second=1, month=1)

        with self.assertRaisesRegex(
            ValueError,
            "Can't set argument to None because previous and following",
        ):
            _ = tick_calendar(self._in, hour=0, month=1)


if __name__ == "__main__":
    absltest.main()
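
The range and type checks exercised by `test_invalid_ranges` and `test_invalid_types` can be summarized in a standalone sketch. The ranges come from the `Args` section of the docstring; `VALID_RANGES` and `check_calendar_arg` are hypothetical names, and the error messages only reproduce the fragments matched by the tests, so the rest of their wording is an assumption. In the PR itself the type check is delegated to the typecheck utility rather than written inline like this.

```python
from typing import Dict, Optional, Tuple, Union

# Inclusive ranges taken from the Args section of the tick_calendar docstring.
VALID_RANGES: Dict[str, Tuple[int, int]] = {
    "second": (0, 59),
    "minute": (0, 59),
    "hour": (0, 23),
    "mday": (1, 31),
    "month": (1, 12),
    "wday": (0, 6),  # Sun-Sat
}


def check_calendar_arg(name: str, value: Optional[Union[int, str]]) -> None:
    """Hypothetical validator for a single tick_calendar argument."""
    if value is None or value == "*":
        return  # None is resolved automatically, '*' matches every value
    if not isinstance(value, int):
        # Strings such as "1" or "Sat" are rejected instead of being parsed.
        raise ValueError(f"Non matching type for '{name}': {value!r}")
    lo, hi = VALID_RANGES[name]
    if not lo <= value <= hi:
        raise ValueError(
            f"Value should be '*' or integer in range [{lo}-{hi}] "
            f"for argument '{name}', got {value}"
        )


check_calendar_arg("mday", 31)  # accepted, even though some months lack it
check_calendar_arg("wday", "*")  # accepted
```

Negative values such as `mday=-1` are rejected here as well, in line with the test comment that they may be supported in the future.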