-
Notifications
You must be signed in to change notification settings - Fork 908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/operand error with encoders #2034
Conversation
…ts a ambiguous timedelta value to extract the start time index
Codecov ReportAttention: ❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
... and 6 files with indirect coverage changes 📢 Thoughts on this report? Let us know!. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, thanks a lot @madtoinou.
Just had a minor suggestion and that we should add a test for it
darts/utils/data/tabularization.py
Outdated
start_time_idx = ( | ||
len( | ||
pd.date_range( | ||
start=time_index_i[0], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be more efficient to generate the index from the end of the series instead of from the beginning and then just add the len(time_index_i) to it?
Also we could use our darts.utils.timeseries_generation.generate_index
for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I don't know if this is much faster but at least, it looks similar to the other case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, looks great thanks @madtoinou 🚀
@@ -1132,37 +1132,44 @@ def test_lagged_training_data_extend_past_and_future_covariates_range_idx(self): | |||
assert np.allclose(expected_X, X[:, :, 0]) | |||
assert np.allclose(expected_y, y[:, :, 0]) | |||
|
|||
def test_lagged_training_data_extend_past_and_future_covariates_datetime_idx(self): | |||
@pytest.mark.parametrize("freq", ["D", "MS", "Y"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice 👍
Fixes #1875, fixes #1991
Summary
When encoders are used to generate covariates, they have the minimum time requirements. In tabularization, an arithmetic operation on
Timedelta
andpandas.offset
must be performed to realign the covariate and target time indexes. However, some frequencies ('M', 'Y' and 'y'
) conversion toTimedelta
are ambiguous (pandas doc), causing the unsupported operand error.To solve the problem for these specific cases, a temporary
DatetimeIndex
is created and the information is extracted without relying on the conversion (slower than the arithmetic operation).