Fix/operand error with encoders #2034

Merged
merged 4 commits into from
Oct 28, 2023


madtoinou
Collaborator

Fixes #1875, fixes #1991

Summary

When encoders are used to generate covariates, the generated covariates cover only the minimum required time span. During tabularization, an arithmetic operation on a Timedelta and a pandas offset must be performed to realign the covariate and target time indexes. However, converting some frequencies ('M', 'Y' and 'y') to Timedelta is ambiguous (pandas doc), which causes the unsupported operand error.

To solve the problem for these specific cases, a temporary DatetimeIndex is created and the start time index is extracted from it without relying on the conversion. This is slower than the arithmetic operation.
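To illustrate the idea in the summary, here is a minimal sketch using plain pandas. The helper name `position_via_temp_index` and the sample dates are hypothetical and are not taken from the darts code base; the sketch only assumes that the timestamp lies on the frequency grid of the index and is not earlier than its first entry.

```python
import pandas as pd


def position_via_temp_index(index: pd.DatetimeIndex, timestamp: pd.Timestamp) -> int:
    """Count the `index.freq` steps from `index[0]` to `timestamp`.

    Hypothetical helper: it builds a temporary DatetimeIndex instead of
    dividing a time difference by the frequency, so it also works for
    calendar-based frequencies that have no fixed duration.
    """
    temp_index = pd.date_range(start=index[0], end=timestamp, freq=index.freq)
    return len(temp_index) - 1


# Fixed-duration frequency: plain Timedelta arithmetic is enough.
daily = pd.date_range("2020-01-01", periods=10, freq="D")
daily_pos_arith = (daily[3] - daily[0]) // pd.Timedelta(days=1)

# Calendar-based frequency: months differ in length, so the position is
# read from the temporary index instead.
monthly = pd.date_range("2020-01-01", periods=12, freq="MS")
monthly_pos = position_via_temp_index(monthly, pd.Timestamp("2020-04-01"))
```

Both approaches agree for the daily index, but only the temporary index gives a well-defined answer for the monthly one, at the cost of materializing an index.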

…ts a ambiguous timedelta value to extract the start time index
@codecov-commenter

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.


Files Coverage Δ
darts/utils/data/tabularization.py 98.82% <66.66%> (-0.29%) ⬇️

... and 6 files with indirect coverage changes

Collaborator

@dennisbader dennisbader left a comment


Very nice, thanks a lot @madtoinou.
I just have a minor suggestion, and we should add a test for it.

start_time_idx = (
    len(
        pd.date_range(
            start=time_index_i[0],
Collaborator

@dennisbader dennisbader Oct 27, 2023


It might be more efficient to generate the index from the end of the series instead of from the beginning and then just add the len(time_index_i) to it?

Also we could use our darts.utils.timeseries_generation.generate_index for that

Collaborator Author


Yup, I don't know if this is much faster but at least, it looks similar to the other case
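The suggestion in this thread, counting from the end of the series rather than from the beginning, could look roughly like the sketch below. It uses `pd.date_range` rather than `darts.utils.timeseries_generation.generate_index`, and the helper name `position_from_end` and the sample dates are hypothetical; this is an illustration of the idea, not the code that was merged.

```python
import pandas as pd


def position_from_end(index: pd.DatetimeIndex, timestamp: pd.Timestamp) -> int:
    """Position of `timestamp` relative to `index[0]`, computed from the end.

    Hypothetical helper: it generates a temporary index from `timestamp` up
    to `index[-1]` and subtracts its length from `len(index)`. Assumes that
    `timestamp` lies on the frequency grid and is not later than `index[-1]`.
    A negative result means `timestamp` falls before the start of `index`.
    """
    tail = pd.date_range(start=timestamp, end=index[-1], freq=index.freq)
    return len(index) - len(tail)


monthly = pd.date_range("2020-01-01", periods=12, freq="MS")

# Timestamp inside the series: only the short tail has to be generated.
inside_pos = position_from_end(monthly, pd.Timestamp("2020-10-01"))

# Timestamp two months before the series start gives a negative offset.
before_pos = position_from_end(monthly, pd.Timestamp("2019-11-01"))
```

When the timestamp is close to the end of a long series, the temporary index stays short, which is where any efficiency gain would come from.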

darts/utils/data/tabularization.py (resolved)
Collaborator

@dennisbader dennisbader left a comment


Nice, looks great thanks @madtoinou 🚀

@@ -1132,37 +1132,44 @@ def test_lagged_training_data_extend_past_and_future_covariates_range_idx(self):
assert np.allclose(expected_X, X[:, :, 0])
assert np.allclose(expected_y, y[:, :, 0])

def test_lagged_training_data_extend_past_and_future_covariates_datetime_idx(self):
@pytest.mark.parametrize("freq", ["D", "MS", "Y"])
Collaborator


nice 👍

@dennisbader dennisbader merged commit e6f2208 into master Oct 28, 2023
@dennisbader dennisbader deleted the fix/encoders_operand_error branch October 28, 2023 13:55
Projects
Status: Released
Development

Successfully merging this pull request may close these issues.

[BUG] Error with XGBModel and Encoders [BUG] RegressionModel historical forecasts with specific encoder lags
3 participants