breaking: raise error for gaps in series #504

Merged
merged 19 commits into from
Oct 31, 2024

Member

@jmoralez jmoralez commented Oct 22, 2024

Raises an error when the series' timestamps have gaps in them, are duplicated or don't match the specified frequency (freq argument).
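The three conditions named in the description (gaps, duplicates, timestamps that do not match `freq`) can be illustrated with a small pandas check. This is only a sketch of the idea, not the code added in this PR: the function name `find_timestamp_problems` is hypothetical, and the column names `unique_id` and `ds` are assumed for illustration.

```python
import pandas as pd


def find_timestamp_problems(df, freq, id_col="unique_id", time_col="ds"):
    """Return {series id: [problems]} for series whose timestamps look invalid.

    Hypothetical helper for illustration; not the library's implementation.
    """
    problems = {}
    for uid, group in df.groupby(id_col):
        times = pd.DatetimeIndex(group[time_col]).sort_values()
        issues = []
        if times.has_duplicates:
            issues.append("duplicated timestamps")
        unique_times = times.unique()
        # Timestamps expected between the first and last observation at this freq.
        expected = pd.date_range(unique_times[0], unique_times[-1], freq=freq)
        if len(unique_times.difference(expected)) > 0:
            # Some observed timestamps are not on the grid implied by freq.
            issues.append("timestamps do not match freq")
        elif len(expected.difference(unique_times)) > 0:
            # All observed timestamps are on the grid, but some grid points are absent.
            issues.append("gaps in timestamps")
        if issues:
            problems[uid] = issues
    return problems


df = pd.DataFrame(
    {
        "unique_id": ["a"] * 3 + ["b"] * 3 + ["c"] * 3,
        "ds": pd.to_datetime(
            [
                "2024-01-01", "2024-02-01", "2024-03-01",  # a: complete
                "2024-01-01", "2024-03-01", "2024-04-01",  # b: February missing
                "2024-01-01", "2024-01-01", "2024-02-01",  # c: January repeated
            ]
        ),
        "y": range(9),
    }
)
problems = find_timestamp_problems(df, freq="MS")
```

Here series `a` passes, `b` is reported for a gap, and `c` for a duplicate. The check is done per series because each series may start and end on different dates.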

Contributor

github-actions bot commented Oct 22, 2024

Experiment Results

Experiment 1: air-passengers

Description:

| variable | experiment |
| --- | --- |
| h | 12 |
| season_length | 12 |
| freq | MS |
| level | None |
| n_windows | 1 |

Results:

| metric | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive |
| --- | --- | --- | --- | --- |
| mae | 12.6793 | 11.0623 | 47.8333 | 76 |
| mape | 0.027 | 0.0232 | 0.0999 | 0.1425 |
| mse | 213.936 | 199.132 | 2571.33 | 10604.2 |
| total_time | 9.1022 | 2.2177 | 0.0059 | 0.0047 |

Plot:

Experiment 2: air-passengers

Description:

| variable | experiment |
| --- | --- |
| h | 24 |
| season_length | 12 |
| freq | MS |
| level | None |
| n_windows | 1 |

Results:

| metric | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive |
| --- | --- | --- | --- | --- |
| mae | 58.1031 | 58.4587 | 71.25 | 115.25 |
| mape | 0.1257 | 0.1267 | 0.1552 | 0.2358 |
| mse | 4040.21 | 4110.79 | 5928.17 | 18859.2 |
| total_time | 0.6477 | 1.2845 | 0.0045 | 0.0042 |

Plot:

Experiment 3: electricity-multiple-series

Description:

| variable | experiment |
| --- | --- |
| h | 24 |
| season_length | 24 |
| freq | H |
| level | None |
| n_windows | 1 |

Results:

| metric | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive |
| --- | --- | --- | --- | --- |
| mae | 178.293 | 268.13 | 269.23 | 1331.02 |
| mape | 0.0234 | 0.0311 | 0.0304 | 0.1692 |
| mse | 121589 | 219485 | 213677 | 4.68961e+06 |
| total_time | 0.4795 | 2.1672 | 0.0055 | 0.0053 |

Plot:

Experiment 4: electricity-multiple-series

Description:

| variable | experiment |
| --- | --- |
| h | 168 |
| season_length | 24 |
| freq | H |
| level | None |
| n_windows | 1 |

Results:

| metric | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive |
| --- | --- | --- | --- | --- |
| mae | 465.497 | 346.984 | 398.956 | 1119.26 |
| mape | 0.062 | 0.0437 | 0.0512 | 0.1583 |
| mse | 835021 | 403787 | 656723 | 3.17316e+06 |
| total_time | 0.6089 | 1.3617 | 0.006 | 0.0055 |

Plot:

Experiment 5: electricity-multiple-series

Description:

| variable | experiment |
| --- | --- |
| h | 336 |
| season_length | 24 |
| freq | H |
| level | None |
| n_windows | 1 |

Results:

| metric | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive |
| --- | --- | --- | --- | --- |
| mae | 558.673 | 459.757 | 602.926 | 1340.95 |
| mape | 0.0697 | 0.0565 | 0.0787 | 0.17 |
| mse | 1.22723e+06 | 739114 | 1.61572e+06 | 6.04619e+06 |
| total_time | 2.3025 | 2.4547 | 0.0058 | 0.0054 |

Plot:

@jmoralez jmoralez changed the title from "enh: raise error for gaps in series" to "breaking: raise error for gaps in series" Oct 23, 2024
@jmoralez jmoralez marked this pull request as ready for review October 23, 2024 19:22
@jmoralez jmoralez requested a review from AzulGarza October 23, 2024 19:22
Contributor

@elephaint elephaint left a comment

Two comments. The one about adding a callout to the capabilities notebooks might also be appropriate for the tutorial 12_irregular.

nixtla/nixtla_client.py (resolved)
"\n",
"# Forecast\n",
"# We use B for the freq, as only business days are represented in the dataset\n",
"forecast_df = nixtla_client.forecast(\n",
- " df=df, \n",
+ " df=df,\n",
Contributor

@elephaint elephaint Oct 28, 2024

Maybe we should add a callout to this capabilities notebook at the bottom stating that TimeGPT doesn't allow gaps in the timestamps? E.g.

"Make sure there are no gaps in your time series data. This means that even if the chosen frequency is irregular, you should still make sure you provide a value for every irregular timestamp in the data. For example, if your frequency is "B" (business day), there can't be a gap (missing datapoint) between two consecutive business days."

Edit: perhaps add a similar comment also to the beginning or end of the tutorial notebook 12_irregular.
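The business-day case in the suggested callout can be made concrete with pandas. The sketch below is illustrative only and uses made-up dates: weekends are absent from a "B" index by construction and are not gaps, whereas a missing weekday is.

```python
import pandas as pd

# Two full business weeks: Monday 2024-01-01 through Friday 2024-01-12.
# Note that freq "B" only skips weekends; it does not know about holidays.
expected = pd.bdate_range("2024-01-01", "2024-01-12")

# Simulate a dataset in which Thursday 2024-01-04 was never recorded.
observed = expected.drop([pd.Timestamp("2024-01-04")])

# Any business day that is expected but not observed is a gap under freq="B".
missing = expected.difference(observed)
print(list(missing))
```

The weekend of 2024-01-06 and 2024-01-07 never appears in `missing`, because those dates are not part of the business-day calendar in the first place.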

Member Author

We already have that in the data requirements notebook

When using TimeGPT, the data cannot contain missing values. This means that for every series, there should be no gaps in the timestamps and no missing values in the target variable.
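The two requirements in that passage (no gaps in the timestamps, no missing values in the target) are distinct, which a short pandas sketch can show. The column names `ds` and `y` and the daily frequency are assumptions for illustration, not taken from the notebook.

```python
import pandas as pd

series = pd.DataFrame(
    {
        "ds": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"]),
        "y": [1.0, None, 3.0],
    }
)

# Missing values in the target are visible directly in the data.
n_missing_target = int(series["y"].isna().sum())

# A gap in the timestamps (2024-01-03 here) is only visible once the series
# is aligned against the complete index implied by the frequency.
full_index = pd.date_range(series["ds"].min(), series["ds"].max(), freq="D")
completed = series.set_index("ds").reindex(full_index)
n_missing_after_reindex = int(completed["y"].isna().sum())
```

After reindexing, the absent row becomes an explicit missing value, so both problems end up in the same form and can be handled together before forecasting.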

Contributor

Sure, but repetition is the key to education? 😆

Member Author

Yeah, that may help, but I'd prefer to add a link to that section instead. Otherwise we'll have to remember to update the text in every place we repeat it, and we'll most likely miss some.

@jmoralez jmoralez requested a review from elephaint October 29, 2024 18:26
Contributor

@elephaint elephaint left a comment

Small non-blocking comments

@jmoralez jmoralez merged commit 8b0660a into main Oct 31, 2024
1 check passed