Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/four theta #123

Merged
merged 49 commits into from
Jul 20, 2020
Merged

Feature/four theta #123

merged 49 commits into from
Jul 20, 2020

Conversation

Droxef
Copy link
Contributor

@Droxef Droxef commented Jul 3, 2020

Implement 4Theta method from M4 competition

Summary

Naive implementation of the 4Theta method from the organizers of M4 competition.
Modify backtesting gridsearch to let an automatic search of the best hyper-parameters

@Droxef Droxef requested review from hrzn and TheMP as code owners July 3, 2020 12:49

# Linear Regression part of the decomposition. We select the degree one coefficient.
b_theta = np.polyfit(np.array([i for i in range(0, self.length)]), (1.0 - self.theta) * new_ts.values(), 1)[0]

# Normalization of the coefficient b_theta.
self.coef = b_theta / (2.0 - self.theta)
self.coef = b_theta / (2.0 - self.theta) # change to b_theta / (-self.theta) if classical theta
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you manage to figure out why is the formula written like this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mathematically, I'm still not quite sure about the rest, but I am sure it is correct for theta = 0 and 1.
And the definition of the theta lines are not quite the same, thus the confusing theta.

But comparing with the classical theta (4theta with default parameter), we have the exact same results if we change the normalization by -1/theta. (because there is a symmetry: 2-theta_1 = theta_2)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we unify the meaning of theta? E.g. by changing to b_theta / (-self.theta) and adapting the default value to theta=2 ?

linreg = new_ts.values()
elif self.trend_mode == 'exponential':
linreg = np.log(new_ts.values())
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be less code and more coherent with other implementations if we check for either linear or exponential with assert and throw an exception if none of the values is selected.

theta_t = self.theta * new_ts.values() + (1 - self.theta) * theta0_in
elif self.model_mode == 'multiplicative' and (theta0_in > 0).all():
theta_t = (new_ts.values() ** self.theta) * (theta0_in ** (1 - self.theta))
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment as before, we can assert at the beginning and throw an exception

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a first draft, I blindly replicated the step of the original alorithm. So I will change it.
But we still need to check if all values are positive if 'multiplicative' models are chosen or else it will fail.
I can either raise an error, or simply keep the fallback as it is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a more optimized implementation, like our theta. But we need to do the math to find it


replicated_seasonality = np.tile(self.seasonality.pd_series()[-self.season_period:],
math.ceil(n / self.season_period))[:n]
if self.season_mode in ['multiplicative', 'mul']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for simplicity I would also pick one of them, and also an ENUM that we pass here might be helpful to avoid typos https://docs.python.org/3/library/enum.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be sure I understand correctly, the different model arguments will accept an Enum value?
I can change the theta method as well in the same manner. Thus Enum classes will be declared exterior to the class

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this comment actually applies to both theta methods - instead of retyping "multiplicative"/"additive" string we can have an enum that would have 2 values - MULTIPLICATIVE or ADDITIVE and if you make a typo the IDE will tell you something is not matching (not the case for raw strings)

@@ -313,15 +313,16 @@ def backtest_gridsearch(model_class: type,

Parameters
----------
model
model_class
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I let this slip through, thanks!

elif val_series == 'train':
model.fit(train_series)
# Use ndarray because casting to TimeSeries takes too much time
error = metric(model.fitted_values, train_series.univariate_values())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, FourTheta is the only model that possesses the attribute fitted_values. So it would be good to have a check here to test whether the given model supports this functionality, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least all ExponentialSmoothing have a fitted attribute that can be retrieved (So the current theta model could have one too, I think). If you approve this functionality, it might be interesting to add this attribute in other models.
But yes, a check is necessary.

@@ -88,13 +89,13 @@ def fit(self, series: TimeSeries, component_index: Optional[int] = None):
new_ts = remove_seasonality(ts, self.season_period, model=self.mode)

# SES part of the decomposition.
self.model = hw.SimpleExpSmoothing(new_ts.values()).fit()
self.model = hw.SimpleExpSmoothing(new_ts.values()).fit(initial_level=0.2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the effect of this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with SES in statsmodels is that alpha is not constrained between [0.1, 0.99] as it should, thus giving NaN values when alpha is 0.
Setting the initial_level to 0.2 (or anything else) avoided all encountered cases where the optimization gave alpha=0.
It is more a hotfix rather than an actual solution before statsmodels corrects the problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it leads to different results. I will add it only in the case alpha is 0, and it will recompute it

Copy link
Contributor

@hrzn hrzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got few requested changes, but overall it looks good @Droxef
Thanks for the work!

@@ -364,17 +365,23 @@ def backtest_gridsearch(model_class: type,
For every hyperparameter combination, the model is trained on `train_series` and
evaluated on `val_series`.

Comparison with fitted values (activated when `use_fitted_values` is passed):
For every hyperparameter combination, the model is trained on `train_series` and the resulting
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small proposed modification:

For every hyperparameter combination, the model is trained on `train_series` and evaluated on the resulting
fitted values. Not all models have fitted values, and this method raises an error if `model.fitted_values` doesn't exist.
The fitted values are the result of the fit of the model on the training series. Comparing with the fitted values
can be a quick way to assess the model, but one cannot see if the model overfits or underfits.

@@ -389,6 +396,9 @@ def backtest_gridsearch(model_class: type,
as argument to the predict method of `model`.
num_predictions:
The number of train/prediction cycles performed in one iteration of expanding window mode.
use_fitted_values
If `True`, it will activates the comparison with the fitted values, if `fitted_values` is an attribute of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If True, uses the comparison with the fitted values. Raises an error if fitted_values is not an attribute of model_class.

raise_if_not(train_series.width == val_series.width, "Training and validation series require the same"
" number of components.", logger)

raise_if_not((fcast_horizon_n is None) ^ (val_series is None),
"Please pass exactly one of the arguments 'forecast_horizon_n' or 'val_series'.", logger)
raise_if_not(bool((fcast_horizon_n is None) ^ (val_series is None) ^ use_fitted_values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this a bit hard to read. How about

raise_if_not(((fcast_horizon_n is not None) + (val_series is not None) + use_fitted_values) == 1)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I would perform this check at the top of the method.


fit_kwargs, predict_kwargs = _create_parameter_dicts(model_class(), target_indices, component_index,
use_full_output_length)

if val_series is None:
if (val_series is None) & (not use_fitted_values):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use logical operators (and) instead of binary ones when possible?

from enum import Enum


class Season(Enum):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we call these SeasonalityMode, TrendMode and ModelMode ? wdyt?

self.fitted_values *= self.mean
# Takes too much time to create a TimeSeries
# Overhead: 30% ± 10 (2-10 ms in average)
self.fitted_values = TimeSeries.from_times_and_values(ts.time_index(), self.fitted_values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could build the TimeSeries only in the backtesting method, this way the time overhead is paid only for backtesting and not for simply fitting the model.

@@ -37,6 +36,8 @@ class AutoregressionModelsTestCase(unittest.TestCase):
(Theta(), 11.3),
(Theta(1), 20.2),
(Theta(3), 9.8),
(FourTheta(1), 20.2),
(FourTheta(-1), 9.8),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add testing for a few more modes?

def select_best_model(ts: TimeSeries, thetas: Optional[List[int]] = None,
m: Optional[int] = None, normalization: bool = True) -> 'FourTheta':
"""
Performs a grid search over all hyper parameters to select the best model.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps mention that it is using the fitted values on the training series here.

model.fit(train_series)
best_model.fit(train_series)
forecast_random = model.predict(10)
forecast_best = model.predict(10)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
forecast_best = model.predict(10)
forecast_best = best_model.predict(10)

return self._build_forecast_series(forecast)

@staticmethod
def select_best_model(ts: TimeSeries, thetas: Optional[List[int]] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep it temporarily, but then eventually I think we should have each model (if possible) given good reasonable default hyper parameters sets, and then let the user call backtest_gridsearch(params='default', ...) or something like this.
Ping @grll

Copy link
Contributor

@hrzn hrzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Droxef, nice addition!

Type of seasonality. Either "additive" or "multiplicative".
season_mode
Type of seasonality.
Either SeasonalityMode.MULTIPLICATIVE, SeasonalityMode.ADDITIVE or SeasonalityMode.NONE.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks :)

or an inferred seasonality period.

When called with `theta = 2 - X`, `model_mode = Model.ADDITIVE` and `trend_mode = Trend.LINEAR`,
this model is equivalent to calling `Theta(theta=X)`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved after our discussion - The original Theta implementation is faster.

# will lead to fitted_values similar to ts. But one cannot see if it overfits.
if self.normalization:
self.fitted_values *= self.mean
# Takes too much time to create a TimeSeries
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can now remove this comment (or move to the corresponding backtesting function)

@Droxef Droxef merged commit 55e4e42 into develop Jul 20, 2020
@Droxef Droxef deleted the feat/FourTheta branch July 20, 2020 07:24
@Droxef Droxef mentioned this pull request Jul 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants