Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new primitive: Arima model #168

Closed
AlexanderGeiger opened this issue Apr 26, 2019 · 3 comments · Fixed by #173
Closed

Add new primitive: Arima model #168

AlexanderGeiger opened this issue Apr 26, 2019 · 3 comments · Fixed by #173
Assignees
Labels
approved The issue is approved and someone can start working on it new primitives A new primitive is being requested
Milestone

Comments

@AlexanderGeiger
Copy link
Contributor

Description

ARIMA models are often used to describe time series data. Therefore we should add an Arima primitive for time series forecasting. We can use the statsmodels library.

The primitive takes an array as input, on which an Arima model is fitted. The forecast method returns another array of predictions.

What I Did

I started implementing this primitive for testing purposes in the arima branch on my fork, which you can check out.
Concretely, I included an adapter for statsmodels and a Primitive JSON file.

Any feedback on the primitive itself and the implementation would be highly appreciated.

@csala csala added the new primitives A new primitive is being requested label Apr 30, 2019
@csala
Copy link
Contributor

csala commented Apr 30, 2019

Thanks for opening the issue @AlexanderGeiger, but I have some concerns about the proposed approach and implementation.

My understanding of the ARIMA model is that it is fitted on a single sequence of datapoints (a single time series) and then it is used to forecast one or more values immediately after that sequence which it has been fitted to.

On the contrary, models like an LSTM based regressor are fitted on both a collection of sequences of data points and the values after them and later on they can be used to forecast the values after any new sequence of data points that they are given.

In order words, even though the ARIMA class has a fit method, I think that it does not really have a learning stage equivalent to the one that models like LSTM have and that the ARIMA adapter should be a stateless function (or method) instead of a class with both a fit and a predict method.

So, when it comes to the exact implementation, I think the class prototype should be something like this:


from statsmodels.tsa import arima_model


class ARIMA:

    def __init__(self, ...):
        # Capture here all the arguments that are actually hyperparameters
        # and store them as class attributes.

    def forecast(self, X):
        ...  # manipulate X as necessary here
        arima = arima_model.ARIMA(X, ...)
        arima_results = arima.fit(...)
        y = arima_results.forecast(...)
        ... # manipulate y as necessary here
        return y

then, at the JSON level, you would specify only the predict method:

{
    ...
    "name": "statsmodels.tsa.arima_model.ARIMA",
    "primitive": "mlprimitives.adapters.statsmodels.ARIMA",
    "produce": {
        "method": "forecast",
        ...
    },
    ...
}

Also note the name of the class: when writing an adapter, and as far as it possible, make the name of the class or function be exactly the same as the one that you are adapting. So, in this case, the adapter class should be called ARIMA instead of arima_model

@csala csala added the Pending Review The bug is not confirmed or the feature request is being considered label Apr 30, 2019
@AlexanderGeiger
Copy link
Contributor Author

Hi @csala , Thank you for the feedback!
I changed the structure of the Arima primitive accordingly. The primitive now takes an array of time series data as input and produces another array of predictions. The number of predictions is defined in the steps hyperparameter. The json can be found here and the Arima class here. Please let me know if that is the approach you had in mind.

@csala
Copy link
Contributor

csala commented May 13, 2019

Hi @AlexanderGeiger Yes! Now it looks exactly as I had thought about it. :-)

It would be great to also add a pipeline that uses it, but I'm afraid there is no timeseries dataset yet in the repo, so perhaps you can create the PR without the pipeline and we can add it later when we add the dataset.

@csala csala added approved The issue is approved and someone can start working on it and removed Pending Review The bug is not confirmed or the feature request is being considered labels May 22, 2019
AlexanderGeiger added a commit to AlexanderGeiger/MLPrimitives that referenced this issue Aug 12, 2019
@csala csala added this to the 0.2.1 milestone Aug 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved The issue is approved and someone can start working on it new primitives A new primitive is being requested
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants