Add support for models with prediction output size above 1. #323

jonlachmann · 2022-12-12T11:30:59Z

This adds support for models whose predict function returns more than one value per observation, for example a time-series model that forecasts 3 steps ahead. In general, any model with multiple outputs in its predict function should now be supported.

After discussion with Martin, I have not added any new arguments to the explain function. Instead, the input "prediction_zero" must have the same size as the output of the model's predict function.
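The size requirement can be illustrated with a minimal sketch in base R. The helper `check_prediction_zero()` and the toy function `predict_three()` are hypothetical names made up for this illustration; they are not part of the package and do not claim to mirror the check implemented in this PR.

```r
# Hypothetical helper, NOT part of the package: verifies that prediction_zero
# has one entry per output of a predict function.
check_prediction_zero <- function(predict_fun, x, prediction_zero) {
  pred <- predict_fun(x)
  # A plain numeric vector is treated as a single-output prediction.
  n_out <- if (is.null(dim(pred))) 1L else ncol(pred)
  if (length(prediction_zero) != n_out) {
    stop(sprintf(
      "prediction_zero has length %d, but the predict function returns %d output(s).",
      length(prediction_zero), n_out
    ))
  }
  invisible(n_out)
}

# Toy predict function with three outputs per row (an assumption for illustration).
predict_three <- function(x) {
  data.frame(h1 = x[, 1], h2 = 2 * x[, 1], h3 = 3 * x[, 1])
}

x <- matrix(rnorm(10), ncol = 2)
n_out <- check_prediction_zero(predict_three, x, prediction_zero = c(0, 0, 0))
# Passing a single value, e.g. prediction_zero = 0, would raise an error here.
```

Under this design the length of `prediction_zero` doubles as the signal that a multiple output model is in use, so no extra argument is needed.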

martinju · 2023-01-04T22:28:22Z

I renamed parallel to use_future as it better reflects the option.
Note: The failing setup tests are expected due to the changed behavior. Don't worry about that, we'll handle that in the end.

Other things we need to do:

* [ ] We need new tests for the multiple output situation. I'm thinking of a single test in test-output using a basic arima model fitted with stats::arima(), then maybe two tests under test-setup to a) check that we get the right dimension out of explain(), and b) check that the method fails when we pass a prediction function with multiple outcomes while providing a single prediction_zero.
* [ ] Add a basic example to the vignette on how to use the multiple output module.

Things we need to discuss/think through:

* [ ] While using `prediction_zero` to tell `explain()` that the multiple output module is in use is nice, I was originally thinking of changing the behavior of `get_predict_model()` to extract this dimension directly from the "test" of predict_model. The downside of my original idea is that we then miss this test of the output of predict_model being of the right dimension. Let's just think a bit about this.
* [ ] I see that you require the output of predict_model to be a data.frame in the multiple outcome situation. Isn't it more common to output a standard matrix in such cases?
* [ ] How should plotting work for such multiple output models? The easiest is probably to just let the user specify which of the predictions should be plotted (and default to the first one with a note written to the console). It might be nice to have them all in the same plot, however.
* [ ] Should we also handle multi-class classification in the same setting? If so, an example of this should also be put into the vignette.

jonlachmann · 2023-01-04T22:43:45Z

> I renamed parallel to use_future as it better reflects the option. Note: The failing setup tests are expected due to the changed behavior. Don't worry about that, we'll handle that in the end.
>
> Other things we need to do:
>
> * [ ] We need new tests for the multiple output situation. I'm thinking of a single test in test-output using a basic arima model fitted with stats::arima(), then maybe two tests under test-setup to a) check that we get the right dimension out of explain(), and b) check that the method fails when we pass a prediction function with multiple outcomes while providing a single prediction_zero.
> * [ ] Add a basic example to the vignette on how to use the multiple output module

I can get started on some tests tomorrow. Do you want it to explain say 3 lags, or some exogenous variables? Once we have a test, an example is quite easy since it can just use the same code. One idea is that a basic VAR model with for example some basic weather data may be both interesting and provide something intuitive for the example in the vignette. It could also showcase the grouping feature if we want to, to group lags for the same variable.

> Things we need to discuss/think through:
>
> * [ ] While using `prediction_zero` to tell `explain()` that the multiple output module is in use is nice, I was originally thinking of changing the behavior of `get_predict_model()` to extract this dimension directly from the "test" of predict_model. The downside of my original idea is that we then miss this test of the output of predict_model being of the right dimension. Let's just think a bit about this.

I know that you are against having too many inputs in explain, but I do think that an explicit argument may be more clear. Another option is of course to have a wrapper function called say explain_multiple... Not sure about this, but just putting it out there for consideration.

> * [ ] I see that you require the output of predict_model to be a data.frame in the multiple outcome situation. Isn't it more common to output a standard matrix in such cases?

A matrix would be preferred, but here my knowledge of data.table ended... It seemed that it did not really work to have a matrix mapped to it correctly. I am sure that it should be possible somehow, and it is definitely preferred.
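One possible way to attach a matrix of predictions to a data.table is sketched below. It assumes the data.table package is installed, and the table `dt`, the matrix `pred_mat` and the column names `p_hat1` to `p_hat3` are made up for this illustration; it is not necessarily how the package ended up handling matrix output.

```r
library(data.table)

# Toy table with one row per prediction (hypothetical data).
dt <- data.table(id = 1:4)

# Toy prediction matrix: 4 rows, 3 outputs per row.
pred_mat <- matrix(
  as.numeric(1:12),
  nrow = 4,
  dimnames = list(NULL, paste0("p_hat", 1:3))
)

# Convert the matrix to a data.table (a list of columns) and assign every
# output as its own column by reference.
dt[, (colnames(pred_mat)) := as.data.table(pred_mat)]
```

Each output then lives in its own numeric column, so downstream code can select or aggregate outputs by column name.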

> * [ ] How should plotting work for such multiple output models? The easiest is probably to just let the user specify which of the predictions should be plotted (and default to the first one with a note written to the console). It might be nice to have them all in the same plot, however.

This is something that is also interesting for our application of it to forecasting. I am not even sure how such a plot would look, but I imagine vertical stacked barplots somehow.

> * [ ] Should we also handle multi-class classification in the same setting? If so, an example of this should also be put into the vignette.

That would probably be useful, the code should work the same way I think. It would require a good example to make it intuitive. I will try to think of something.

jonlachmann · 2023-01-05T13:50:52Z

I have now added an ar model, since using the arima model from stats to make predictions based on a specific vector is a pain without the forecast package. It has a test using the temperature data used in the other tests. Please have a look and see if it is acceptable.
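To show what predicting from a specific vector can look like, here is a minimal sketch that assumes the ar model is something like `stats::ar()`. The wrapper `predict_ar_3()` is a hypothetical name, and the built-in `lh` series stands in for the temperature data, which is not reproduced here.

```r
# Fit an AR(2) model on the built-in lh series (a stand-in for the real data).
fit <- stats::ar(lh, order.max = 2, aic = FALSE)

# Hypothetical wrapper: each row of x holds the two most recent observations,
# oldest first. Returns one row per input row and one column per horizon.
predict_ar_3 <- function(fit, x) {
  out <- t(apply(x, 1, function(history) {
    as.numeric(predict(fit, newdata = history, n.ahead = 3, se.fit = FALSE))
  }))
  colnames(out) <- paste0("h", 1:3)
  out
}

x <- rbind(c(2.4, 2.9), c(1.8, 2.1))
preds <- predict_ar_3(fit, x)
dim(preds) # 2 rows (inputs) by 3 columns (forecast horizons)
```

The `newdata` argument of `predict()` for `ar` fits is what allows forecasting from an arbitrary history rather than from the series the model was fitted on.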

martinju · 2023-01-06T10:08:41Z

Thanks for all this!

> Do you want it to explain say 3 lags, or some exogenous variables? Once we have a test, an example is quite easy since it can just use the same code. One idea is that a basic VAR model with for example some basic weather data may be both interesting and provide something intuitive for the example in the vignette. It could also showcase the grouping feature if we want to, to group lags for the same variable.

I have looked into your example and it works well, but I also started to play around with the idea of explaining time series lags at the same time as exogenous variables. That would be very helpful in practice, I believe. I'll play around a bit more and let you know when I have something.

> I know that you are against having too many inputs in explain, but I do think that an explicit argument may be more clear. Another option is of course to have a wrapper function called say explain_multiple... Not sure about this, but just putting it out there for consideration.

I actually think it is a good idea to separate the multiple output case into its own function. If we also distinguish between, say, forecasting different lags and multiple outcome classification, we could make the former more user friendly by formatting the data for the user (i.e. not requiring the user to provide all the lagged time series). Something to think about.

> A matrix would be preferred, but here my knowledge of data.table ended... It seemed that it did not really work to have a matrix mapped to it correctly. I am sure that it should be possible somehow, and it is definitely preferred.

No problem, I can deal with this.

> That would probably be useful, the code should work the same way I think. It would require a good example to make it intuitive. I will try to think of something.

Great!

jonlachmann · 2025-01-20T13:52:04Z

Can this be closed? Seems we did all this in other PRs, or is there something still useful here?

jonlachmann and others added 10 commits November 29, 2022 11:51

Expose an option to run the procedure sequentially in explain.

5a86d42

Implemented flexible output size of the model predict function. Tested with the empirical approach.

cf36201

Adjust the check for model output to work with both numeric vectors and data.frames.

4a2f16b

Update documentation.

6a09244

Merge remote-tracking branch 'origin/devel' into output_size

f3d0c06

man

287eeec

dt_mat output is vector instead of list

eb7c64f

Set explain to run in parallel as default.

520c18b

Update docs to conform to R CMD check.

ee950d3

rename parallel variable

96d1e4c

Added ar model definition.

00a1410

Added multiple output test using the ar model and temperature data.

wbound90 mentioned this pull request Jun 25, 2024

Shap loss values #347

Open
