[ENH] Change effects api, and use BaseObject from scikit-base #85

felipeangelimvieira · 2024-07-11T13:39:19Z

This pull request changes how effects are used in forecasters and how custom effects can be applied.

Objectives

The main objectives are:

Make the effects API more flexible so that users can customize the data preparation step for the effect. For example, it may be interesting to use seasonality features not only in the seasonal component but also in other effects. Additionally, this opens the door to using A/B test results to calibrate model outputs in Marketing Mix Modelling (see this doc from PyMC Marketing).
Use scikit-base to facilitate hyperparameter tuning with sktime classes and to ease the extension of functionality in this package by users.

The BaseEffect class has three arguments:

id : str (default=""): used to identify the effect and its parameters (it is added as a prefix to their names during numpyro.sample).
regex : str (default=None): used to select which columns should be filtered and passed to the _apply method.
effect_mode : str (default="multiplicative"): whether the effect output is multiplied by the trend after _apply.

Child classes may optionally implement _initialize and _prepare_input_data. By default, these methods filter the columns of the exogenous dataframe X according to self.regex and pass them to _apply.

The _apply method, on the other hand, must be implemented by child classes and must return a jnp.ndarray with the computed effect.
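A minimal sketch of this first-draft contract, assuming plain Python dicts of lists as a stand-in for the pandas dataframe X and for jnp.ndarray, so it runs without JAX. The public apply wrapper, the trend argument passed to _apply, and the LinearSketch child (which uses a fixed coefficient instead of a numpyro.sample call) are hypothetical illustrations, not the Prophetverse code.

```python
import re
from typing import Dict, List, Optional

# Stand-in for a pandas DataFrame: column name -> column values.
Frame = Dict[str, List[float]]


class BaseEffect:
    """Sketch of the first-draft BaseEffect contract (not the library code)."""

    def __init__(self, id: str = "", regex: Optional[str] = None,
                 effect_mode: str = "multiplicative"):
        self.id = id  # would prefix sample-site names (not exercised here)
        self.regex = regex
        self.effect_mode = effect_mode

    def _initialize(self, X: Frame) -> None:
        # Optional hook: nothing to initialize by default.
        pass

    def _prepare_input_data(self, X: Frame) -> Frame:
        # Optional hook: by default, keep only columns matching self.regex.
        if self.regex is None:
            return {}
        pattern = re.compile(self.regex)
        return {name: col for name, col in X.items() if pattern.search(name)}

    def _apply(self, trend: List[float], data: Frame) -> List[float]:
        # Mandatory hook: child classes compute the effect here.
        raise NotImplementedError("child classes must implement _apply")

    def apply(self, trend: List[float], X: Frame) -> List[float]:
        # Hypothetical public wrapper: filter columns, compute the effect,
        # then multiply by the trend when effect_mode == "multiplicative".
        effect = self._apply(trend, self._prepare_input_data(X))
        if self.effect_mode == "multiplicative":
            return [e * t for e, t in zip(effect, trend)]
        return effect


class LinearSketch(BaseEffect):
    """Hypothetical child: a fixed coefficient times the sum of selected columns."""

    def __init__(self, coefficient: float = 1.0, **kwargs):
        super().__init__(**kwargs)
        self.coefficient = coefficient

    def _apply(self, trend: List[float], data: Frame) -> List[float]:
        out = [0.0] * len(trend)
        for col in data.values():
            for i, value in enumerate(col):
                out[i] += self.coefficient * value
        return out


X = {"exog_a": [1.0, 2.0], "exog_b": [3.0, 4.0], "other": [100.0, 100.0]}
trend = [10.0, 10.0]
print(LinearSketch(coefficient=0.5, regex=r"^exog", effect_mode="additive").apply(trend, X))  # [2.0, 3.0]
print(LinearSketch(coefficient=0.5, regex=r"^exog").apply(trend, X))  # [20.0, 30.0]
```

The "other" column is never seen by _apply because the default _prepare_input_data drops it; only the effect_mode flag decides whether the trend scales the result.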

Closes #73

codecov · 2024-07-11T13:51:31Z

Codecov Report

Attention: Patch coverage is 94.16058% with 16 lines in your changes missing coverage. Please review.

Project coverage is 93.35%. Comparing base (14ad310) to head (914a7e0).
Report is 1 commit behind head on main.

Files	Patch %	Lines
src/prophetverse/sktime/base.py	86.44%	8 Missing ⚠️
src/prophetverse/effects/base.py	93.22%	4 Missing ⚠️
src/prophetverse/effects/fourier.py	91.42%	3 Missing ⚠️
src/prophetverse/trend/base.py	66.66%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #85      +/-   ##
==========================================
+ Coverage   93.29%   93.35%   +0.05%     
==========================================
  Files          25       26       +1     
  Lines        1044     1173     +129     
==========================================
+ Hits          974     1095     +121     
- Misses         70       78       +8


fkiraly

Interesting! Interesting design, too! I will leave some comments below.

base class design

makes sense to me!

Questions:

Is it sensible to attach an id to each effect (the instance), rather than attach the names only in the model? I have seen a number of designs where the name is intrinsic to the estimator instance. sklearn avoids that and instead assigns names extrinsically, in compositors such as pipelines. sktime further assigns default names in compositors, e.g., just the class name if there are no duplicates of effects.
Naming of methods, see below. I spot similarities to common types of methods; rename for consistency?

names of the abstract methods

I think these might map onto a generic transformer-with-parameter-estimator interface. If I were making calls on how to consistently name the methods, I would rename:

initialize -> fit
apply -> transform?

deprecation safety

I notice some classes introduce parameters at the start or reorder them. This breaks existing positional calls, so if you do this without warning users, it should be a conscious choice.
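To illustrate the deprecation concern, here is a minimal sketch with hypothetical classes (OldEffect, PrependedEffect and AppendedEffect are not Prophetverse names): inserting a parameter at the start silently shifts existing positional calls, while appending it at the end, here as keyword-only, keeps them working.

```python
class OldEffect:
    # Hypothetical existing signature that users may call positionally.
    def __init__(self, prior_scale=0.1, effect_mode="multiplicative"):
        self.prior_scale = prior_scale
        self.effect_mode = effect_mode


class PrependedEffect:
    # Risky: a new first parameter shifts every positional argument.
    def __init__(self, id="", prior_scale=0.1, effect_mode="multiplicative"):
        self.id = id
        self.prior_scale = prior_scale
        self.effect_mode = effect_mode


class AppendedEffect:
    # Safer: the new parameter is appended and keyword-only.
    def __init__(self, prior_scale=0.1, effect_mode="multiplicative", *, id=""):
        self.id = id
        self.prior_scale = prior_scale
        self.effect_mode = effect_mode


old = OldEffect(0.5, "additive")
prepended = PrependedEffect(0.5, "additive")
appended = AppendedEffect(0.5, "additive")

print(old.prior_scale, old.effect_mode)            # 0.5 additive
print(prepended.id, prepended.prior_scale)         # 0.5 additive  (silently wrong)
print(appended.prior_scale, appended.effect_mode)  # 0.5 additive
```

No error is raised in the prepended case, which is why this kind of change is easy to ship by accident.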

felipeangelimvieira · 2024-07-12T11:58:10Z

Is it sensible to attach an ID to each effect instance, rather than only using names in the model? I have seen several designs where the name is intrinsic to the estimator instance. Scikit-learn avoids that by assigning names extrinsically in compositors such as pipelines. Sktime further assigns default names in compositors, e.g., just the class name if there are no duplicates of effects.

I was using the ID to avoid conflicts between sample site names, but I had forgotten that it is possible to use NumPyro scopes in the model function and completely avoid the need to pass it to the effect instance! Then, it would be possible to pass effects similarly to how it is done in sktime and scikit-learn pipelines:

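from prophetverse.effects.linear import LinearEffect
from prophetverse.utils.regex import starts_with

exogenous_effects = [
    ("seasonality", LinearEffect(regex=starts_with(["sin", "cos"]))),
    ...
]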
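The idea behind using scopes can be sketched without NumPyro. Below is a dependency-free toy, assuming only that numpyro.handlers.scope prefixes the names of sample sites created inside it (in a "prefix/name" form) and that site names must be unique within a model. The scope and sample helpers and the LinearToy class are hypothetical stand-ins, not the NumPyro API; they only show why the model, not the effect instance, can own the name.

```python
from contextlib import contextmanager
from typing import Dict, List

# Toy stand-ins (hypothetical, not NumPyro): a global "trace" of sample sites
# and a stack of active scope prefixes.
TRACE: Dict[str, float] = {}
_PREFIXES: List[str] = []


@contextmanager
def scope(prefix: str):
    # Mimics the idea of numpyro.handlers.scope: names inside get "prefix/".
    _PREFIXES.append(prefix)
    try:
        yield
    finally:
        _PREFIXES.pop()


def sample(name: str, value: float) -> float:
    # Records a site under its fully scoped name; duplicates are an error.
    full_name = "/".join(_PREFIXES + [name])
    if full_name in TRACE:
        raise ValueError(f"duplicate sample site: {full_name}")
    TRACE[full_name] = value
    return value


class LinearToy:
    # An effect that always uses the same local site name and has no id.
    def predict(self) -> float:
        return sample("coefficient", 1.0)


def model(effects):
    # The model owns the names: each (name, effect) pair gets its own scope.
    for name, effect in effects:
        with scope(name):
            effect.predict()


model([("seasonality", LinearToy()), ("exog", LinearToy())])
print(sorted(TRACE))  # ['exog/coefficient', 'seasonality/coefficient']
```

Calling LinearToy().predict() twice outside any scope would hit the duplicate-site error, which is the conflict the per-instance id was originally avoiding.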

I think these might map onto a generic transformer-with-parameter-estimator interface. If I were making calls on how to consistently name the methods, I would rename:

initialize -> fit
apply -> transform?

Concerning the names... At first, I chose to avoid fit and transform because the method signatures are not so similar to what we see in sktime / scikit-learn, and I thought it might confuse users to name them the same way. But maybe that's not a problem; having a good extension template and documentation may help with that. The effects are kind of transformers and estimators at the same time, as they receive the X dataframe and perform transformations to return a dictionary of JAX arrays, which are then used to predict the effect output. So maybe renaming the methods as:

initialize -> fit
prepare_input_data -> transform
apply -> predict

would be a good choice. I believe that using fit, transform, and predict also helps users understand when each one of them is called.

One thing I am questioning is whether the regex argument should be avoided, or at least not be in BaseEffect. Keeping it there forces all effects to behave like ColumnTransformers; a mixin or another base class might be a better design. I would love to hear your suggestions on this, @fkiraly.

I will update the PR today with these changes and also show an example of a composite effect that can be used in MMM applications to leverage the results of A/B tests and use them as a reference for an effect output.

felipeangelimvieira · 2024-07-12T11:58:28Z

FYI @felipeffm


felipeangelimvieira · 2024-07-15T01:10:09Z

So... I now have an interface that I feel more comfortable with. Neither "id" nor "regex" is passed to effect objects anymore. The exogenous_effects parameter of Prophetverse is now a list of tuples (str, BaseEffect, str | None) that describe the identifier of the effect, the effect object, and an optional regex to identify the columns of X that should be passed to the effect. If the regex is None, no columns are passed.
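A minimal sketch of how a forecaster could consume this list of (name, effect, regex) tuples, again with dicts of lists standing in for the X dataframe. The select_columns and run_effects helpers and the SumEffect class are hypothetical, and r"^exog" is only assumed to be what starts_with("exog") amounts to; the sketch illustrates that a None regex passes no columns and that the effect object itself no longer carries a name or regex.

```python
import re
from typing import Dict, List, Optional, Tuple

Frame = Dict[str, List[float]]


def select_columns(X: Frame, regex: Optional[str]) -> Frame:
    # None means "pass no columns" (the role of no_input_columns).
    if regex is None:
        return {}
    pattern = re.compile(regex)
    return {name: col for name, col in X.items() if pattern.search(name)}


class SumEffect:
    # Hypothetical effect: sums whatever columns it receives, row by row.
    def predict(self, data: Frame, n_rows: int) -> List[float]:
        return [sum((col[i] for col in data.values()), 0.0) for i in range(n_rows)]


def run_effects(effects: List[Tuple[str, SumEffect, Optional[str]]],
                X: Frame, n_rows: int) -> Dict[str, List[float]]:
    # The model, not the effect, owns the name and the column selection.
    return {name: effect.predict(select_columns(X, regex), n_rows)
            for name, effect, regex in effects}


X = {"exog_1": [1.0, 2.0], "exog_2": [10.0, 20.0], "promo": [5.0, 5.0]}
effects = [("no_inputs", SumEffect(), None), ("exog", SumEffect(), r"^exog")]
print(run_effects(effects, X, n_rows=2))
# {'no_inputs': [0.0, 0.0], 'exog': [11.0, 22.0]}
```

Because names and regexes live in the tuples, the same effect class can be reused under several names with different column selections.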

Child classes of BaseEffect may now override _fit, _transform or _predict to create custom behaviours.

Their responsibilities are:

_fit: Initialize any necessary parameters.
_transform: return a dictionary of jnp.ndarray to be passed as named args to predict
_predict: compute the effect value during model training and evaluation.

The default behaviours are:

_fit: do nothing.
_transform: convert the X dataframe to a JAX ndarray.
_predict: raise NotImplementedError.

A stage string ("train" or "predict") is passed as an argument to transform to customize its behaviour when needed. Most of the time this argument won't be needed, except when the effect adds an extra likelihood to the model, as the LiftExperimentLikelihood effect does.
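A dependency-free sketch of this contract, assuming public fit/transform/predict methods that delegate to the private hooks, and lists instead of jnp.ndarray. The TRAIN/PREDICT constants, the "data" key returned by the default _transform, and the MeanEffect child are hypothetical; MeanEffect only mimics the idea of passing extra inputs at the train stage and is not the LiftExperimentLikelihood implementation.

```python
from typing import Dict, List, Optional

Frame = Dict[str, List[float]]
TRAIN, PREDICT = "train", "predict"  # stage values named in the PR


class BaseEffect:
    """Sketch of the fit/transform/predict contract (not the library code)."""

    def fit(self, X: Frame) -> "BaseEffect":
        self._fit(X)
        return self

    def transform(self, X: Frame, stage: str = TRAIN) -> dict:
        return self._transform(X, stage)

    def predict(self, **data) -> List[float]:
        # The dict returned by transform is passed as keyword arguments.
        return self._predict(**data)

    def _fit(self, X: Frame) -> None:
        # Default: do nothing.
        pass

    def _transform(self, X: Frame, stage: str) -> dict:
        # Default: convert X to a 2D array (here a list of rows) stored under
        # a hypothetical "data" key.
        return {"data": [list(row) for row in zip(*X.values())]}

    def _predict(self, **data) -> List[float]:
        # Default: child classes must provide the effect computation.
        raise NotImplementedError("child classes must implement _predict")


class MeanEffect(BaseEffect):
    """Hypothetical child: row-wise mean of its input columns."""

    def __init__(self, reference: Optional[List[float]] = None):
        self.reference = reference  # e.g. lift-test results (hypothetical)

    def _fit(self, X: Frame) -> None:
        # Initialize parameters needed later.
        self.n_columns_ = len(X)

    def _transform(self, X: Frame, stage: str) -> dict:
        out = super()._transform(X, stage)
        if stage == TRAIN and self.reference is not None:
            # Only at training time: also pass reference values, which a real
            # effect could use to add an extra likelihood term.
            out["reference"] = self.reference
        return out

    def _predict(self, data, reference=None) -> List[float]:
        # This sketch ignores `reference`; a real effect would add a term such
        # as numpyro.sample(..., obs=reference) here.
        return [sum(row) / self.n_columns_ for row in data]


X = {"a": [1.0, 3.0], "b": [3.0, 5.0]}
effect = MeanEffect(reference=[2.0, 4.0]).fit(X)
print(sorted(effect.transform(X, stage=TRAIN)))    # ['data', 'reference']
print(sorted(effect.transform(X, stage=PREDICT)))  # ['data']
print(effect.predict(**effect.transform(X, stage=PREDICT)))  # [2.0, 4.0]
```

Keeping the stage in transform means _predict sees the same signature at both stages; only the presence of the extra keyword changes.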

Example:

from prophetverse.sktime import Prophetverse
from prophetverse.effects.linear import LinearEffect
from prophetverse.utils import no_input_columns  # Alias for None; could also be a regex that matches nothing
from prophetverse.effects.fourier import LinearFourierSeasonality
from prophetverse.effects.log import LogEffect
from prophetverse.utils.regex import starts_with
import numpyro.distributions as dist  # provides dist.Gamma used below


exogenous_effects = [
    (
        "seasonality",
        LinearFourierSeasonality(
            freq="D",
            sp_list=[7, 365.25],
            fourier_terms_list=[3, 10],
            prior_scale=0.1,
            effect_mode="multiplicative",
        ),
        no_input_columns,
    ),
    (
        "exog",
        LogEffect(
            rate_prior=dist.Gamma(2, 1),
            scale_prior=dist.Gamma(2, 1),
            effect_mode="additive",
        ),
        starts_with("exog"),
    ),
]

model = Prophetverse(
    trend="linear",
    changepoint_interval=300,
    changepoint_prior_scale=0.0001,
    exogenous_effects=exogenous_effects,
    noise_scale=0.05,
    optimizer_steps=50000,
    optimizer_name="Adam",
    optimizer_kwargs={"step_size": 0.0001},
    inference_method="map",
)
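# y: target series; X: exogenous dataframe with columns starting with "exog" (both assumed to be defined)
model.fit(y=y, X=X)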

felipeangelimvieira · 2024-07-15T13:11:17Z

One nice side effect is that the TrendModel can also be a BaseEffect object. Will address this in a future PR.

felipeangelimvieira · 2024-07-15T14:22:29Z

I'll merge this PR and close #73, but will not create a new release today (just a pre-release). If you have any suggestions @fkiraly @felipeffm please feel free to re-open that issue.

fkiraly · 2024-07-17T16:37:28Z

Happy with the changes, as they address my main points - I am unsure about whether these are "perfect" choices, but I suppose that's something one would glean from observing usage and using this in actuality.

feat: change effects api, and use BaseObject from scikit-base

c307abe

felipeangelimvieira added the enhancement New feature or request label Jul 11, 2024

felipeangelimvieira self-assigned this Jul 11, 2024

fix: allow python3.12 in pyproject

09d6ff4

feat: add stage enum

827460d

fkiraly reviewed Jul 11, 2024


felipeangelimvieira added 9 commits July 12, 2024 13:14

feat: create Lift Likelihood effect. Create BaseAdditiveOrMultiplicativeEffect

84e2941

feat: refactor, add basemetabayesianforecaster, rename effect methods

e7dcfd2

fixes tests and new classes

6d7a59e

fix: add numpyro scope handler to multivariate

726728f

fix: hill effect

d6ee516

Add exogenous effect property docstring and update tests

9363bb2

tests: add test set_params for baseeffectsbayesianforecaster

4cf830c

tests: add tests for lift experiment effect

ff6ee99

feat: Fourier Seasonality Effect

31eb0cf

felipeangelimvieira added 6 commits July 15, 2024 09:26

fix: hierarchical model and its compatibility with new effect api

ffa27f0

fix: hierarchical model _get_predict_data

370bf3e

doc: update documentation

4f5e1eb

doc: add example notebook

8351d0f

doc: update mkdocs.yml

7c1e27e

Effect extension template

914a7e0

felipeangelimvieira merged commit 26f5102 into main Jul 15, 2024
14 checks passed

felipeangelimvieira deleted the feature/effects_api branch July 15, 2024 14:23

felipeangelimvieira changed the title from "[ENH] Enhance effects api, and use BaseObject from scikit-base" to "[ENH] Change effects api, and use BaseObject from scikit-base" Jul 15, 2024
