
Add time-varying coefficient #598

Closed
wants to merge 4 commits into from

Conversation

ulfaslak
Contributor

@ulfaslak ulfaslak commented Mar 19, 2024

This PR adds time-varying coefficient options to the DelayedSaturatedMMM class.

Description

Specifically the following model specification is now possible:

mmm = DelayedSaturatedMMM(
    date_column=my_date_column_name,
    channel_columns=my_media_column_names,
    time_varying_intercept=True,            # <-- These two args
    time_varying_media_effect=True,         # <-- are what's up
)

time_varying_intercept creates a time-varying prior on the intercept, while time_varying_media_effect creates a time-varying prior on the total media contribution (i.e. not individual columns; per-column variation will be added in a later PR).

🚨 For now, let's get some reviews on this and agree on a good way to add this functionality. Then we can add tests and docs later.

To make this easier to review, maybe start by looking inside mmm_tvp_example.ipynb, then check out tvp.py, and then go over the modifications in related files I had to make to get the API changes to work.

Related Issue

Checklist

Modules affected

  • MMM
  • CLV

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pymc-marketing--598.org.readthedocs.build/en/598/


Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.



Contributor

@wd60622 wd60622 left a comment


Just some initial thoughts. Will keep an eye on changes.

pymc_marketing/mmm/tvp.py — outdated, resolved
pymc_marketing/mmm/delayed_saturated_mmm.py — outdated, resolved
pymc_marketing/mmm/delayed_saturated_mmm.py — outdated, resolved
@wd60622
Contributor

wd60622 commented Mar 19, 2024

Resolved the conflict to see the status of the tests. @ulfaslak

pre-commit stuff 🥲


codecov bot commented Mar 19, 2024

Codecov Report

Attention: Patch coverage is 15.78947%, with 48 lines in your changes missing coverage. Please review.

Project coverage is 34.77%. Comparing base (f055f98) to head (dac69da).
Report is 5 commits behind head on main.

❗ Current head dac69da differs from pull request most recent head ec316b4. Consider uploading reports for the commit ec316b4 to get more accurate results

Files Patch % Lines
pymc_marketing/mmm/delayed_saturated_mmm.py 9.09% 20 Missing ⚠️
pymc_marketing/mmm/tvp.py 20.00% 16 Missing ⚠️
pymc_marketing/mmm/base.py 0.00% 11 Missing ⚠️
pymc_marketing/mmm/utils.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #598       +/-   ##
===========================================
- Coverage   91.79%   34.77%   -57.03%     
===========================================
  Files          22       22               
  Lines        2267     2220       -47     
===========================================
- Hits         2081      772     -1309     
- Misses        186     1448     +1262     


@@ -34,6 +36,8 @@ def __init__(
date_column: str,
channel_columns: List[str],
adstock_max_lag: int,
time_varying_media_effect: bool = False,
time_varying_intercept: bool = False,
Contributor

I think it would be great if we could control the HSGP. Could we add an hsgp_config similar to how we defined the priors? So the more "expert" user could play with how much flexibility they want to give to the HSGP?

Contributor Author

Maybe better to just support that in the model config? The chief things you'd want to control are the covariance function, m, and L. But for MMM this can even be simplified to just the lengthscale and m, since I believe the optimal L can be calculated from the length of the data and the lengthscale.
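The "optimal L from the data length and lengthscale" idea could be sketched roughly like this (a hypothetical helper with an illustrative heuristic, not code from this PR):

```python
import numpy as np

def approx_hsgp_L(time_index: np.ndarray, lengthscale: float, c: float = 1.5) -> float:
    """Heuristic for the HSGP boundary L (hypothetical helper, not pymc-marketing API).

    Centers the time index and extends the half-range S so the basis
    approximation stays accurate near the edges of the data.
    """
    X_mid = (time_index.max() + time_index.min()) / 2
    S = np.max(np.abs(time_index - X_mid))  # half-range of the centered data
    # Extend by the larger of a fixed proportion of S or one lengthscale.
    return float(max(c * S, S + lengthscale))

t = np.arange(100.0)               # e.g. 100 time steps
L = approx_hsgp_L(t, lengthscale=30.0)
```

The exact rule would depend on the covariance function and desired accuracy; the point is only that L need not be user-facing if the lengthscale and data range are known.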

name="intercept", **self.model_config["intercept"]["kwargs"]
)

if self.time_varying_media_effect:
Contributor

Instead of checking twice, would it be better to check once at the end? You load the data and apply the multiplier only if the flag is True, all at once, saving a few lines of code.

Contributor Author

Yes. Can I refactor this whole function actually? Fingertips itching.

f = phi @ (hsgp_coefs * sqrt_psd).T
if positive:
f = softplus(f)
return pm.Deterministic(name, f, dims=dims)
Contributor

I think it would be great to create a plot for the varying parameters, showing the recovered latent pattern which is affecting the channels and/or the intercept. What do you think?

Contributor Author

Agree. Will add!


channel_contributions_var = channel_adstock_saturated * beta_channel
if self.time_varying_media_effect:
channel_contributions_var *= tv_multiplier_media[:, None]
Contributor

It is not a strong opinion, but since we are using the logic of a base contribution which is then modified, it would be great to have this product saved in another Deterministic variable, so users can inspect their BASE contribution and the latent multiplier effect independently. We would need a small extra piece of code, but what do you think?

e.g:

pm.Deterministic(
    name="varying_contribution",  # or something like it
    var=channel_contributions_var * tv_multiplier_media[:, None],
)

Contributor Author

Yeah this is a nice idea. I think I might actually change the time_varying_prior function so it always swings in the positive range. Then when using it, it always works as a multiplier on base contributions. Will also make joining this logic with logic for hierarchical parameters easy.

I'll make this change, it's nice.
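A minimal numpy sketch of the "always swings in the positive range" idea, with stand-ins for the HSGP draw (names are illustrative, not the PR's code):

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)).
    return np.logaddexp(0.0, x)

rng = np.random.default_rng(0)
latent = rng.normal(size=52)          # stand-in for a latent HSGP draw over dates
multiplier = softplus(latent)         # strictly positive by construction

base_contribution = np.ones((52, 3))  # stand-in base channel contributions
varying = base_contribution * multiplier[:, None]
```

With the prior constrained to be positive, the latent function can always be read as a multiplier on base contributions, which also composes cleanly with hierarchical parameters.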

Collaborator

I agree :)

dims=("date", "channel"),
name="channel_contributions",
var=channel_contributions_var,
dims=("date", "channel"),
)

mu_var = intercept + channel_contributions.sum(axis=-1)
Contributor

Since the HSGP is in dimension "date" from the beginning, I feel that we are first transforming the HSGP to a form compatible with ("date", "channel"), only to return to "date" at the end. Isn't it better to transform the channels to "date" and finally multiply by the HSGP without needing [:, None]?

It's just an opinion, but I think it would be more appropriate the other way around, and we could avoid a somewhat unnecessary transformation. What do you think?

Contributor Author

Don't fully understand. You can just commit the change if you believe it is best.

In general this time_varying_prior function supports two dimensions but I'm only using the 1D case here, maybe that's confusing.

I should actually add support for individually varying parameters in this PR too... not complicated

Collaborator

> I should actually add support for individually varying parameters in this PR too... not complicated

I suggest we keep the scope of this PR small and then iterate


with pm.modelcontext(model) as model:
if cov_func is None:
eta = pm.Exponential(f"eta_{name}", lam=eta_lam)
Contributor

I think using this current implementation, we could move these distributions into the model_config?

Contributor

oh this would be really clever, it just becomes another supported distribution?

Contributor Author

That's better than an hsgp_config, I think. I'm trying to work out the right level of rigidity to build into this thing. There are some things here you wouldn't want to change as a user, I think. And maybe this is just me, but as we put distributions into the config, we pay the price of more obscure code. Opinions?

Collaborator

It would be nice to allow these priors to be defined in the config as GPs are quite sensitive to priors.

@cetagostini
Contributor

Hey hey, amazing work! I made a few comments, but all are initial thoughts. I'll deep dive during the week 💪🏻

@ulfaslak
Contributor Author

> Hey hey, amazing work! I made a few comments, but all are initial thoughts. I'll deep dive during the week 💪🏻

This is super nice thanks @cetagostini!

@juanitorduz juanitorduz added enhancement New feature or request MMM labels Mar 24, 2024
@@ -329,6 +331,10 @@ def apply_sklearn_transformer_across_dim(

return data


def softplus(x: pt.TensorVariable) -> pt.TensorVariable:
Contributor

I think this exists in pytensor.tensor.math.
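For reference, softplus is just log(1 + exp(x)); a numpy sketch of the same computation that pytensor already ships as pytensor.tensor.math.softplus (which would make the custom helper unnecessary):

```python
import numpy as np

def softplus(x):
    # Same math as pytensor.tensor.math.softplus: log(1 + exp(x)),
    # computed via logaddexp so it does not overflow for large x.
    return np.logaddexp(0.0, x)

vals = softplus(np.array([-100.0, 0.0, 100.0]))
```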

@@ -336,19 +342,26 @@ def plot_posterior_predictive(
)

ax.fill_between(
x=self.X[self.date_column],
x=posterior_predictive_data.date,
Collaborator

What is the reason for this change? The date column can have a different name.

Contributor Author

This enables OOS posterior predictive. self.X does not get updated upon self._data_setter(X_new). So if the posterior predictive is for new OOS data, then self.X[self.date_column] will still be the in-sample dates.

)

assert len(target_to_plot) == len(posterior_predictive_data.date), (
Collaborator

can we keep date_col as a generic date name?

Contributor Author

You mean set date_col = posterior_predictive_data.date somewhere earlier in this function?

@@ -28,6 +29,7 @@

__all__ = ["DelayedSaturatedMMM"]

DAYS_IN_YEAR = 365.25
Collaborator

We can move this into a constants.py file :)

Contributor Author

Perfect. Will do.

X_mid=self._time_index_mid,
positive=True,
m=200,
L=[self._time_index_mid + DAYS_IN_YEAR / self._time_resolution],
Collaborator

I think it might be cleaner to use the c parameter?

c: float
  The proportion extension factor. Used to construct L from X. Defined as S = max|X| such that X is in [-S, S]. 
  L is calculated as c * S. One of c or L must be provided. Further information can be found in Ruitort-Mayol et al.
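To illustrate the relationship with made-up numbers standing in for the PR's attributes: since L = c * S with S = max|X| for a centered time index, the explicit L used in the PR corresponds to a particular c.

```python
# Hypothetical numbers standing in for self._time_index_mid and
# self._time_resolution (e.g. roughly two years of weekly data).
DAYS_IN_YEAR = 365.25
time_index_mid = 52.0    # half the length of the integer time index
time_resolution = 7.0    # days between consecutive observations

# L as constructed in the PR: half the data range plus one year of padding.
L = time_index_mid + DAYS_IN_YEAR / time_resolution

# Equivalent proportion-extension factor, since L = c * S and S equals
# time_index_mid for a time index centered on its midpoint.
c = L / time_index_mid
```

So either parameterization can express the same boundary; the question is only which one survives out-of-sample prediction cleanly.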

Contributor Author

Unfortunately this causes some trouble with predicting out of sample. @bwengals did you find the cause for this?

else:
hsgp_size = m
gp = pm.gp.HSGP(m=[m], L=[L], cov_func=cov_func)
phi, sqrt_psd = gp.prior_linearized(Xs=X[:, None] - X_mid)
Collaborator

Note for future review: We need to be careful we center the data with respect to the training set (even when we are doing out of sample prediction)
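A toy numpy illustration of the centering concern (hypothetical names, not the PR's code): out-of-sample inputs must be centered with the midpoint computed from the training data, not their own midpoint.

```python
import numpy as np

# Training time index and its midpoint (what X_mid should be fixed to).
t_train = np.arange(0.0, 100.0)
X_mid = (t_train.min() + t_train.max()) / 2  # computed once, at fit time

# New out-of-sample dates.
t_new = np.arange(100.0, 120.0)

# Correct: center with the training midpoint, so the inputs continue
# past the training range and the HSGP basis extrapolates.
Xs_new = t_new - X_mid

# Wrong: re-centering on the new data silently maps the forecast window
# back onto the fitted range.
Xs_wrong = t_new - t_new.mean()
```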

Collaborator

Contributor Author

Can you elaborate? What is the concern?

Collaborator

@juanitorduz juanitorduz left a comment

Added some small comments/questions for this first round :)

@juanitorduz
Collaborator

@ulfaslak could you please rebase from main as the tests are failing because #615 🙏 ?


review-notebook-app bot commented Apr 8, 2024

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-04-08T07:49:42Z
----------------------------------------------------------------

Line #8.        sampler_config={"nuts_sampler": "numpyro", "target_accept": 0.98},

Is such a large target_accept required?


@@ -38,6 +40,8 @@ def __init__(
date_column: str,
channel_columns: List[str],
adstock_max_lag: int,
time_varying_media_effect: bool = False,
Contributor

Hey, small comment here... This is a matter of taste, but I think having two parameters for the same component is a bit strange. What do you think about accepting strings or tuples instead?

time_varying = ('intercept', 'media') #or
time_varying = 'intercept-media'

I think it would be nicer for the API.

Contributor Author

I quite like this. Maybe this gets too complicated but:

time_varying='intercept'
...
time_varying='total_media'
...
time_varying=['intercept', 'total_media']
...
time_varying=['intercept', 'channel1', 'channel2']

could work

Contributor Author

Maybe even

time_varying=['intercept', ('channel1', 'channel2')] # now channel1 and channel2 are summed and multiplied by a time varying coef
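One way the proposed spec could be normalized, as a hypothetical sketch (none of these names are pymc-marketing API):

```python
def expand_time_varying(spec):
    """Hypothetical parser for the proposed time_varying argument.

    A plain string selects one component; a tuple of channel names means
    those channels are summed and share a single time-varying coefficient.
    Returns a mapping of coefficient name -> list of components it covers.
    """
    if isinstance(spec, str):
        spec = [spec]
    groups = {}
    for item in spec:
        if isinstance(item, tuple):
            groups["+".join(item)] = list(item)  # channels sharing one coefficient
        else:
            groups[item] = [item]
    return groups

expand_time_varying(["intercept", ("channel1", "channel2")])
```

A normalizer like this would let the model-building code iterate over groups uniformly, whatever shorthand the user passed.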

@juanitorduz
Collaborator

juanitorduz commented Apr 8, 2024

@ulfaslak This feature is very exciting! As this PR was about collecting feedback, I have a proposal: Split this PR into two:

  1. Time-varying intercept
  2. Time-varying media (as in this PR)

The reason? To make the review process simpler and faster :)

For each PR, we should aim for:

  • Implementation with changes just on the scope of the PR.
  • Docstring
  • Tests
  • Example notebook

This way we can iterate fast, as probably we wanna push this one out soon :) 🚀

The main question, from what I see, is whether we wanna make all the priors and HSGP parameters configurable via model_config (personally, I think we should)

@nialloulton
Contributor

nialloulton commented Apr 8, 2024 via email

@cetagostini-wise
Contributor

cetagostini-wise commented Apr 8, 2024

Following @juanitorduz's mention, and perhaps to align with #602, if it's worthwhile.

We could create a models.components folder and a time.py file to hold the implementation? This could help collect all code related to "time" dependencies.

What about treating this implementation as a component as well? Similar to the attached PR (#602), all time-varying media or intercept logic happens in a class which is then added to the main model.

The split between two PRs (media/intercept) and the component-based implementation should make everything very easy to debug, test, and play with. What does everyone think? @wd60622 @ulfaslak

@cetagostini-wise
Contributor

> The main question, from what I see, is whether we wanna make all the priors and HSGP parameters configurable via model_config (personally, I think we should)

@juanitorduz My personal opinion on it, is we should allow it.

@juanitorduz
Collaborator

> Following @juanitorduz's mention, and perhaps to align with #602, if it's worthwhile.
>
> We could create a models.components folder and a time.py file to hold the implementation? This could help collect all code related to "time" dependencies.
>
> What about treating this implementation as a component as well? Similar to the attached PR, all time-varying media or intercept logic happens in a class which is then added to the main model. This and the split between two PRs (media/intercept) could make everything very easy to debug, test, and play with. What does everyone think? @wd60622 @ulfaslak

The more modular, the better! Let's work on smaller PR so that we can iterate faster :)

@ulfaslak
Contributor Author

ulfaslak commented Apr 8, 2024

> @ulfaslak This feature is very exciting! As this PR was about collecting feedback, I have a proposal: Split this PR into two:
>
>   1. Time-varying intercept
>   2. Time-varying media (as in this PR)
>
> The reason? To make the review process simpler and faster :)
>
> For each PR, we should aim for:
>
>   • Implementation with changes just on the scope of the PR.
>   • Docstring
>   • Tests
>   • Example notebook
>
> This way we can iterate fast, as probably we wanna push this one out soon :) 🚀
>
> The main question, from what I see, is whether we wanna make all the priors and HSGP parameters configurable via model_config (personally, I think we should)

I like this.

I will try to bang the first PR together then. Thanks for a lot of high-quality feedback 👌

@ulfaslak
Contributor Author

ulfaslak commented Apr 8, 2024

> @ulfaslak could you please rebase from main as the tests are failing because #615 🙏 ?

I will close this and make a new PR for intercept-tvp on a branch off of main.

@ulfaslak ulfaslak closed this Apr 8, 2024
@ulfaslak ulfaslak mentioned this pull request Apr 15, 2024