Add time-varying coefficient #598
Conversation
Just some initial thoughts. Will keep an eye out for changes.
Resolved the conflict to see the status of the tests. @ulfaslak pre-commit stuff 🥲
Codecov Report. Attention: patch coverage is low.

@@            Coverage Diff            @@
##             main     #598       +/-   ##
===========================================
- Coverage   91.79%   34.77%    -57.03%
===========================================
  Files          22       22
  Lines        2267     2220        -47
===========================================
- Hits         2081      772      -1309
- Misses        186     1448      +1262
@@ -34,6 +36,8 @@ def __init__(
    date_column: str,
    channel_columns: List[str],
    adstock_max_lag: int,
    time_varying_media_effect: bool = False,
    time_varying_intercept: bool = False,
I think it would be great if we could control the HSGP. Could we add an hsgp_config, similar to how we defined the priors? Then the more "expert" user could play with how much flexibility they want to give the HSGP.
Maybe better to just support that in the model config? The chief things you'd want to control are the covariance function, m, and L. But for MMM this can even be simplified to just the lengthscale and m, I believe, since the optimal L can be calculated from the length of the data and the lengthscale.
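A hedged sketch of how that boundary could be derived from the data extent and the lengthscale (the pad_factor heuristic and the function name are illustrative assumptions, not part of the PR):

```python
import numpy as np

# Hedged sketch: derive the HSGP boundary L from the half-width of the
# centered inputs plus padding proportional to the lengthscale. The
# pad_factor of 1.5 is an illustrative assumption; the approximation is
# accurate when L comfortably exceeds max|X|.
def boundary_from_lengthscale(X_centered, lengthscale, pad_factor=1.5):
    S = np.max(np.abs(X_centered))  # half-width of the centered inputs
    return S + pad_factor * lengthscale

X = np.arange(100) - 49.5                            # centered time index, S = 49.5
L = boundary_from_lengthscale(X, lengthscale=10.0)   # 49.5 + 1.5 * 10.0 = 64.5
```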
name="intercept", **self.model_config["intercept"]["kwargs"]
)

if self.time_varying_media_effect:
Instead of checking twice, would it be better to check at the end? You load the data and apply the multiplier only if True, all at once, saving a few lines of code.
Yes. Can I refactor this whole function actually? Fingertips itching.
f = phi @ (hsgp_coefs * sqrt_psd).T
if positive:
    f = softplus(f)
return pm.Deterministic(name, f, dims=dims)
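To make the shapes in the diff concrete, here is a NumPy sketch of the same computation with random values (phi, sqrt_psd, and hsgp_coefs mirror the names above; the dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_dates, m = 50, 10
phi = rng.normal(size=(n_dates, m))        # HSGP basis functions, ("date", m)
sqrt_psd = rng.uniform(0.1, 1.0, size=m)   # sqrt power spectral densities
hsgp_coefs = rng.normal(size=(1, m))       # basis coefficients

f = phi @ (hsgp_coefs * sqrt_psd).T        # latent function, shape (n_dates, 1)

def softplus(x):
    # maps f into the strictly positive range
    return np.log1p(np.exp(x))

f_pos = softplus(f)
```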
I think it would be great to create a plot for the varying parameters, showing the recovered latent pattern which affects the channels and/or the intercept. What do you think?
Agree. Will add!
channel_contributions_var = channel_adstock_saturated * beta_channel
if self.time_varying_media_effect:
    channel_contributions_var *= tv_multiplier_media[:, None]
It's not a strong opinion, but if we are using the logic of a base contribution which is then modified, it would be great to save that product in another Deterministic variable. That way users can understand their BASE contribution and the latent multiplier effect independently. It would need a small extra piece of code, but what do you think? e.g.:

pm.Deterministic(
    name="varying_contribution",  # or something like it
    var=channel_contributions_var * tv_multiplier_media[:, None],
)
Yeah, this is a nice idea. I think I might actually change the time_varying_prior function so it always swings in the positive range. Then, when used, it always works as a multiplier on base contributions. That will also make it easy to join this logic with the logic for hierarchical parameters.
I'll make this change, it's nice.
I agree :)
dims=("date", "channel"),
name="channel_contributions",
var=channel_contributions_var,
dims=("date", "channel"),
)

mu_var = intercept + channel_contributions.sum(axis=-1)
Since the HSGP is in dimension "date" from the beginning, I feel that we first transform the HSGP to a form compatible with ("date", "channel"), only to return to "date" at the end. Wouldn't it be better to reduce the channels to "date" first and then multiply by the HSGP, without needing [:, None]?
It's just an opinion. I think it would be more appropriate the other way around, and we can avoid a somewhat unnecessary transformation. What do you think?
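The broadcasting being discussed, sketched in NumPy (shapes are illustrative):

```python
import numpy as np

contributions = np.ones((5, 3))                   # dims ("date", "channel")
multiplier = np.array([1.0, 1.1, 0.9, 1.2, 1.0])  # dims ("date",)

# [:, None] lifts the 1-D multiplier to shape ("date", 1) so it
# broadcasts across the channel axis:
scaled = contributions * multiplier[:, None]      # still ("date", "channel")
```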
I don't fully understand. You can just commit the change if you believe it is best. In general this time_varying_prior function supports two dimensions, but I'm only using the 1D case here; maybe that's confusing.
I should actually add support for individually varying parameters in this PR too... not complicated
I should actually add support for individually varying parameters in this PR too... not complicated
I suggest we keep the scope of this PR small and then iterate
with pm.modelcontext(model) as model:
    if cov_func is None:
        eta = pm.Exponential(f"eta_{name}", lam=eta_lam)
I think with this current implementation, we could move these distributions into the model_config?
Oh, this would be really clever. It just becomes another supported distribution?
That's better than an hsgp_config, I think. I'm trying to work out the right level of rigidity to build into this thing. There are some things here you wouldn't want to change as a user, I think. And maybe this is just me, but as we put distributions into the config, we pay the price of more obscure code. Opinions?
It would be nice to allow these priors to be defined in the config, as GPs are quite sensitive to priors.
Hey hey, amazing work! I made a few comments, but all are initial thoughts. I'll deep dive during the week 💪🏻
This is super nice, thanks @cetagostini!
@@ -329,6 +331,10 @@ def apply_sklearn_transformer_across_dim(

    return data


def softplus(x: pt.TensorVariable) -> pt.TensorVariable:
I think this already exists in pytensor.tensor.math.
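For reference, softplus(x) = log(1 + exp(x)). A numerically stable NumPy equivalent of what pytensor.tensor.math.softplus provides (this standalone version is a sketch, not the pytensor implementation):

```python
import numpy as np

def softplus(x):
    # log(1 + exp(x)) without overflow: rewrite as
    # max(x, 0) + log1p(exp(-|x|)), so exp never sees a large argument.
    x = np.asarray(x, dtype=float)
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

softplus(np.array([0.0]))  # -> [log(2)] ≈ [0.6931]
```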
@@ -336,19 +342,26 @@ def plot_posterior_predictive(
)

ax.fill_between(
    x=self.X[self.date_column],
    x=posterior_predictive_data.date,
What is the reason for this change? The date column can have a different name.
This enables OOS posterior predictive. self.X does not get updated upon self._data_setter(X_new). So if the posterior predictive is for new OOS data, then self.X[self.date_column] will still be the in-sample dates.
)

assert len(target_to_plot) == len(posterior_predictive_data.date), (
Can we keep date_col as a generic date name?
You mean set date_col = posterior_predictive_data.date somewhere earlier in this function?
@@ -28,6 +29,7 @@

__all__ = ["DelayedSaturatedMMM"]

DAYS_IN_YEAR = 365.25
We can move this into a constants.py file :)
Perfect. Will do.
X_mid=self._time_index_mid,
positive=True,
m=200,
L=[self._time_index_mid + DAYS_IN_YEAR / self._time_resolution],
I think it might be cleaner to use the c parameter?

c : float
    The proportion extension factor. Used to construct L from X. Defined as S = max|X| such that X is in [-S, S]. L is calculated as c * S. One of c or L must be provided. Further information can be found in Ruitort-Mayol et al.
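A sketch of that c-to-L relationship with illustrative values:

```python
import numpy as np

X = np.linspace(-1.0, 1.0, 200)  # centered inputs, X in [-S, S]
c = 1.5                          # proportion extension factor
S = np.max(np.abs(X))            # S = max|X| = 1.0
L = c * S                        # boundary L = 1.5
```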
Unfortunately this causes some trouble with predicting out of sample. @bwengals did you find the cause for this?
else:
    hsgp_size = m
gp = pm.gp.HSGP(m=[m], L=[L], cov_func=cov_func)
phi, sqrt_psd = gp.prior_linearized(Xs=X[:, None] - X_mid)
Note for future review: we need to be careful to center the data with respect to the training set (even when we are doing out-of-sample prediction).
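The centering concern, sketched in NumPy (X_mid fixed from the training index, mirroring self._time_index_mid above; the values are illustrative):

```python
import numpy as np

X_train = np.arange(100.0)       # training time index
X_mid = X_train.mean()           # 49.5, computed once at fit time

# For out-of-sample prediction, new inputs must be centered with the
# SAME training midpoint, not a midpoint recomputed from the new data:
X_new = np.arange(100.0, 120.0)
Xs_new = X_new - X_mid           # starts at 50.5, continuing the train scale
```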
Can you elaborate? What is the concern?
Added some small comments/questions for this first round :)
View / edit / reply to this conversation on ReviewNB. juanitorduz commented on 2024-04-08T07:49:42Z on Line #8 (sampler_config={"nuts_sampler": "numpyro", "target_accept": 0.98}): I such large
@@ -38,6 +40,8 @@ def __init__(
    date_column: str,
    channel_columns: List[str],
    adstock_max_lag: int,
    time_varying_media_effect: bool = False,
Hey, small comment here... This is a matter of taste, but I think having two parameters for the same component is a bit strange. What do you think if we evaluate strings or tuples instead?

time_varying = ('intercept', 'media')  # or
time_varying = 'intercept-media'

I think it would be better for the API.
I quite like this. Maybe this gets too complicated but:
time_varying='intercept'
...
time_varying='total_media'
...
time_varying=['intercept', 'total_media']
...
time_varying=['intercept', 'channel1', 'channel2']
could work
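A hypothetical sketch of normalizing such a time_varying argument (the helper name and accepted forms are illustrative, not part of the PR):

```python
def parse_time_varying(spec):
    """Normalize a time_varying spec to a list of component names.

    Accepts None, a single string, or an iterable of strings.
    """
    if spec is None:
        return []
    if isinstance(spec, str):
        return [spec]
    return list(spec)

parse_time_varying("intercept")                    # -> ["intercept"]
parse_time_varying(["intercept", "total_media"])   # -> ["intercept", "total_media"]
```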
Maybe even
time_varying=['intercept', ('channel1', 'channel2')] # now channel1 and channel2 are summed and multiplied by a time varying coef
@ulfaslak This feature is very exciting! As this PR was about collecting feedback, I have a proposal: split this PR into two:

1. Time-varying intercept
2. Time-varying media (as in this PR)

The reason? To make the review process simpler and faster :) For each PR, we should aim for:

- Implementation with changes just in the scope of the PR
- Docstring
- Tests
- Example notebook

This way we can iterate fast, as we probably want to push this one out soon :) 🚀 The main question, from what I see, is whether we want to make all the priors and HSGP parameters configurable via model_config.
Yep, that approach makes sense!
Following @juanitorduz's mention, and perhaps to align with #602, if it's worthwhile: what about treating this implementation as components as well? Similar to the attached PR (#602), all time-varying media or intercept behavior would happen in a class which is then added to the main model. The split between two PRs (media/intercept) and the component-based implementation should make everything very easy to debug, test, and play with. What does everyone think? @wd60622 @ulfaslak
@juanitorduz My personal opinion is that we should allow it.
The more modular, the better! Let's work on smaller PRs so that we can iterate faster :)
I like this. I will try to bang the first PR together then. Thanks for a lot of high-quality feedback 👌
This PR adds time-varying coefficient options to the DelayedSaturatedMMM class.

Description

Specifically, the following model specification is now possible: time_varying_intercept creates a time-varying prior on the intercept, while time_varying_media_effect creates a time-varying prior on the total media contribution (i.e. not individual columns; this will be added in a later PR).

🚨 For now, let's get some reviews on this and agree on a good way to add this functionality. Then we can add tests and docs later.

To make this easier to review, maybe start by looking inside mmm_tvp_example.ipynb, then check out tvp.py, and then go over the modifications in related files that I had to make to get the API changes to work.
📚 Documentation preview 📚: https://pymc-marketing--598.org.readthedocs.build/en/598/