Add baselined saturation #498

ferrine · 2024-01-25T17:28:21Z

Baselined Tanh Saturation.

An alternative parameterization of the reach function is given by:

$$ \begin{align} c_0 &= \frac{r}{g \cdot \operatorname{arctanh}(r)} \\ \beta &= \frac{g \cdot x_0}{r} \\ \text{saturation}(x, \beta, c_0) &= \beta \cdot \tanh \left( \frac{x}{c_0 \cdot \beta} \right) \end{align} $$

where:

$x_0$ is the "reference point". This is a point chosen
by the user (not given a prior) where they expect most of their data to lie (median recommended).
For example, if you're spending between 50 and 150 dollars on a particular channel,
you might choose $x_0 = 100$.
$g$ is the "gain", the number of new users acquired per dollar at the reference point, $g = \text{saturation}(x_0) / x_0$ (the inverse of the average CAC at $x_0$).
You have to set a prior on how many users per dollar you expect when you spend $x_0 = 100$.
Imagine you have four advertising channels, and you acquired 1000 new users.
If each channel performed equally well, and advertising drove all sales, you might expect
that you gained 250 users from each channel. Here, your "gain" would be $250 / 100 = 2.5$.
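The arithmetic of this example, assuming (as the example implicitly does) that each channel spent $x_0 = 100$. Because $g$ is measured in users per dollar, its inverse is the average cost per new user at the reference point:

```python
total_users = 1000
n_channels = 4
spend_per_channel = 100.0  # the reference point x0 (assumed equal spend per channel)

users_per_channel = total_users / n_channels  # 250.0 if all channels perform equally
gain = users_per_channel / spend_per_channel  # new users per dollar at x0
average_cac = 1 / gain                        # dollars per new user at x0

print(gain, average_cac)  # 2.5 0.4
```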
$r$, the overspend fraction, tells you where the reference point sits on the curve: the response at $x_0$ is the fraction $r$ of the saturation level, $\text{saturation}(x_0) = r \cdot \beta$.
- $0$ - we can increase our budget by a lot before reaching the saturated region;
  the diminishing returns are not visible yet.
- $1$ - the reference point is already in the saturation region,
  and additional dollars spent will not lead to any new users.
- $0.8$ - by increasing the budget you can still acquire at most $(1 - 0.8) / 0.8 = 25\%$
  more users than you get at the reference point;
  the effect at $x_0$ is 20% away from saturation.
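A minimal NumPy sketch of how the three quantities map back to the original $(\beta, c_0)$. Requiring $\text{saturation}(x_0) = r \cdot \beta$ gives $\tanh(x_0 / (c_0 \beta)) = r$, hence $c_0 = r / (g \cdot \operatorname{arctanh}(r))$ and $\text{saturation}(x_0) = g \cdot x_0$. The function names are illustrative, not the library's API; the values reuse the example above ($x_0 = 100$, $g = 2.5$) with $r = 0.8$:

```python
import numpy as np


def tanh_saturation(x, beta, c0):
    # original parameterization: beta * tanh(x / (c0 * beta))
    return beta * np.tanh(x / (c0 * beta))


def baselined_to_original(x0, gain, r):
    # map the reference point, gain and overspend fraction to (beta, c0)
    beta = gain * x0 / r
    c0 = r / (gain * np.arctanh(r))
    return beta, c0


x0, gain, r = 100.0, 2.5, 0.8
beta, c0 = baselined_to_original(x0, gain, r)
at_x0 = tanh_saturation(x0, beta, c0)

# the response at x0 is gain * x0, which is the fraction r of the saturation level
print(beta, at_x0, at_x0 / beta)
# extra users still available, relative to the response at x0: (1 - r) / r
print((beta - at_x0) / at_x0)
```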

Key benefits

- Works equally well in saturated and undersaturated scenarios
- Observed to have fewer sampling issues
- Priors for $g$ and $r$ are easier to set from domain expertise than priors for the original parameterization

Origins

The original reach or saturation function used in an MMM is formulated as

$$ \text{saturation}(x, \beta, c_0) = \beta \cdot \tanh \left( \frac{x}{c_0 \cdot \beta} \right) $$

where:

$\beta$ is the saturation, or the limit of the total number
of new users obtained when an infinite number of dollars are spent on that channel.
$c_0$ is the cost per acquisition (CAC0), so the initial cost per new user.
$\frac{1}{c_0}$ is the inverse of the CAC0, so it's the number of new
users we might expect after spending our first dollar.
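Both readings can be checked numerically: the slope at zero spend is $1/c_0$ (since $\tanh(u) \approx u$ for small $u$), and the curve approaches $\beta$ for very large spend. A NumPy sketch with arbitrary illustrative values $\beta = 100$, $c_0 = 2$:

```python
import numpy as np


def tanh_saturation(x, beta, c0):
    # original parameterization: beta * tanh(x / (c0 * beta))
    return beta * np.tanh(x / (c0 * beta))


beta, c0 = 100.0, 2.0

# users gained per dollar for a tiny first spend: close to 1 / c0
eps = 1e-6
initial_slope = tanh_saturation(eps, beta, c0) / eps

# users gained for an enormous spend: close to beta
huge_spend_response = tanh_saturation(1e9, beta, c0)

print(initial_slope, 1 / c0)
print(huge_spend_response, beta)
```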

Description

This PR adds a new parameterization that is easier to use in industry applications, where domain knowledge is essential.

Code example

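  import pymc as pm
  import numpy as np
  from pymc_marketing.mmm.transformers import tanh_saturation, tanh_saturation_baselined

  # simulate spend and a noisy response from the original parameterization
  x_in = np.exp(3 + np.random.randn(100))
  true_cac = 1
  true_saturation = 100
  y_out = abs(np.random.normal(tanh_saturation(x_in, true_saturation, true_cac).eval(), 0.1))

  with pm.Model() as model_reparam:
      # r lies in (0, 1); gain is a positive number of users per dollar
      r = pm.Uniform("r")
      gain = pm.Exponential("gain", 1)
      spent = pm.ConstantData("spent", x_in)
      response = pm.ConstantData("response", y_out)
      sigma = pm.HalfNormal("sigma")
      # the reference point x0 is the median spend, as recommended above
      output = tanh_saturation_baselined(spent, np.median(x_in), gain, r)
      pm.Normal("output", output, sigma, observed=response)
      trace = pm.sample()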

Related Issue

Closes #
Related to #

Checklist

Checked that the pre-commit linting/style checks pass
Included tests that prove the fix is effective or that the new feature works
Added necessary documentation (docstrings and/or example notebooks)
If you are a pro: each commit corresponds to a relevant logical change

Modules affected

MMM
CLV

Type of change

📚 Documentation preview 📚: https://pymc-marketing--498.org.readthedocs.build/en/498/


codecov · 2024-01-25T17:48:21Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (eee06ac) 90.87% compared to head (e660740) 91.24%.
Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #498      +/-   ##
==========================================
+ Coverage   90.87%   91.24%   +0.37%     
==========================================
  Files          21       21              
  Lines        1983     2044      +61     
==========================================
+ Hits         1802     1865      +63     
+ Misses        181      179       -2


bwengals · 2024-01-25T18:38:46Z

thanks for the writeup!!! Be sure to mention that the reference point x0 has to be set within the range of the actual spends. For example, if you buy ads three times and spend 5, 6 and 7 dollars, x0 has to be set within [5, 7], so not 4 and not 8. Otherwise the posterior of r and gain becomes a skinny diagonal line. I got stuck on that one, and it could be very relevant if there are very few spend observations for a particular channel.
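One way to act on this advice is a small guard that defaults $x_0$ to the median spend and rejects values outside the observed range. `choose_reference_point` is a hypothetical helper for illustration, not part of pymc-marketing:

```python
import numpy as np


def choose_reference_point(spend, x0=None):
    """Hypothetical helper: default x0 to the median spend and reject
    values outside the observed spend range."""
    spend = np.asarray(spend, dtype=float)
    if x0 is None:
        return float(np.median(spend))
    lo, hi = spend.min(), spend.max()
    if not lo <= x0 <= hi:
        raise ValueError(f"x0={x0} is outside the observed spend range [{lo}, {hi}]")
    return float(x0)


spend = [5.0, 6.0, 7.0]
print(choose_reference_point(spend))  # 6.0
```

With the spends from the example, `choose_reference_point(spend, 8.0)` raises a `ValueError`, while any value in [5, 7] is accepted.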

juanitorduz

@ferrine this is amazing! Thanks! I think once you add @bwengals 's suggestion we are ready to merge!

More of a general question: Where would the adstock transformation come in? I guess before applying this one right?

Also, maybe for a next PR we can create another MMM model subclass to make sure users can use the API :)

bwengals · 2024-01-25T21:01:37Z

Why not just do:

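import pytensor.tensor as pt
from pymc_marketing.mmm.transformers import tanh_saturation

def tanh_saturation_baseline(x, x0, gain, r):
    saturation = (gain * x0) / r
    cac0 = r / (gain * pt.arctanh(r))
    # saturation first, CAC0 second, matching the call order in the PR's code example
    return tanh_saturation(x, saturation, cac0)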

Only 4 lines! What functionality is the dataclass for the params and baseline and debaseline adding?

Also: I don't understand the name "baseline", or how this is "baselined". Maybe something else would be a clearer name (sorry to critique without suggestions, I can't think of anything though!)
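Composing the two steps of the suggested helper, $c_0 \cdot \beta = \frac{r}{g \operatorname{arctanh}(r)} \cdot \frac{g x_0}{r} = \frac{x_0}{\operatorname{arctanh}(r)}$, so the whole curve collapses to a single expression. A NumPy sketch that checks this closed form against the two-step version (pure NumPy stand-ins, not the pymc-marketing functions):

```python
import numpy as np


def tanh_saturation(x, beta, c0):
    # original parameterization
    return beta * np.tanh(x / (c0 * beta))


def baselined_two_step(x, x0, gain, r):
    # convert to (beta, c0), then call the original curve
    beta = gain * x0 / r
    c0 = r / (gain * np.arctanh(r))
    return tanh_saturation(x, beta, c0)


def baselined_closed_form(x, x0, gain, r):
    # c0 * beta simplifies to x0 / arctanh(r)
    return gain * x0 * np.tanh(x * np.arctanh(r) / x0) / r


x = np.linspace(0.0, 500.0, 11)
two_step = baselined_two_step(x, x0=100.0, gain=2.5, r=0.8)
closed = baselined_closed_form(x, x0=100.0, gain=2.5, r=0.8)
print(np.allclose(two_step, closed))  # True
```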


ferrine · 2024-01-26T08:12:34Z

> Only 4 lines! What functionality is the dataclass for the params and baseline and debaseline adding?

The dataclasses are helpers in case you need to visualize not just gain or r but also CAC and saturation, or to recompute them for a different x0, etc. They can just sit there and serve those who need them.
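As an illustration of the kind of helper described here (recomputing for a different $x_0$), a sketch with hypothetical function names, not the PR's dataclasses: $\beta$ and $c_0$ stay fixed, the new overspend fraction is $r_1 = \tanh(x_1 \operatorname{arctanh}(r) / x_0)$, and the new gain is $\text{saturation}(x_1) / x_1$. Converting both parameter sets back yields the same $(\beta, c_0)$:

```python
import numpy as np


def to_original(x0, gain, r):
    # (x0, gain, r) -> (beta, c0)
    beta = gain * x0 / r
    c0 = r / (gain * np.arctanh(r))
    return beta, c0


def rebaseline(x0, gain, r, x1):
    """Hypothetical helper: express the same curve with reference point x1."""
    r1 = np.tanh(x1 * np.arctanh(r) / x0)  # fraction of saturation reached at x1
    gain1 = gain * x0 * r1 / (r * x1)      # users per dollar at x1
    return gain1, r1


gain1, r1 = rebaseline(x0=100.0, gain=2.5, r=0.8, x1=50.0)
print(gain1, r1)
print(to_original(100.0, 2.5, 0.8))
print(to_original(50.0, gain1, r1))  # same beta and c0 as the line above
```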

ferrine · 2024-01-26T08:14:01Z

> More of a general question: Where would the adstock transformation come in? I guess before applying this one right?

The adstock goes in exactly the same place as before, i.e. it is applied before the saturation.
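To make the ordering concrete, a sketch that applies a simple recursive geometric adstock ($y_t = x_t + \alpha \, y_{t-1}$, chosen for illustration; pymc-marketing's own adstock functions may differ, e.g. in maximum lag or normalization) and then the baselined saturation in closed form. Under this ordering the saturation, and therefore the reference point $x_0$, operates on adstocked spend. All names are illustrative:

```python
import numpy as np


def geometric_adstock(x, alpha):
    # carry-over: y[t] = x[t] + alpha * y[t - 1]
    y = np.zeros(len(x), dtype=float)
    carry = 0.0
    for t, spend in enumerate(x):
        carry = spend + alpha * carry
        y[t] = carry
    return y


def tanh_saturation_baselined(x, x0, gain, r):
    # closed form of beta * tanh(x / (c0 * beta)) with
    # beta = gain * x0 / r and c0 = r / (gain * arctanh(r))
    return gain * x0 * np.tanh(x * np.arctanh(r) / x0) / r


spend = np.array([100.0, 0.0, 0.0, 50.0, 150.0])
adstocked = geometric_adstock(spend, alpha=0.5)  # step 1: adstock
response = tanh_saturation_baselined(adstocked, x0=100.0, gain=2.5, r=0.8)  # step 2
print(adstocked)
print(response)
```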

> Also, maybe for a next PR we can create another MMM model subclass to make sure users can use the API :)

Why not a boolean parameter?


juanitorduz · 2024-01-26T09:45:19Z

> > Also, maybe for a next PR we can create another MMM model subclass to make sure users can use the API :)

> Why not a boolean parameter?

I could imagine in the future, we would have many different models with different plotting functions, optimizations, depending on the functional forms ... just thinking ahead :)

juanitorduz

LGTM! Thanks!

As a side comment, personally (so not necessarily true ;) ) I like the NamedTuple structure. It has crossed my mind to even use pydantic for parameter and data validation.

ferrine · 2024-01-26T11:18:38Z

> LGTM! Thanks!
>
> As a side comment, personally (so not necessarily true ;) ) I like the NamedTuple structure. It has crossed my mind to even use pydantic for parameter and data validation.

Pydantic is great and I appreciate anyone taking effort to refactor everything to Pydantic, just not now. WDYT?

juanitorduz · 2024-01-26T13:39:55Z

I created a discussion issue, #502. I will be happy to work on this :)


ferrine added 3 commits January 25, 2024 17:05

add baselined saturation with test and plots

2ba8bae

refactor docs

d582546

add the reparam

d60ae71

ferrine changed the title ~~add baselined saturation with test and plots~~ Add baselined saturation Jan 25, 2024

[pre-commit.ci] auto fixes from pre-commit.com hooks

40ff569

for more information, see https://pre-commit.ci

juanitorduz reviewed Jan 25, 2024


ferrine and others added 4 commits January 26, 2024 08:05

verify parametrization is equivalent under change of baseline

05d02db

[pre-commit.ci] auto fixes from pre-commit.com hooks

5dc6b58

for more information, see https://pre-commit.ci

add a note for setting x0

5791fd8

[pre-commit.ci] auto fixes from pre-commit.com hooks

cf86aa8

for more information, see https://pre-commit.ci

ferrine added 6 commits January 26, 2024 08:15

make it clear how r_ref is calculated

99ed221

fix typo

20d55a6

fix docstrings

1cae0df

improve test by making sure transform is gives identical saturation and cac0

59a2b89

add comment in the docstring

284d9df

add blank line in the code-block

e660740

juanitorduz approved these changes Jan 26, 2024


ferrine merged commit 28aad7a into main Jan 26, 2024
13 checks passed

ferrine deleted the saturation-baselined branch January 26, 2024 11:19

juanitorduz added the MMM label Jan 26, 2024

juanitorduz mentioned this pull request Jan 26, 2024

Pydantic for data validation? #502

Closed

ferrine's PR description (Jan 25, 2024) was edited by ricardoV94.