Merge branch 'main' into allow-covariates-plot-expected-purchases

pymc-labs · Feb 5, 2025 · commit 60372e0
2 parents 74de286 + 542a85b
Showing 41 changed files with 4,355 additions and 2,724 deletions.
diff --git a/.github/release.yml b/.github/release.yml
@@ -11,6 +11,9 @@ changelog:
     - title: Major Changes 🛠
       labels:
         - major
+    - title: Deprecations 🚨
+      labels:
+        - deprecation
     - title: New Features 🎉
       labels:
         - enhancement

diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
@@ -16,6 +16,10 @@ on:
      - "tests/**.py"
      - "pymc_marketing/**"
 
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
+  cancel-in-progress: true
+
 env:
   # The lower bound from the pyproject.toml file
   OLDEST_PYMC_VERSION: "$(grep -E 'pymc *>' pyproject.toml | sed -n 's/.*>=\\([0-9]*\\.[0-9]*\\.[0-9]*\\).*/\\1/p')"

diff --git a/.github/workflows/test_notebook.yml b/.github/workflows/test_notebook.yml
@@ -18,6 +18,10 @@ on:
      - "docs/source/notebooks/**.ipynb"
      - "!docs/source/notebooks/*/dev/**.ipynb"
 
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
+  cancel-in-progress: true
+
 jobs:
   example_notebooks:
     runs-on: ubuntu-latest

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -11,7 +11,7 @@ repos:
           - --exclude=docs/
           - --exclude=scripts/
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.9.2
+    rev: v0.9.4
     hooks:
       - id: ruff
         types_or: [python, pyi, jupyter]

diff --git a/README.md b/README.md
@@ -68,6 +68,7 @@ Leverage our Bayesian MMM API to tailor your marketing strategies effectively. L
 | Time-varying Intercept                     | Capture time-varying baseline contributions in your model (using modern and efficient Gaussian processes approximation methods). See [guide notebook](https://www.pymc-marketing.io/en/stable/notebooks/mmm/mmm_time_varying_media_example.html).                                                                                                                                       |
 | Time-varying Media Contribution            | Capture time-varying media efficiency in your model (using modern and efficient Gaussian processes approximation methods). See the [guide notebook](https://www.pymc-marketing.io/en/stable/notebooks/mmm/mmm_tvp_example.html).                                                                                                                                                        |
 | Visualization and Model Diagnostics        | Get a comprehensive view of your model's performance and insights.                                                                                                                                                                                                                                                                                                                      |
+| Causal Identification                      | Input a business driven directed acyclic graph to identify the meaningful variables to include into the model to be able to draw causal conclusions. For a concrete example see the [guide notebook](https://www.pymc-marketing.io/en/stable/notebooks/mmm/mmm_causal_identification.html).                                                                                             |
 | Choose among many inference algorithms     | We provide the option to choose between various NUTS samplers (e.g. BlackJax, NumPyro and Nutpie). See the [example notebook](https://www.pymc-marketing.io/en/stable/notebooks/general/other_nuts_samplers.html) for more details.                                                                                                                                                     |
 | GPU Support                                | PyMC's multiple backends allow for GPU acceleration.                                                                                                                                                                                                                                                                                                                                    |
 | Out-of-sample Predictions                  | Forecast future marketing performance with credible intervals. Use this for simulations and scenario planning.                                                                                                                                                                                                                                                                          |
@@ -102,7 +103,7 @@ mmm = MMM(
 )
 ```
 
-Initiate fitting and get a visualization of some of the outputs with:
+Initiate fitting and get insightful plots and summaries. For example, we can plot the components contributions:
 
 ```python
 X = data.drop("y",axis=1)
@@ -113,13 +114,20 @@ mmm.plot_components_contributions();
 
 ![](docs/source/_static/mmm_plot_components_contributions.png)
 
+You can compute channel efficiencies and compare them with the estimated return on ad spend (ROAS).
+
+<center>
+    <img src="docs/source/_static/roas_efficiency.png" width="70%" />
+</center>
+
 Once the model is fitted, we can further optimize our budget allocation as we are including diminishing returns and carry-over effects in our model.
 
 <center>
     <img src="docs/source/_static/mmm_plot_plot_channel_contributions_grid.png" width="80%" />
 </center>
 
-Explore a hands-on [simulated example](https://pymc-marketing.readthedocs.io/en/stable/notebooks/mmm/mmm_example.html) for more insights into MMM with PyMC-Marketing.
+- Explore a hands-on [simulated example](https://pymc-marketing.readthedocs.io/en/stable/notebooks/mmm/mmm_example.html) for more insights into MMM with PyMC-Marketing.
+- Get started with a complete end-to-end analysis: from model specification to budget allocation. See the [guide notebook](https://www.pymc-marketing.io/en/stable/notebooks/mmm/mmm_case_study.html).
 
 ### Essential Reading for Marketing Mix Modeling (MMM)
 
@@ -207,13 +215,17 @@ mvits = MVITS(
 # Fit model
 mvits.fit(X, y)
 
-# Plot counterfactuals
-mvits.plot_counterfactual()
-
 # Plot causal impact on market share
 mvits.plot_causal_impact_market_share()
+
+# Plot counterfactuals
+mvits.plot_counterfactual()
 ```
 
+<center>
+    <img src="docs/source/_static/conterfactual.png" width="100%" />
+</center>
+
 See our example notebooks for [saturated markets](https://www.pymc-marketing.io/en/stable/notebooks/customer_choice/mv_its_saturated.html) and [unsaturated markets](https://www.pymc-marketing.io/en/stable/notebooks/customer_choice/mv_its_unsaturated.html) to learn more about customer choice modeling with PyMC-Marketing.
 
 ## Why PyMC-Marketing vs other solutions?

diff --git a/docs/source/_static/conterfactual.png b/docs/source/_static/conterfactual.png
diff --git a/docs/source/_static/roas_efficiency.png b/docs/source/_static/roas_efficiency.png
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -10,7 +10,7 @@
 # General information about the project.
 project = "pymc-marketing"
 author = "PyMC Labs"
-copyright = f"2022, {author}"
+copyright = f"2022-%Y, {author}"
 html_title = "Open Source Marketing Analytics Solution"
 
 # The master toctree document.

diff --git a/docs/source/guide/mmm/comparison.md b/docs/source/guide/mmm/comparison.md
@@ -16,4 +16,4 @@ Given the popularity of the Media Mix Modelling (MMM) approach, there are many p
 | Custom priors | ✅ | ✅ | ❌ | ❌ | ✅ |
 | Lift-test calibration | ✅  | ❌ | ✅ | ❌ | ✅ |
 | Out of sample predictions | ✅ | ✅ | ❌ | ✅ | ✅ |
-| Unit-tested | ✅ | ✅ | ❌ | ✅ | ? |
+| Unit-tested | ✅ | ✅ | ❌ | ✅ | ✅ |
diff --git a/docs/source/notebooks/index.md b/docs/source/notebooks/index.md
@@ -7,8 +7,6 @@ Here you will find a collection of examples and how-to guides for using PyMC-Mar
 :caption: MMMs
 :maxdepth: 1
 
-mmm/mmm_example
-mmm/mmm_budget_allocation_example
 mmm/mmm_allocation_assessment
 mmm/mmm_budget_allocation_example
 mmm/mmm_case_study

diff --git a/docs/source/notebooks/mmm/mmm_budget_allocation_example.ipynb b/docs/source/notebooks/mmm/mmm_budget_allocation_example.ipynb
@@ -1236,7 +1236,7 @@
     "initial_budget_scenario[\"t\"] = 0\n",
     "\n",
     "response_initial_budget = mmm.sample_posterior_predictive(\n",
-    "    X_pred=initial_budget_scenario, extend_idata=False\n",
+    "    initial_budget_scenario, extend_idata=False\n",
     ")\n",
     "\n",
     "response_initial_budget"

diff --git a/docs/source/notebooks/mmm/mmm_case_study.ipynb b/docs/source/notebooks/mmm/mmm_case_study.ipynb
diff --git a/docs/source/notebooks/mmm/mmm_counterfactuals.ipynb b/docs/source/notebooks/mmm/mmm_counterfactuals.ipynb
@@ -701,7 +701,7 @@
    ],
    "source": [
     "y_forecast = mmm.sample_posterior_predictive(\n",
-    "    X_pred=X_forecast, extend_idata=False, include_last_observations=True\n",
+    "    X_forecast, extend_idata=False, include_last_observations=True\n",
     ")"
    ]
   },
@@ -958,7 +958,7 @@
    ],
    "source": [
     "y_intervention = mmm.sample_posterior_predictive(\n",
-    "    X_pred=X_intervention, extend_idata=False, include_last_observations=True\n",
+    "    X_intervention, extend_idata=False, include_last_observations=True\n",
     ")"
    ]
   },
@@ -1289,7 +1289,7 @@
    ],
    "source": [
     "y_counterfactual = mmm.sample_posterior_predictive(\n",
-    "    X_pred=X_counterfactual, extend_idata=False\n",
+    "    X_counterfactual, extend_idata=False\n",
     ");"
    ]
   },
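
Note on the notebook changes above: every call drops the `X_pred=` keyword and passes the data positionally. The docstring edits in `pymc_marketing/mlflow.py` in this same commit suggest the parameter was renamed from `X_pred` to `X`, although the signature change itself is not part of this diff. Assuming that rename, the sketch below shows why the positional form keeps working while the old keyword would not. `predict_old` and `predict_new` are hypothetical stand-ins, not the pymc-marketing API.

```python
# Hypothetical stand-ins for illustration only; not the pymc-marketing API.
def predict_old(X_pred, extend_idata=True):
    """Signature before the assumed rename."""
    return len(X_pred)


def predict_new(X, extend_idata=True):
    """Signature after the assumed rename from X_pred to X."""
    return len(X)


rows = [1, 2, 3]

# Positional calls work against either signature.
positional_old = predict_old(rows, extend_idata=False)
positional_new = predict_new(rows, extend_idata=False)

# The old keyword no longer matches any parameter after the rename.
try:
    predict_new(X_pred=rows, extend_idata=False)
    keyword_error = None
except TypeError as exc:
    keyword_error = str(exc)
```

Passing the first argument positionally keeps the notebooks valid on both sides of such a rename, which is presumably why the commit uses that form instead of switching to `X=`.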

diff --git a/docs/source/notebooks/mmm/mmm_example.ipynb b/docs/source/notebooks/mmm/mmm_example.ipynb
diff --git a/docs/source/notebooks/mmm/mmm_time_slice_cross_validation.ipynb b/docs/source/notebooks/mmm/mmm_time_slice_cross_validation.ipynb
@@ -260,7 +260,7 @@
     "    mmm = fit_mmm(mmm, X_train, y_train, random_seed)\n",
     "\n",
     "    y_pred_test = mmm.sample_posterior_predictive(\n",
-    "        X_pred=X_test,\n",
+    "        X_test,\n",
     "        include_last_observations=True,\n",
     "        original_scale=True,\n",
     "        extend_idata=False,\n",

diff --git a/docs/source/notebooks/mmm/mmm_tvp_example.ipynb b/docs/source/notebooks/mmm/mmm_tvp_example.ipynb
@@ -613,7 +613,7 @@
                 "    # Sample posterior predictive in whole data range (train and test)\n",
                 "    if \"posterior_predictive\" not in mmm.idata:\n",
                 "        mmm.sample_posterior_predictive(\n",
-                "            X_pred=DATA, extend_idata=True, var_names=[\"y\", \"intercept\"]\n",
+                "            DATA, extend_idata=True, var_names=[\"y\", \"intercept\"]\n",
                 "        )\n",
                 "    mmm.y = target_series.values\n",
                 "\n",

diff --git a/pymc_marketing/clv/models/basic.py b/pymc_marketing/clv/models/basic.py
@@ -273,10 +273,6 @@ def output_var(self):
         """Output variable of the model."""
         pass
 
-    def _generate_and_preprocess_model_data(self, *args, **kwargs):
-        """Generate and preprocess model data."""
-        pass
-
     def _data_setter(self):
         """Set the data for the model."""
         pass
diff --git a/pymc_marketing/constants.py b/pymc_marketing/constants.py
@@ -15,3 +15,4 @@
 
 DAYS_IN_YEAR: float = 365.25
 DAYS_IN_MONTH: float = DAYS_IN_YEAR / 12
+DAYS_IN_WEEK: int = 7
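
Note on the new constant: it presumably supports the `WeeklyFourier` class exported later in this commit, though the diff shown here does not include that class's body. As a rough sketch of how a 7-day period can drive Fourier seasonality features, the following uses only the standard library. `weekly_fourier_features` is a hypothetical helper, not the library's implementation.

```python
import math

DAYS_IN_WEEK = 7  # mirrors the constant added in this diff


def weekly_fourier_features(day_index, n_order):
    """Return [sin_1..sin_n, cos_1..cos_n] for a day index with a 7-day period.

    Hypothetical helper for illustration; not part of pymc-marketing.
    """
    angle = 2 * math.pi * day_index / DAYS_IN_WEEK
    sines = [math.sin(k * angle) for k in range(1, n_order + 1)]
    cosines = [math.cos(k * angle) for k in range(1, n_order + 1)]
    return sines + cosines


# Day 0 and day 7 fall on the same weekday, so their features coincide
# up to floating-point error.
first_day = weekly_fourier_features(0, n_order=2)
same_weekday_next_week = weekly_fourier_features(7, n_order=2)
```

Each order adds one sine and one cosine column, so `n_order=2` yields four features per day.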
diff --git a/pymc_marketing/mlflow.py b/pymc_marketing/mlflow.py
@@ -180,6 +180,131 @@
 warnings.warn(warning_msg, FutureWarning, stacklevel=1)
 
 
+def _exclude_tuning(func):
+    def callback(trace, draw):
+        if draw.tuning:
+            return
+
+        return func(trace, draw)
+
+    return callback
+
+
+def _take_every(n: int):
+    def decorator(func):
+        def callback(trace, draw):
+            if draw.draw_idx % n != 0:
+                return
+
+            return func(trace, draw)
+
+        return callback
+
+    return decorator
+
+
+def create_log_callback(
+    stats: list[str] | None = None,
+    parameters: list[str] | None = None,
+    exclude_tuning: bool = True,
+    take_every: int = 100,
+):
+    """Create callback function to log sample stats and parameter values to MLflow during sampling.
+
+    This callback only works for the "pymc" sampler.
+
+    Parameters
+    ----------
+    stats : list of str, optional
+        List of sample statistics to log from the Draw
+    parameters : list of str, optional
+        List of parameters to log from the Draw
+    exclude_tuning : bool, optional
+        Whether to exclude tuning steps from logging. Defaults to True.
+
+    Returns
+    -------
+    callback : Callable
+        The callback function to log sample stats and parameter values to MLflow during sampling
+
+    Examples
+    --------
+    Create example model:
+
+    .. code-block:: python
+
+        import pymc as pm
+
+        with pm.Model() as model:
+            mu = pm.Normal("mu")
+            sigma = pm.HalfNormal("sigma")
+            obs = pm.Normal("obs", mu=mu, sigma=sigma, observed=[1, 2, 3])
+
+    Log divergences and logp every 100th draw:
+
+    .. code-block:: python
+
+        import mlflow
+
+        from pymc_marketing.mlflow import create_log_callback
+
+        callback = create_log_callback(
+            stats=["diverging", "model_logp"],
+            take_every=100,
+        )
+
+        mlflow.set_experiment("Live Tracking Stats")
+
+        with mlflow.start_run():
+            idata = pm.sample(model=model, callback=callback)
+
+    Log the parameters `mu` and `sigma_log__` every 100th draw:
+
+    .. code-block:: python
+
+        import mlflow
+
+        from pymc_marketing.mlflow import create_log_callback
+
+        callback = create_log_callback(
+            parameters=["mu", "sigma_log__"],
+            take_every=100,
+        )
+
+        mlflow.set_experiment("Live Tracking Parameters")
+
+        with mlflow.start_run():
+            idata = pm.sample(model=model, callback=callback)
+
+    """
+    if not stats and not parameters:
+        raise ValueError("At least one of `stats` or `parameters` must be provided.")
+
+    def callback(_, draw):
+        prefix = f"chain_{draw.chain}"
+        for stat in stats or []:
+            mlflow.log_metric(
+                key=f"{prefix}/{stat}",
+                value=draw.stats[0][stat],
+                step=draw.draw_idx,
+            )
+
+        for parameter in parameters or []:
+            mlflow.log_metric(
+                key=f"{prefix}/{parameter}",
+                value=draw.point[parameter],
+                step=draw.draw_idx,
+            )
+
+    if exclude_tuning:
+        callback = _exclude_tuning(callback)
+
+    if take_every:
+        callback = _take_every(n=take_every)(callback)
+
+    return callback
+
+
 def _log_and_remove_artifact(path: str | Path) -> None:
     """Log an artifact to MLflow and then remove the local file.
 
@@ -253,11 +378,15 @@ def log_metadata(model: Model, idata: az.InferenceData) -> None:
     """
     data_vars: list[TensorVariable] = model.data_vars
 
-    features = {
-        var.name: idata.constant_data[var.name].to_numpy()
-        for var in data_vars
-        if var.name in idata.constant_data
-    }
+    if "constant_data" in idata:
+        features = {
+            var.name: idata.constant_data[var.name].to_numpy()
+            for var in data_vars
+            if var.name in idata.constant_data
+        }
+    else:
+        features = {}
+
     targets = {
         var.name: idata.observed_data[var.name].to_numpy()
         for var in model.observed_RVs
@@ -544,7 +673,7 @@ class MMMWrapper(mlflow.pyfunc.PythonModel):
         Combine chain and draw dims into sample. Won't work if a dim named sample already exists. Defaults to True.
     include_last_observations : bool, default=False
         Boolean determining whether to include the last observations of the training data in order to carry over
-        costs with the adstock transformation. Assumes that X_pred are the next predictions following the
+        costs with the adstock transformation. Assumes that X are the next predictions following the
         training data. Defaults to False.
     original_scale : bool, default=True
         Boolean determining whether to return the predictions in the original scale of the target variable.
@@ -671,7 +800,7 @@ def log_mmm(
         already exists. Used for posterior/prior predictive sampling. Defaults to True.
     include_last_observations : bool, optional
         Whether to include the last observations of training data for adstock transformation.
-        Assumes X_pred are next predictions following training data. Used for all prediction
+        Assumes X are next predictions following training data. Used for all prediction
         methods. Defaults to False.
     original_scale : bool, optional
         Whether to return predictions in original scale of target variable. Used for all
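
Note on `create_log_callback` in this file: the inner callback is wrapped first by `_exclude_tuning` and then by `_take_every`, so a draw is logged only if its index is a multiple of `take_every` and it is not a tuning draw. The sketch below reproduces that composition without MLflow or PyMC. `FakeDraw` is a hypothetical stand-in that models only the attributes the wrappers read, and the index and tuning values are made up for the demonstration.

```python
from collections import namedtuple

# Hypothetical stand-in for the draw object passed to sampler callbacks;
# only the attributes used by the wrappers in this diff are modelled.
FakeDraw = namedtuple("FakeDraw", ["chain", "draw_idx", "tuning"])


def exclude_tuning(func):
    """Skip the wrapped callback while the sampler is tuning."""

    def callback(trace, draw):
        if draw.tuning:
            return None
        return func(trace, draw)

    return callback


def take_every(n):
    """Only call the wrapped callback when draw_idx is a multiple of n."""

    def decorator(func):
        def callback(trace, draw):
            if draw.draw_idx % n != 0:
                return None
            return func(trace, draw)

        return callback

    return decorator


logged = []


def record(trace, draw):
    # Stands in for the mlflow.log_metric calls in the real callback.
    logged.append(draw.draw_idx)


# Same wrapping order as in create_log_callback.
callback = take_every(n=100)(exclude_tuning(record))

# Made-up sequence: indices 0, 50, ..., 300, with the first 100 marked as tuning.
for idx in range(0, 301, 50):
    callback(None, FakeDraw(chain=0, draw_idx=idx, tuning=idx < 100))
```

In this run `logged` ends up as `[100, 200, 300]`: index 0 passes the modulus check but is dropped as a tuning draw, and the indices ending in 50 never reach the inner callback.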

diff --git a/pymc_marketing/mmm/__init__.py b/pymc_marketing/mmm/__init__.py
@@ -35,7 +35,7 @@
     TanhSaturationBaselined,
     saturation_from_dict,
 )
-from pymc_marketing.mmm.fourier import MonthlyFourier, YearlyFourier
+from pymc_marketing.mmm.fourier import MonthlyFourier, WeeklyFourier, YearlyFourier
 from pymc_marketing.mmm.hsgp import (
     HSGP,
     CovFunc,
@@ -85,6 +85,7 @@
     "SaturationTransformation",
     "TanhSaturation",
     "TanhSaturationBaselined",
+    "WeeklyFourier",
     "WeibullCDFAdstock",
     "WeibullPDFAdstock",
     "YearlyFourier",