---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# mostlytidyMMM
<span style="color: red;">
**This is a pre-alpha release package currently used as the author's playground!**
**Assume none of it works correctly!!**
</span>
<!-- badges: start -->
[![Codecov test coverage](https://codecov.io/gh/lorenze3/mostlytidyMMM/branch/main/graph/badge.svg)](https://app.codecov.io/gh/lorenze3/mostlytidyMMM?branch=main)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
<!-- badges: end -->
## Intent
mostlytidyMMM is a toolkit for building marketing mix models (MMMs).
Similar to Meta's [Robyn](https://facebookexperimental.github.io/Robyn/) and Google's [lightweightMMM](https://github.com/google/lightweight_mmm), it offers basic MMM capability without the analyst needing to write their own functions. In terms of MMM philosophy, this package sits somewhere between Robyn and lightweightMMM -- like Robyn, it uses pre-modelling transformations for time-delayed effects (i.e. adstocking) and saturation, but the final coefficients are estimated via MCMC in Stan to take advantage of constrained, prior-informed regression models.
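
For readers new to these transformations, here is a minimal, illustrative sketch of geometric adstock and Hill-style saturation in plain R. This is **not** the package's implementation (mostlytidyMMM applies these as recipe steps); the helper names and parameters below are hypothetical.

``` r
# Geometric adstock: effective pressure today carries over a fraction
# `decay` of yesterday's (hypothetical helper, not a package function)
adstock <- function(x, decay = 0.5) {
  out <- numeric(length(x))
  out[1] <- x[1]
  for (t in seq_along(x)[-1]) out[t] <- x[t] + decay * out[t - 1]
  out
}

# Hill-style saturation: diminishing returns as adstocked spend grows
saturate <- function(x, half_max = 100, slope = 2) {
  x^slope / (x^slope + half_max^slope)
}

spend <- c(0, 50, 100, 0, 0, 80)
saturate(adstock(spend, decay = 0.6))
```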
[tidymodels](https://github.com/tidymodels/tidymodels) provides the hyperparameter tuning, and McElreath's [rethinking](https://github.com/rmcelreath/rethinking) provides the link to Stan. On top of those two foundations, mostlytidyMMM adds recipe steps for adstock and saturation, functions to build workflowsets out of multiple formulas, a translator from a string formula to rethinking::ulam() input, and a custom decomposition formula.
Unlike either of those packages, mostlytidyMMM gives the analyst complete control over the level of data granularity and the model form (*welp, currently only normal regression is supported*, but it is a hierarchical Bayesian regression whose specification is entirely up to the analyst).
In general, mostlytidyMMM lets an analyst with a table ready for MMM specify a reasonable set of models and priors in an .xlsx configuration file and quickly home in on a 'final' model specification.
## On the Roadmap
I think the following features are in rough priority order:
* visualizing response functions
* Poisson and log-normal regression
* a budget optimizer
## Thanks to ThinkR for fusen!
This project is a proper package, and not just a loose bunch of functions, thanks to ThinkR's [fusen](https://thinkr-open.github.io/fusen/). I highly recommend it.
## Installation
After installing [cmdstanr](https://mc-stan.org/cmdstanr/) and [rethinking](https://github.com/rmcelreath/rethinking), mostlytidyMMM can be installed directly from [GitHub](https://github.com/) via:
``` r
# install.packages("devtools")
devtools::install_github("lorenze3/mostlytidyMMM")
```
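
If the prerequisites are not already installed, something along these lines usually works (these commands come from those packages' own installation instructions, which can change over time; check their sites if anything fails):

``` r
# cmdstanr from the Stan R-universe, then the CmdStan toolchain itself
install.packages("cmdstanr", repos = c("https://stan-dev.r-universe.dev", getOption("repos")))
cmdstanr::install_cmdstan()

# rethinking from GitHub
devtools::install_github("rmcelreath/rethinking")
```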
## Documentation
Full documentation website at: https://lorenze3.github.io/mostlytidyMMM
## Straightforward MMM walkthrough
When no tuning is required, a few function calls translate the .xlsx file into a fitted model.
First we read in the configuration file:
```{r example}
suppressMessages(suppressWarnings(library(mostlytidyMMM)))
suppressMessages(suppressWarnings(library(tidyverse)))
suppressMessages(suppressWarnings(library(tidymodels)))
suppressMessages(suppressWarnings(library(rethinking)))

control_file <- system.file('no_tuning_example.xlsx', package = 'mostlytidyMMM')

# get each relevant table of the control file:
var_controls <- readxl::read_xlsx(control_file, 'variables')
transform_controls <- readxl::read_xlsx(control_file, 'role controls')
workflow_controls <- readxl::read_xlsx(control_file, "workflow") |> select(-desc)
```
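
To see what was read before going further, a quick optional peek at the three control tables:

``` r
# optional: inspect the control tables pulled from the .xlsx file
dplyr::glimpse(var_controls)
dplyr::glimpse(transform_controls)
dplyr::glimpse(workflow_controls)
```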
The next few lines read in the data and then use the configuration tables to rename columns, add Fourier seasonality variables, and group and sort the data.
```{r example 2}
data1 <- read.csv(system.file('example2.csv', package = 'mostlytidyMMM')) |>
  rename_columns_per_controls(variable_controls = var_controls) |>
  mutate(week = as.Date(week, "%m/%d/%Y")) |>
  add_fourier_vars(vc = var_controls) |>
  add_groups_and_sort(vc = var_controls)
```
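
A quick sanity check that the dates parsed as expected (optional):

``` r
# optional: confirm `week` parsed as a Date and spans the expected range
range(data1$week)
```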
Now we create a recipe for the pre-processing and a model formula (in the style of lmer()) based on the data and the three config tables:
```{r example 3}
(no_tuning_recipe<-create_recipe(data1,vc=var_controls,mc=transform_controls,wc=workflow_controls))
(formula_in_a_string<-create_formula(base_recipe=no_tuning_recipe,control=workflow_controls))
```
Next, create a rethinking::ulam()-ready flist from the formula and the config tables (priors come from the config, for example), along with a set of constraint statements:
```{r example 4}
(expressions_for_ulam <- create_ulam_list(prior_controls = var_controls,
                                          model_formula = formula_in_a_string,
                                          grand_intercept_prior = 'normal(45,25)'))
(bounds_for_ulam <- make_bound_statements(variable_controls = var_controls))
```
Bake the data (i.e., apply the transformations) and call rethinking::ulam() to fit the Bayesian regression (in this case with random slopes and intercepts). **NB:** In actual use, much higher iteration counts are preferred.
```{r example 5}
model_data <- no_tuning_recipe %>% prep(data1) %>% bake(data1)

fitted_model_obj <- ulam(expressions_for_ulam,
                         model_data,
                         constraints = bounds_for_ulam,
                         chains = 2,
                         iter = 100,
                         cores = 2,
                         # file = 'no_tuning_mod', # have a care to remove this if you want to resample!
                         declare_all_data = FALSE,
                         messages = FALSE
)
```
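
Before trusting any fit, it is worth a glance at the MCMC diagnostics; rethinking's usual tools apply (shown as a sketch, output omitted):

``` r
# parameter summaries including n_eff and Rhat diagnostics
precis(fitted_model_obj, depth = 2)

# visual check that the chains mix well
traceplot(fitted_model_obj)
```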
A predict() method for ulam objects is included in the mostlytidyMMM package:
```{r example 6}
model_data$pred<-predict(fitted_model_obj,model_data)[,1]
```
Some basic goodness-of-fit charts, as examples:
```{r example 7}
this_rsq <- rsq(model_data |> ungroup(), truth = sales, estimate = pred)['.estimate'] %>% unlist()
this_mape <- mape(model_data |> ungroup(), truth = sales, estimate = pred)['.estimate'] %>% unlist()

ggplot(model_data, aes(x = sales, y = pred, color = store_id)) +
  geom_point() + geom_abline(slope = 1, intercept = 0) + ggthemes::theme_tufte() +
  ggtitle("Predicted vs Actual", subtitle = paste0('Rsq is ', round(this_rsq, 2)))
```
```{r example 8}
model_preds_long <- model_data %>% pivot_longer(c(pred, sales))

ggplot(model_preds_long, aes(x = week, y = value, color = name)) + geom_line() +
  ggtitle("Sales and Predicted Sales by Week", subtitle = paste('MAPE is', round(this_mape)))
```
A function for decomposition is included as well:
```{r example 9}
decomps <- get_decomps_irregardless(model_data %>% ungroup(),
                                    recipe_to_use = no_tuning_recipe,
                                    model_obj = fitted_model_obj)
```
Roll those up to total by week and plot them:
```{r example 10}
decomps_natl <- decomps %>%
  select(week, all_of(!!get_predictors_vector(no_tuning_recipe))) %>%
  group_by(week) %>%
  summarise(across(where(is.numeric), sum))

decomps_natl <- decomps_natl %>% pivot_longer(cols = c(-week))

ggplot(data = decomps_natl, aes(x = week, y = value, fill = name)) +
  geom_area() + ggthemes::theme_tufte() +
  ggtitle("Decomposition By Week") +
  theme(legend.position = 'bottom')
```
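
From here, a common next step is to total each driver's contribution over the whole period; a sketch using only the objects created above:

``` r
# total contribution by driver across all weeks, largest first
decomps_natl %>%
  group_by(name) %>%
  summarise(total_contribution = sum(value)) %>%
  arrange(desc(total_contribution))
```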