For collaborating on and developing Bayesian marketing mix models (MMMs) in PyMC.
To help with reproducibility in this project, the provided Dockerfile is intended to be used in a Visual Studio Code (VS Code) dev container. The container provides:
- Python 3.11
- Conda environment with PyMC 5+
- Jupyter Notebook integration
- Access to local datasets
- DuckDB for working with Parquet files
To get started:
- Open VS Code and make sure the Dev Containers extension is installed.
- Set the environment variable `MMM_DATA_PATH=/local/path/to/data/directory`. This variable must point to a valid directory for the dev container to build. See the "Incorporating Data" section below.
- Open the repo folder in a dev container. The contents of the `.devcontainer` folder configure the container using the Dockerfile. The Docker image must be built from scratch the first time, so be patient.
- The dev container is configured to edit and run Jupyter notebooks directly within VS Code. Test your new environment by opening the notebook `example/pymc_mmm.ipynb` and selecting the `mmm` kernel, which uses the installed Conda environment.
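To check the environment from a notebook cell or a terminal inside the container, a minimal sketch like the following compares what is installed against the features listed above (Python 3.11, PyMC 5+, DuckDB). It uses only the standard library's `importlib.metadata`. Checking `ipykernel` is an assumption: it is the package VS Code needs to run Python notebooks, but this README does not name it.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Distributions to look for. "ipykernel" is an assumption: VS Code needs it to
# run Python notebooks, but the README does not list it explicitly.
EXPECTED = ("pymc", "duckdb", "ipykernel")


def check_environment(packages=EXPECTED):
    """Map "python" and each package name to its installed version (None if missing)."""
    report = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


def problems(report):
    """Return human-readable problems found in a check_environment() report."""
    issues = []
    if not report["python"].startswith("3.11."):
        issues.append(f"expected Python 3.11, found {report['python']}")
    pymc_version = report.get("pymc")
    if pymc_version is not None and int(pymc_version.split(".")[0]) < 5:
        issues.append(f"expected PyMC 5+, found {pymc_version}")
    # Any package that could not be found is reported as missing.
    for name, installed in report.items():
        if installed is None:
            issues.append(f"{name} is not installed")
    return issues


if __name__ == "__main__":
    for line in problems(check_environment()) or ["environment looks OK"]:
        print(line)
```

Running it inside the container with the `mmm` environment active should print "environment looks OK"; anything else points at the package to investigate.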
Please keep this repo free of data, credentials and other secrets (the one exception is the data accompanying the example notebook). Instead, the dev container is set up to use the environment variable `MMM_DATA_PATH` to access datasets outside this repo. When the dev container is built, the local directory that `MMM_DATA_PATH` points to is mounted at `/root/data` inside the container. Try it out with `example/using_data.ipynb`.
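Inside a notebook, Parquet files under the mount can be read with DuckDB. The sketch below assumes only the `/root/data` mount described above; `data_file` and `load_parquet` are hypothetical helpers, not part of this repo. `duckdb.read_parquet` accepts a single file or a glob and returns a relation whose `.df()` method converts it to a pandas DataFrame.

```python
from pathlib import Path

# Where the directory named by MMM_DATA_PATH is mounted inside the dev container.
DATA_ROOT = Path("/root/data")


def data_file(relative: str, root: Path = DATA_ROOT) -> Path:
    """Resolve `relative` under `root`, refusing paths that escape the mount."""
    root = root.resolve()
    candidate = (root / relative).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"{relative!r} points outside {root}")
    return candidate


def load_parquet(relative_glob: str, root: Path = DATA_ROOT):
    """Read one Parquet file, or every file matching a glob, into a pandas DataFrame."""
    import duckdb  # provided by the dev container; imported lazily

    # read_parquet returns a DuckDB relation; .df() materializes it with pandas.
    return duckdb.read_parquet(str(data_file(relative_glob, root))).df()
```

For example, `load_parquet("sales/*.parquet")` would read every Parquet file in a hypothetical `sales` subdirectory of your data folder into one DataFrame. Resolving paths through `data_file` keeps a notebook from accidentally reading outside the mounted directory, which also helps keep data paths out of the repo.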
To avoid dependency drift, we version control `environment.yml`. This also saves us from having to re-solve the environment each time we rebuild the dev container, which can take a long time.
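Here's the recommended way to update dependencies:
- Copy the environment file with `cp environment.yml environment-dev.yml` (the latter is not version controlled).
- Edit `environment-dev.yml` to add, remove, or change packages.
- Update the environment with `mamba env update --file environment-dev.yml --prune` (using mamba to solve the environment is much faster).
- Test the code.
- Export the updated environment with `mamba env export --name mmm --file environment.yml`.
- Open a PR.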
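To spot drift between the pinned `environment.yml` and what is actually installed, a rough check could parse the exported file and compare its pins with installed Python distributions. This is a sketch under stated assumptions: it expects the `conda env export` layout (`- name=version=build` entries under `dependencies:`), skips nested `pip:` entries, and only sees packages whose conda name matches the installed distribution name. `conda_pins` and `drift` are hypothetical helpers.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def conda_pins(env_yaml: str) -> dict:
    """Return {name: version} for conda entries in an exported environment.yml.

    Assumes the `conda env export` layout, where conda entries under
    `dependencies:` look like `  - name=version=build`; `pip:` entries are skipped.
    """
    pins = {}
    in_deps = False
    for line in env_yaml.splitlines():
        stripped = line.strip()
        if stripped and not line[0].isspace():  # a top-level key such as "channels:"
            in_deps = stripped == "dependencies:"
            continue
        if in_deps and stripped.startswith("- ") and "=" in stripped and "==" not in stripped:
            name, pinned = stripped[2:].split("=")[:2]
            pins[name] = pinned
    return pins


def drift(pins: dict, lookup=version) -> dict:
    """Return {name: (pinned, installed)} for distributions that differ from their pin.

    Conda-only packages (python itself, C libraries) and packages whose conda name
    differs from the distribution name raise PackageNotFoundError and are skipped.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            continue
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    env_file = Path("environment.yml")
    if env_file.exists():
        print(drift(conda_pins(env_file.read_text())))
```

Run from the repo root inside the container, it prints any package whose installed version differs from its pin; an empty dict means the two agree, at least for the packages `importlib.metadata` can see.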