A template repository for data analysis projects and simulation studies with Liesel and RLiesel. It contains the following structure for you to build on:
- 📁 src: An implementation of the bivariate normal distribution based on TensorFlow Probability. The distribution is parameterized for RLiesel, which can be used to configure semi-parametric regression predictors for the marginal means, standard deviations and the correlation parameter. Replace the files in this directory with your own Python code.
- 📁 tests: Unit tests for the bivariate normal distribution. Add your test code here, and it will be run automatically by pytest.
- 📁 examples: Some examples using the bivariate normal distribution in combination with Quarto and GNU Parallel. Replace the files in this directory with your own data analysis scripts or simulation studies. You can also create other directories outside of src for this purpose.
  - 📁 01-quarto: A semi-parametric distributional regression model with a bivariate normal response variable. This example illustrates how to integrate Python and R code using Liesel and RLiesel with Quarto and Reticulate.
  - 📁 02-gnu-parallel: A small simulation study, which is implemented using GNU Parallel.
- 📄 environment.yml: The specification of the Conda environment for our software to run in. Change the name of the project, and its Python and R dependencies here.
- 📄 pyproject.toml: The configuration of our Python package and some Python development tools. Change the name of the Python package, its description and your name as an author here.
- 🧰 Some more configuration files that should be mostly self-explanatory.
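The parameterization mentioned for src (marginal means, marginal standard deviations and a correlation parameter) corresponds to the log-density sketched below. This is a minimal pure-Python illustration, not the template's TensorFlow Probability implementation; the function name bivariate_normal_log_prob and its argument names are assumptions made for this example.

```python
import math


def bivariate_normal_log_prob(y1, y2, loc1, loc2, scale1, scale2, cor):
    """Log-density of a bivariate normal distribution at (y1, y2).

    Hypothetical helper: parameterized by the marginal means (loc1, loc2),
    the marginal standard deviations (scale1, scale2) and the correlation (cor).
    """
    if scale1 <= 0 or scale2 <= 0:
        raise ValueError("standard deviations must be positive")
    if not -1 < cor < 1:
        raise ValueError("correlation must lie strictly between -1 and 1")

    # Standardize both components.
    z1 = (y1 - loc1) / scale1
    z2 = (y2 - loc2) / scale2

    one_minus_cor2 = 1.0 - cor**2
    # Quadratic form of the standardized bivariate normal.
    quad = (z1**2 - 2.0 * cor * z1 * z2 + z2**2) / one_minus_cor2
    # Log of the normalizing constant 2 * pi * scale1 * scale2 * sqrt(1 - cor^2).
    log_norm = (
        math.log(2.0 * math.pi)
        + math.log(scale1)
        + math.log(scale2)
        + 0.5 * math.log(one_minus_cor2)
    )
    return -0.5 * quad - log_norm
```

With a correlation of zero, this reduces to the sum of two univariate normal log-densities, which is a convenient property to check in a unit test.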
We recommend using Conda (or even better, Micromamba) for Python and R dependency management. JAX, and hence Liesel, does not work natively on Windows, so follow these steps on Linux, macOS or the Windows Subsystem for Linux (WSL) to run our template code:
- Assuming you have Conda installed, create a Conda environment with the dependencies and the local Python package installed:
# On Linux, WSL and Intel Macs:
conda env create -f environment.yml -p ./env
# On Apple Silicon Macs (due to some bugs):
conda env create -f environment-osx-arm64.yml -p ./env
conda install -c conda-forge/osx-64 -p ./env pandoc=3.1.1
conda install -c conda-forge -p ./env quarto=1.3.450
- Activate the Conda environment and install RLiesel:
conda activate ./env
Rscript -e "remotes::install_github('liesel-devs/rliesel')"
- If you were able to follow the previous steps, you should be set to run our first example:
quarto render examples/01-quarto/example.qmd
- If Quarto or Reticulate does not use the correct Conda environment automatically, try setting the RETICULATE_PYTHON_ENV variable:
RETICULATE_PYTHON_ENV="$PWD/env" quarto render examples/01-quarto/example.qmd
To develop your own project based on this repository, start as follows:
- Replace the strings liesel-template and liesel_template with the name of your project in environment.yml, liesel-template.Rproj, pyproject.toml, src/liesel_template and tests.
- Remove the Conda environment env and repeat the steps from the previous section.
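The first step above can be scripted. The following is a minimal sketch, assuming a hyphenated, lowercase project name such as my-project (a hypothetical name); the function rename_in_files is an illustration and not part of the template. It only rewrites file contents, so the directory src/liesel_template and the file liesel-template.Rproj still need to be renamed by hand.

```python
from pathlib import Path


def rename_in_files(paths, new_name):
    """Replace the template's name in the given text files.

    Hypothetical helper: `new_name` is the hyphenated project name,
    e.g. "my-project". The underscore variant used for the Python
    package is derived from it. Returns the files that were changed.
    """
    replacements = {
        "liesel-template": new_name,
        "liesel_template": new_name.replace("-", "_"),
    }
    changed = []
    for path in map(Path, paths):
        old_text = path.read_text(encoding="utf-8")
        new_text = old_text
        for old, new in replacements.items():
            new_text = new_text.replace(old, new)
        # Only touch files that actually contain one of the strings.
        if new_text != old_text:
            path.write_text(new_text, encoding="utf-8")
            changed.append(path)
    return changed
```

A call such as rename_in_files(["environment.yml", "pyproject.toml"], "my-project") would then update both files in place. It is worth reviewing the result with git diff before committing.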
These commands might come in handy as you continue to develop your project:
- pdoc ./src: Serves the docs with pdoc.
- pre-commit run -a: Runs the pre-commit hooks.
- pytest: Runs pytest.
This repository uses Conda for Python and R dependency management. Conda installs the environment for our software to run in (think: a fancy virtual environment for Python and R). If you need a different version of Python, R, Quarto or GNU Parallel, or any additional Python or R packages, edit environment.yml.
Many simulation studies in statistics are embarrassingly parallel. It is straightforward to accelerate them by distributing them to a number of cores or computers. In our opinion, GNU Parallel is a great tool for parallelizing simulation studies using Liesel and RLiesel for the following reasons:
- It is a shell tool, so it can run both Python and R code, as well as Quarto documents integrating both programming languages.
- By opening and closing Python for each job, it works around potential memory leaks in JAX.
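To make these two points concrete, here is a minimal sketch of a job script that GNU Parallel could start once per seed. The file name job.py, the --seed and --n options and the toy Monte Carlo estimate are assumptions for illustration; they are not taken from the template's 02-gnu-parallel example.

```python
import argparse
import json
import random


def run_replication(seed, n=1000):
    """Run one replication: estimate the mean of a standard normal sample."""
    # A separate generator per job keeps each replication reproducible.
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return {"seed": seed, "n": n, "mean": sum(draws) / n}


def main(argv=None):
    parser = argparse.ArgumentParser(description="Run one simulation replication.")
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--n", type=int, default=1000)
    args = parser.parse_args(argv)
    # Print one JSON line per job so the outputs of all jobs can be concatenated.
    print(json.dumps(run_replication(args.seed, args.n)))


if __name__ == "__main__":
    main()
```

Assuming the script is saved as job.py, a call such as parallel python job.py --seed {} ::: 1 2 3 4 would start a fresh Python process for each seed, so any memory held by one job is released when that job exits.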
TODO: Add a few words about parameterized Quarto documents here.