Demonstrating automated quality assurance (QA) reporting using Quarto.
This project is a linear regression pipeline built to demonstrate how Quarto can be used in Python to automate the production of QA reports, minimising unnecessary hassle when it comes to storing and checking QA outputs. This makes it easier for analysis to be peer reviewed, and guarantees a clearer audit trail, helping to bring new code in line with the ONS' RAP Minimum Viable Product criteria.
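As a rough illustration of driving Quarto from Python, the sketch below builds a `quarto render` command and runs it through `subprocess`. The report file name `qa_report.qmd`, the parameter name `learning_rate`, and both helper functions are hypothetical and are not taken from this repository. The `-P key:value` option is how the Quarto CLI passes parameters to a document.

```python
import shutil
import subprocess
from pathlib import Path


def build_render_command(report_path, params=None, output_format="html"):
    """Build a `quarto render` command as a list of arguments."""
    command = ["quarto", "render", str(report_path), "--to", output_format]
    # Each -P key:value pair is handed to the document as a parameter.
    for key, value in (params or {}).items():
        command += ["-P", f"{key}:{value}"]
    return command


def render_report(report_path, params=None):
    """Render the report, failing early if the Quarto CLI is not on the PATH."""
    if shutil.which("quarto") is None:
        raise RuntimeError("Quarto CLI not found; install it first.")
    subprocess.run(build_render_command(report_path, params), check=True)


# Hypothetical report name and parameter, used only for illustration.
command = build_render_command(Path("qa_report.qmd"), {"learning_rate": 0.01})
print(" ".join(command))
```

Keeping command construction separate from execution means the arguments can be checked in a unit test without Quarto being installed.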
The report outputs the results of a handful of QA processes relating to the assumptions made by the model: a residual plot, a quantile-quantile (QQ) plot, a scatter plot of the data and the model itself, the p-value of the residuals, the Shapiro-Wilk test statistic, and the coefficient of correlation.
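The numerical QA outputs listed above could be computed along the following lines. This is a minimal standard-library sketch, not the project's own code: the function names and the example data are assumptions, and the line is fitted by closed-form least squares purely to obtain residuals.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares fit; returns (intercept, gradient)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    gradient = sxy / sxx
    return mean_y - gradient * mean_x, gradient


def compute_residuals(x, y, intercept, gradient):
    """Observed value minus fitted value for each point."""
    return [yi - (intercept + gradient * xi) for xi, yi in zip(x, y)]


def pearson_r(x, y):
    """Pearson coefficient of correlation between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)


# Illustrative data only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, gradient = fit_line(x, y)
res = compute_residuals(x, y, intercept, gradient)
r = pearson_r(x, y)
```

If SciPy is available, `scipy.stats.shapiro(res)` returns the Shapiro-Wilk statistic together with its p-value for the residuals.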
At data ingest there are some basic input validation checks performed both on the dataset (type checks of the arrays and the data they contain) and on the model hyperparameters (type checks and value checks where necessary).
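The kind of checks described above might look like the sketch below. It assumes plain Python lists rather than whatever array type the pipeline actually uses, and the function names and the `n_iterations` hyperparameter are hypothetical.

```python
from numbers import Real


def _is_number(value):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, Real) and not isinstance(value, bool)


def validate_dataset(x, y):
    """Type checks on the containers and on the values they hold."""
    for name, values in (("x", x), ("y", y)):
        if not isinstance(values, (list, tuple)):
            raise TypeError(f"{name} must be a list or tuple")
        if not all(_is_number(v) for v in values):
            raise TypeError(f"{name} must contain only real numbers")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) == 0:
        raise ValueError("dataset must not be empty")


def validate_hyperparameters(learning_rate, n_iterations):
    """Type checks, plus value checks where a range is required."""
    if not _is_number(learning_rate):
        raise TypeError("learning_rate must be a real number")
    if learning_rate <= 0:
        raise ValueError("learning_rate must be positive")
    if isinstance(n_iterations, bool) or not isinstance(n_iterations, int):
        raise TypeError("n_iterations must be an integer")
    if n_iterations < 1:
        raise ValueError("n_iterations must be at least 1")


# Valid inputs pass silently; invalid ones raise before any fitting starts.
validate_dataset([1.0, 2.0], [3.0, 4.0])
validate_hyperparameters(0.01, 1000)
```

Raising at ingest keeps bad inputs from surfacing later as confusing numerical errors in the fitted model or the report.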
Install the Quarto command-line interface (CLI) from the Quarto website or using
pip install quarto
from the command line. If you use Visual Studio Code, you should additionally install the Quarto VS Code extension.
After cloning the repository to your machine, navigate to the project directory within the terminal and install/update all package dependencies using
pip install -r requirements.txt
Once the repository is cloned and the dependencies are installed, the pipeline should run unmodified. Navigate to the main project directory in your terminal and run the main script with
python main.py
If desired, you may alter parameters from within the config file, where you have control over the model's initial values for the intercept and gradient, labelled
The learning rate determines the size of the steps taken by the model when updating
The config also contains a setting for selecting a dataset from the /data/ folder, which is not yet used. For now, the pipeline uses a linear dataset generated in the main script by scikit-learn. This will be updated soon.
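To illustrate how the initial intercept, initial gradient, and learning rate interact, here is a minimal gradient-descent sketch for a straight-line fit. It assumes the model minimises mean squared error, which this README does not confirm. The function name, the `n_iterations` parameter, and the example data are hypothetical.

```python
def gradient_descent(x, y, intercept=0.0, gradient=0.0,
                     learning_rate=0.05, n_iterations=2000):
    """Fit y = intercept + gradient * x by batch gradient descent on MSE."""
    n = len(x)
    for _ in range(n_iterations):
        # Prediction error for each point at the current parameter values.
        errors = [(intercept + gradient * xi) - yi for xi, yi in zip(x, y)]
        # Partial derivatives of the mean squared error.
        d_intercept = (2 / n) * sum(errors)
        d_gradient = (2 / n) * sum(e * xi for e, xi in zip(errors, x))
        # The learning rate scales how far each parameter moves per step.
        intercept -= learning_rate * d_intercept
        gradient -= learning_rate * d_gradient
    return intercept, gradient


# Illustrative data lying exactly on y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
fitted_intercept, fitted_gradient = gradient_descent(x, y)
```

A learning rate that is too large makes the updates overshoot and diverge, while one that is too small needs many more iterations to reach the same fit.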
Unit tests may be run by navigating to the project's test directory in the terminal and executing the command
pytest