A runnable app that demonstrates how to build a data warehouse with mara. Combines the mara-pipelines and mara-schema libraries with the mara-app framework into a project.
The example ETL integrates publicly available e-commerce and marketing data into a more general data model to highlight the capabilities of the Mara framework.
The repository is intended to serve as a template for new projects.
Running the example (and mara in general) requires Python >=3.6, PostgreSQL >=10, and a few smaller packages.
Mac:
$ brew install -v python3
$ brew install -v dialog
$ brew install -v coreutils
$ brew install -v graphviz
Ubuntu 16.04:
$ sudo apt install git dialog coreutils graphviz python3 python3-dev python3-venv
Mara does not run on Windows.
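The prerequisites above can be checked from a script before installing anything else. The following is a minimal sketch and not part of the project: the function name missing_prerequisites is hypothetical, and the executable names (dot for graphviz, psql for the PostgreSQL client) are assumptions about what the packages put on the PATH. It does not check the PostgreSQL version.

```python
import shutil
import sys

# Executables assumed to be provided by the packages listed above
# (assumption: graphviz installs `dot`, PostgreSQL installs `psql`).
REQUIRED_TOOLS = ("git", "make", "dialog", "dot", "psql")


def missing_prerequisites(tools=REQUIRED_TOOLS, min_python=(3, 6)):
    """Return a list describing every prerequisite that was not found."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python >= %d.%d" % min_python)
    for tool in tools:
        # shutil.which returns None when the executable is not on the PATH
        if shutil.which(tool) is None:
            problems.append(tool)
    return problems


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("missing: " + ", ".join(problems))
    else:
        print("all prerequisites found")
```

An empty list means every checked tool was found and the interpreter is recent enough.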
On Mac, install PostgreSQL with brew install -v postgresql. On Ubuntu, follow these instructions.
Also, install the cstore_fdw extension with brew install cstore_fdw and the postgresql-hll extension from source.
To optimize PostgreSQL for ETL workloads, update your postgresql.conf along the lines of this example.
Start a database client with sudo -u postgres psql postgres and then create a user with CREATE ROLE root SUPERUSER LOGIN; (you can use any other name).
Clone the repository somewhere and run make in the root directory of the project. This will:
- create a virtual environment in .venv,
- install all packages from requirements.txt.freeze (if you want to create a new requirements.txt.freeze from requirements.txt, then run make update-packages),
- copy the file app/local_setup.py.example to app/local_setup.py, which you can adapt to your machine,
- create the necessary databases and a number of tables that are needed for running mara,
- store the Olist e-commerce and marketing data in the olist_ecommerce PostgreSQL database, locally.
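To make the first three steps concrete, here is a rough Python sketch of what they amount to. It is an illustration only, not the project's Makefile: the function names are hypothetical, and skipping the copy when app/local_setup.py already exists is an assumption made here so that local adaptations are not overwritten.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def create_virtualenv(project_dir):
    """Create .venv and install the frozen requirements into it."""
    project_dir = Path(project_dir)
    venv_dir = project_dir / ".venv"
    # `python -m venv` is the standard-library way to create a virtual environment
    subprocess.run([sys.executable, "-m", "venv", str(venv_dir)], check=True)
    subprocess.run(
        [str(venv_dir / "bin" / "pip"), "install",
         "-r", str(project_dir / "requirements.txt.freeze")],
        check=True,
    )


def copy_local_setup(project_dir):
    """Copy app/local_setup.py.example to app/local_setup.py.

    Returns True when a copy was made and False when an existing
    local_setup.py was left untouched (assumed behavior, see above).
    """
    app_dir = Path(project_dir) / "app"
    target = app_dir / "local_setup.py"
    if target.exists():
        return False
    shutil.copyfile(app_dir / "local_setup.py.example", target)
    return True
```

Creating the databases and loading the Olist data are left out, since they depend on the project's own scripts.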
You can now activate the virtual environment with
$ source .venv/bin/activate
To list all available Flask CLI commands, run flask without parameters.
To start the web UI, run:
$ flask run --with-threads --reload --eager-loading
The app is now accessible at http://localhost:5000.
For development, it is recommended to run the ETL from the web UI (see above).
In production, use flask mara_pipelines.ui.run to run a pipeline or a set of its child nodes.
The command mara_pipelines.ui.run_interactively provides an ncurses-based menu for selecting and running pipelines.
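For scheduled production runs, the non-interactive command can be wrapped in a small script, for example one called from cron. The sketch below is one possible way to do that under stated assumptions: the function names are hypothetical, the flask executable is assumed to live in .venv/bin, FLASK_APP is assumed to be configured in the calling environment the same way as in the activated virtual environment, and the command is assumed to signal failure through a non-zero exit code. Any options are passed through unchanged, so check the command's --help output for what it accepts.

```python
import subprocess
from pathlib import Path


def build_run_command(project_dir, extra_args=()):
    """Build the argument list for a non-interactive pipeline run.

    extra_args is appended as-is; this sketch does not assume any
    particular options of the underlying command.
    """
    flask = Path(project_dir) / ".venv" / "bin" / "flask"
    return [str(flask), "mara_pipelines.ui.run", *extra_args]


def run_pipeline(project_dir, extra_args=()):
    """Run the command from the project root and return its exit code."""
    result = subprocess.run(
        build_run_command(project_dir, extra_args),
        cwd=str(project_dir),
    )
    return result.returncode
```

A scheduler can then alert on a non-zero return value of run_pipeline.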
Documentation is a work in progress, but the code base is quite small and documented.