Repository for the ETL workflow that processes Beyond All Reason (BAR) data.
It periodically produces public dumps of match data, combining information from the teiserver and replays databases. Check out the Gallery section to see how the community uses this data.
The data dumps are available as Parquet files under:
- https://data-marts.beyondallreason.dev/matches.parquet
- https://data-marts.beyondallreason.dev/match_players.parquet
- https://data-marts.beyondallreason.dev/players.parquet
and as compressed CSV files under:
- https://data-marts.beyondallreason.dev/matches.csv.gz
- https://data-marts.beyondallreason.dev/match_players.csv.gz
- https://data-marts.beyondallreason.dev/players.csv.gz
More documentation is available at https://beyond-all-reason.github.io/data-processing/.
It's easy to load the data into a Jupyter Notebook or Google Colab, for example to plot the number of matches over time using Polars.
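A minimal sketch of such a notebook cell, assuming the matches dump has a `start_time` timestamp column (check the documentation for the actual schema):

```python
import matplotlib.pyplot as plt
import polars as pl

# Polars can read Parquet directly from an HTTPS URL.
matches = pl.read_parquet("https://data-marts.beyondallreason.dev/matches.parquet")

# Bucket matches into calendar months; `start_time` is an assumed column name.
monthly = (
    matches
    .sort("start_time")
    .group_by_dynamic("start_time", every="1mo")
    .agg(pl.len().alias("matches"))
)

plt.plot(monthly["start_time"].to_numpy(), monthly["matches"].to_numpy())
plt.title("Matches per month")
plt.ylabel("matches")
plt.show()
```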
Since the datasets are available under stable URLs, you can even use one of the Web UIs built on DuckDB-Wasm to run queries entirely in the browser, for example to compute the number of games per type per month.
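A sketch of such a query, run here through the DuckDB Python package; the embedded SQL can equally be pasted into a DuckDB-Wasm shell such as https://shell.duckdb.org/. The `game_type` and `start_time` column names are assumptions, so adjust them to the actual schema:

```python
import duckdb

# Recent DuckDB versions auto-load the httpfs extension for remote files.
# `game_type` and `start_time` are assumed column names; adjust to the schema.
duckdb.sql("""
    SELECT
        game_type,
        date_trunc('month', start_time) AS month,
        count(*) AS games
    FROM 'https://data-marts.beyondallreason.dev/matches.parquet'
    GROUP BY ALL
    ORDER BY month, game_type
""").show()
```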
Below we link some cool examples of how people in the community are using the data dumps. If you've created something, please share it with us on Discord or here in the issues!
- @Atlasfailed shared reports from his personal_skill_analysis project:
- @Dazazzell created a live dashboard https://dazazzell.github.io/barmaps/ (source) that shows map popularity stats for the last 60 days.
This project uses dbt to manage the SQL pipeline that transforms the data, and DuckDB as the query engine.
Setup:

```sh
python3 -m venv .venv
source .venv/bin/activate  # or use https://direnv.net/, which loads .envrc automatically
pip install -r requirements.txt
```
It's also recommended to install pre-commit hooks that will check the style of the SQL code before each commit:

```sh
pre-commit install
```
`data_source/dev` contains a small sample of the full data sources used to generate the full dumps in prod; basic development and testing should be possible purely on this sample.
To build the data marts from this sample data:

```sh
dbt run
```
To run tests on the generated data (e.g. validate that fields are not null, or that custom queries return expected results):

```sh
dbt test
```