Repository for the ETL workflow that processes Beyond All Reason (BAR) data.
It periodically produces public dumps of match data, combining information from the teiserver and replays databases. Check out the Gallery section to see how the community uses this data.
The data dumps are available as Parquet files under:
- https://data-marts.beyondallreason.dev/matches.parquet
- https://data-marts.beyondallreason.dev/match_players.parquet
- https://data-marts.beyondallreason.dev/players.parquet
and as compressed CSV files under:
- https://data-marts.beyondallreason.dev/matches.csv.gz
- https://data-marts.beyondallreason.dev/match_players.csv.gz
- https://data-marts.beyondallreason.dev/players.csv.gz
More documentation is available at https://beyond-all-reason.github.io/data-processing/.
It's easy to load the data into a Jupyter Notebook or Google Colab, for example to plot the number of matches over time using Polars.
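A minimal sketch of such a notebook cell, assuming the matches dump has a `start_time` timestamp column (check the documentation for the actual schema):

```python
import matplotlib.pyplot as plt
import polars as pl

# Polars can read Parquet directly from an HTTPS URL.
matches = pl.read_parquet("https://data-marts.beyondallreason.dev/matches.parquet")

# Bucket matches into calendar months; `start_time` is an assumed column name.
monthly = (
    matches
    .sort("start_time")
    .group_by_dynamic("start_time", every="1mo")
    .agg(pl.len().alias("matches"))
)

plt.plot(monthly["start_time"].to_numpy(), monthly["matches"].to_numpy())
plt.title("Matches per month")
plt.ylabel("matches")
plt.show()
```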
Since the datasets are available under stable URLs, you can even use one of the Web UIs built on DuckDB-Wasm to run queries entirely in the browser, for example to compute the number of games per type per month.
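A sketch of such a query, run here through the DuckDB Python package; the embedded SQL can equally be pasted into a DuckDB-Wasm shell such as https://shell.duckdb.org/. The `game_type` and `start_time` column names are assumptions, so adjust them to the actual schema:

```python
import duckdb

# Recent DuckDB versions auto-load the httpfs extension for remote files.
# `game_type` and `start_time` are assumed column names; adjust to the schema.
duckdb.sql("""
    SELECT
        game_type,
        date_trunc('month', start_time) AS month,
        count(*) AS games
    FROM 'https://data-marts.beyondallreason.dev/matches.parquet'
    GROUP BY ALL
    ORDER BY month, game_type
""").show()
```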
Below we link some cool examples of how people in the community are using the data dumps. If you've created something, please share it with us on Discord or here in the issues!
- @Atlasfailed shared reports from his personal_skill_analysis project:
- @Dazazzell created a live dashboard https://dazazzell.github.io/barmaps/ (source) that shows map popularity stats for the last 60 days.
This project uses dbt to manage the SQL pipeline that transforms the data, and DuckDB as the query engine.
Setup:

```sh
python3 -m venv .venv
source .venv/bin/activate  # or use https://direnv.net/, which loads .envrc automatically
pip install -r requirements.txt
```
It's also recommended to install pre-commit hooks that will check the style of the SQL code before each commit:

```sh
pre-commit install
```
`data_source/dev` contains a small sample of the full data sources used to generate the full dumps in prod; basic development and testing should be possible purely on this sample.
To build the data marts from this sample data:

```sh
dbt run
```
To run tests on the generated data (e.g. validate that fields are not null, or that custom queries return expected results):

```sh
dbt test
```