TabularDelta helps to automate and simplify the often tedious and manual process of comparing relational data.
The so-called TabularDelta protocol defines a representation of the differences between two tables. "Comparators" generate such a representation from two table objects. Because comparators are exchangeable, different table input formats such as SQL tables or pandas DataFrames are supported. "Formatters" present the differences in different output formats, depending on the desired use case. This flexibility in the output format makes it possible to spot small deviations in largely similar tables or to get an overview of more structural changes.
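The separation between comparators and formatters can be illustrated with a toy sketch. The classes and functions below (`ToyDelta`, `DictComparator`, `SummaryFormatter`, `report`) are hypothetical and are not part of TabularDelta's API. They only show how a shared difference representation lets any comparator be combined with any formatter.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class ToyDelta:
    """Hypothetical, simplified stand-in for a table-difference representation."""
    added: list[tuple] = field(default_factory=list)
    removed: list[tuple] = field(default_factory=list)
    changed: list[tuple] = field(default_factory=list)  # (key, old_row, new_row)


class Comparator(Protocol):
    def compare(self, old, new) -> ToyDelta: ...


class Formatter(Protocol):
    def format(self, delta: ToyDelta) -> str: ...


class DictComparator:
    """Hypothetical comparator for tables given as {key: row} dictionaries."""

    def compare(self, old: dict, new: dict) -> ToyDelta:
        delta = ToyDelta()
        for key in sorted(new.keys() - old.keys()):
            delta.added.append((key, new[key]))
        for key in sorted(old.keys() - new.keys()):
            delta.removed.append((key, old[key]))
        for key in sorted(old.keys() & new.keys()):
            if old[key] != new[key]:
                delta.changed.append((key, old[key], new[key]))
        return delta


class SummaryFormatter:
    """Hypothetical formatter that renders only the row counts."""

    def format(self, delta: ToyDelta) -> str:
        return (
            f"added={len(delta.added)} "
            f"removed={len(delta.removed)} "
            f"changed={len(delta.changed)}"
        )


def report(comparator: Comparator, formatter: Formatter, old, new) -> str:
    # Comparator and formatter only share the delta representation,
    # so either side can be swapped independently.
    return formatter.format(comparator.compare(old, new))


old = {1: ("apple", 3), 2: ("pear", 5)}
new = {1: ("apple", 4), 3: ("plum", 2)}
print(report(DictComparator(), SummaryFormatter(), old, new))
# prints: added=1 removed=1 changed=1
```

Supporting a new input format would mean writing another class with a `compare` method; supporting a new output would mean another class with a `format` method. Neither change touches the other side.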
This snippet reports the differences between two CSV files. You can execute it directly in test_docs_examples.py.
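```python
import pandas as pd
from tabulardelta import PandasComparator, DetailedTextFormatter

# Load both versions of the table, using the first two columns as the index
df_old = pd.read_csv("week24.csv", index_col=[0, 1])
df_new = pd.read_csv("week25.csv", index_col=[0, 1])

# Compute the differences, then render them as detailed text
delta = PandasComparator().compare(df_old, df_new)
print(DetailedTextFormatter().format(delta))
```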
To compare two tables, first select a comparator that supports their table format. Then select the formatter that best suits your use case to obtain a visualization of the result.
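To build some intuition for what a row-level comparison reports, here is a rough plain-pandas approximation on hypothetical in-memory data. It is not how TabularDelta works internally: the helper name `rough_diff` and the data are made up for illustration, it assumes a unique index in both tables, and it ignores schema differences such as added or removed columns.

```python
import pandas as pd


def rough_diff(old: pd.DataFrame, new: pd.DataFrame) -> dict:
    """Row-level diff keyed on the index (hypothetical helper).

    Assumes a unique index in both frames and compares only the columns
    of `old`; added or removed columns are not reported.
    """
    added = new.index.difference(old.index).tolist()
    removed = old.index.difference(new.index).tolist()
    common = old.index.intersection(new.index)
    # Align both frames on the shared keys so values can be compared pairwise.
    old_common = old.reindex(index=common)
    new_common = new.reindex(index=common, columns=old.columns)
    changed = []
    for col in old.columns:
        pairs = zip(old_common.index, old_common[col], new_common[col])
        for key, before, after in pairs:
            # Treat two missing values as equal, unlike plain `!=`.
            if before != after and not (pd.isna(before) and pd.isna(after)):
                changed.append((key, col, before, after))
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical toy data with a two-level index, like index_col=[0, 1].
keys = ["region", "store"]
old = pd.DataFrame(
    {"sales": [10, 20, 30]},
    index=pd.MultiIndex.from_tuples([("north", 1), ("north", 2), ("south", 1)], names=keys),
)
new = pd.DataFrame(
    {"sales": [10, 25, 40]},
    index=pd.MultiIndex.from_tuples([("north", 1), ("north", 2), ("west", 1)], names=keys),
)
print(rough_diff(old, new))
```

Keying the comparison on the index is only one design choice; it works when both tables share a unique key, which is also why the snippet reads the CSV files with a two-level index.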
To find more examples and get started, please visit the documentation.
This project is managed by pixi. You can install the package in development mode using:
```bash
git clone https://github.com/quantco/tabulardelta
cd tabulardelta
pixi run pre-commit-install
pixi run postinstall
```
To run the tests:

- Make sure `docker` is installed
- Make sure `ODBC Driver 17 for SQL Server` is installed
  - See "Download ODBC Driver for SQL Server"
  - This may require setting the `ODBCSYSINI` environment variable to the path of `msodbcsql17`
- Run `pixi run test`
Setting up the MSSQL Docker container may take a while, but it is cached for future runs as long as you keep it running.
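The prerequisites for running the tests can be checked with a small script before starting a long test run. This is a hypothetical helper, not part of the repository. It assumes `pyodbc` is importable to list the installed ODBC drivers and reports `False` for the driver when it is not; the `ODBCSYSINI` entry is informational only, since that variable is needed only on some setups.

```python
import os
import shutil


def check_test_prerequisites() -> dict[str, bool]:
    """Hypothetical helper: best-effort check of the test prerequisites."""
    checks = {
        # `docker` must be on PATH to start the MSSQL container.
        "docker": shutil.which("docker") is not None,
        # Informational: only some ODBC setups need this variable.
        "ODBCSYSINI set": "ODBCSYSINI" in os.environ,
    }
    try:
        import pyodbc  # assumption: pyodbc is installed in the environment

        checks["ODBC Driver 17 for SQL Server"] = (
            "ODBC Driver 17 for SQL Server" in pyodbc.drivers()
        )
    except ImportError:
        # pyodbc or the underlying ODBC driver manager is not available.
        checks["ODBC Driver 17 for SQL Server"] = False
    return checks


if __name__ == "__main__":
    for name, ok in check_test_prerequisites().items():
        print(f"{name}: {ok}")
```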