TabularDelta

TabularDelta helps to automate and simplify the often tedious and manual process of comparing relational data.

The TabularDelta protocol defines a representation of the differences between two tables. "Comparators" generate such a representation from two table objects. Because comparators are interchangeable, tables can come in different input formats, such as SQL tables or Pandas DataFrames. "Formatters" present the differences in different output formats, depending on the use case. This flexibility makes it possible to pinpoint small deviations in largely similar tables or to get an overview of more structural changes.
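
To make the split between comparators and formatters more concrete, here is a minimal, standard-library-only sketch of the idea. It is not TabularDelta's actual protocol or implementation: `SimpleDelta`, `DictComparator`, `SummaryFormatter`, and `report` are hypothetical names invented for this illustration, and the tables are assumed to be plain `{key: row}` dictionaries.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Protocol, Tuple


@dataclass
class SimpleDelta:
    """Hypothetical, simplified delta; NOT the real TabularDelta protocol."""

    added: Dict[Any, dict] = field(default_factory=dict)
    removed: Dict[Any, dict] = field(default_factory=dict)
    # key -> {column: (old_value, new_value)}
    changed: Dict[Any, Dict[str, Tuple[Any, Any]]] = field(default_factory=dict)


class Comparator(Protocol):
    """Anything that turns two tables into a delta."""

    def compare(self, old: Any, new: Any) -> SimpleDelta: ...


class Formatter(Protocol):
    """Anything that turns a delta into text."""

    def format(self, delta: SimpleDelta) -> str: ...


class DictComparator:
    """Hypothetical comparator for tables given as {key: row} dictionaries."""

    def compare(self, old: Dict[Any, dict], new: Dict[Any, dict]) -> SimpleDelta:
        delta = SimpleDelta()
        for key in new.keys() - old.keys():
            delta.added[key] = new[key]
        for key in old.keys() - new.keys():
            delta.removed[key] = old[key]
        for key in old.keys() & new.keys():
            columns = old[key].keys() | new[key].keys()
            diffs = {
                col: (old[key].get(col), new[key].get(col))
                for col in columns
                if old[key].get(col) != new[key].get(col)
            }
            if diffs:
                delta.changed[key] = diffs
        return delta


class SummaryFormatter:
    """Hypothetical formatter that only reports counts."""

    def format(self, delta: SimpleDelta) -> str:
        return (
            f"{len(delta.added)} added, "
            f"{len(delta.removed)} removed, "
            f"{len(delta.changed)} changed"
        )


def report(comparator: Comparator, formatter: Formatter, old: Any, new: Any) -> str:
    # The comparator and the formatter only meet through the delta object,
    # so either one can be swapped without touching the other.
    return formatter.format(comparator.compare(old, new))


if __name__ == "__main__":
    old = {1: {"price": 10}, 2: {"price": 20}}
    new = {2: {"price": 25}, 3: {"price": 30}}
    print(report(DictComparator(), SummaryFormatter(), old, new))
```

Because both sides depend only on the intermediate delta, supporting a new input format means writing one comparator, and supporting a new report style means writing one formatter.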

Usage example

This snippet reports the differences between two CSV files. You can execute it directly in test_docs_examples.py.

import pandas as pd
from tabulardelta import PandasComparator, DetailedTextFormatter

df_old = pd.read_csv("week24.csv", index_col=[0, 1])
df_new = pd.read_csv("week25.csv", index_col=[0, 1])

delta = PandasComparator().compare(df_old, df_new)
print(DetailedTextFormatter().format(delta))

To compare two tables, first select a comparator that supports the table format. Then select the formatter that best suits your use case to obtain a visualization of the result.

To find more examples and get started, please visit the documentation.

Development

This project is managed by pixi. You can install the package in development mode using:

git clone https://github.com/quantco/tabulardelta
cd tabulardelta

pixi run pre-commit-install
pixi run postinstall

Testing

  • Make sure Docker is installed
  • Make sure ODBC Driver 17 for SQL Server is installed
  • Run pixi run test

Setting up the SQL Server Docker container may take a while, but later runs reuse it as long as you keep the container running.