Merge pull request #114 from e10v/dev
Switch to PyArrow for internal data and remove Pandas dependency
e10v authored Jan 5, 2025
2 parents 3aa878e + 64b43d6 commit 7450313
Showing 30 changed files with 1,158 additions and 954 deletions.
7 changes: 3 additions & 4 deletions README.md
@@ -9,11 +9,11 @@

**tea-tasting** is a Python package for the statistical analysis of A/B tests featuring:

- Student's t-test, Z-test, Bootstrap, and quantile metrics out of the box.
- Student's t-test, Z-test, bootstrap, and quantile metrics out of the box.
- Extensible API: define and use statistical tests of your choice.
- [Delta method](https://alexdeng.github.io/public/files/kdd2018-dm.pdf) for ratio metrics.
- Variance reduction with [CUPED](https://exp-platform.com/Documents/2013-02-CUPED-ImprovingSensitivityOfControlledExperiments.pdf)/[CUPAC](https://doordash.engineering/2020/06/08/improving-experimental-power-through-control-using-predictions-as-covariate-cupac/) (also in combination with the delta method for ratio metrics).
- Confidence intervals for both absolute and percentage change.
- Variance reduction using [CUPED](https://exp-platform.com/Documents/2013-02-CUPED-ImprovingSensitivityOfControlledExperiments.pdf)/[CUPAC](https://doordash.engineering/2020/06/08/improving-experimental-power-through-control-using-predictions-as-covariate-cupac/) (which can also be combined with the delta method for ratio metrics).
- Confidence intervals for both absolute and percentage changes.
- Sample ratio mismatch check.
- Power analysis.
- Multiple hypothesis testing (family-wise error rate and false discovery rate).
@@ -56,7 +56,6 @@ Learn more in the detailed [user guide](https://tea-tasting.e10v.me/user-guide/)

## Roadmap

- Switch from Pandas DataFrames to PyArrow Tables for internal data. Make Pandas dependency optional.
- A/A tests and simulations.
- More statistical tests:
- Asymptotic and exact tests for frequency data.
2 changes: 0 additions & 2 deletions docs/api/config.md
@@ -1,3 +1 @@
::: tea_tasting.config
options:
members_order: source
2 changes: 0 additions & 2 deletions docs/api/multiplicity.md
@@ -1,3 +1 @@
::: tea_tasting.multiplicity
options:
members_order: source
42 changes: 21 additions & 21 deletions docs/custom-metrics.md
@@ -2,12 +2,12 @@

## Intro

**tea-tasting** supports Student's t-test, Z-test, and [some other statistical tests](api/metrics/index.md) out of the box. However, you might want to analyze an experiment using other statistical criteria. In this case you can define a custom metric with statistical test of your choice.
**tea-tasting** supports Student's t-test, Z-test, and [some other statistical tests](api/metrics/index.md) out of the box. However, you might want to analyze an experiment using other statistical criteria. In this case, you can define a custom metric with a statistical test of your choice.

In **tea-tasting**, there are two types of metrics:

- Metrics that require only aggregated statistics for analysis.
- Metrics that require granular data for analysis.
- Metrics that require only aggregated statistics for the analysis.
- Metrics that require granular data for the analysis.

This guide explains how to define a custom metric for each type.

@@ -17,7 +17,7 @@ First, let's import all the required modules and prepare the data:
from typing import Literal, NamedTuple

import numpy as np
import pandas as pd
import pyarrow as pa
import scipy.stats
import tea_tasting as tt
import tea_tasting.aggr
@@ -26,7 +26,7 @@ import tea_tasting.metrics
import tea_tasting.utils


data = tt.make_users_data(seed=42)
data = tt.make_users_data(seed=42, return_type="pandas")
data["has_order"] = data.orders.gt(0).astype(int)
print(data)
#> user variant sessions orders revenue has_order
@@ -63,7 +63,7 @@ class ProportionResult(NamedTuple):
statistic: float
```

The second step is defining the metric class itself. Metric based on aggregated statistics should be a subclass of [`MetricBaseAggregated`](api/metrics/base.md#tea_tasting.metrics.base.MetricBaseAggregated). `MetricBaseAggregated` is a generic class with the result class as a type variable.
The second step is defining the metric class itself. A metric based on aggregated statistics should be a subclass of [`MetricBaseAggregated`](api/metrics/base.md#tea_tasting.metrics.base.MetricBaseAggregated). `MetricBaseAggregated` is a generic class with the result class as a type variable.

The metric should have the following methods and properties defined:

@@ -119,15 +119,15 @@ class Proportion(tea_tasting.metrics.MetricBaseAggregated[ProportionResult]):
)
```

Method `__init__` save metric parameters to be used in analysis. You can use utility functions [`check_scalar`](api/utils.md#tea_tasting.utils.check_scalar) and [`auto_check`](api/utils.md#tea_tasting.utils.auto_check) to check parameter values.
Method `__init__` saves metric parameters to be used in the analysis. You can use utility functions [`check_scalar`](api/utils.md#tea_tasting.utils.check_scalar) and [`auto_check`](api/utils.md#tea_tasting.utils.auto_check) to check parameter values.

Property `aggr_cols` returns an instance of [`AggrCols`](api/metrics/base.md#tea_tasting.metrics.base.AggrCols). Analysis of a proportion requires the number of rows (`has_count=True`) and the average value of the column of interest (`mean_cols=(self.column,)`) for each variant.
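
For illustration, this property in the `Proportion` class above might look like the following sketch (assuming `AggrCols` is accessible as `tea_tasting.metrics.AggrCols`):

```python
@property
def aggr_cols(self) -> tea_tasting.metrics.AggrCols:
    # Request the row count and the mean of the analyzed column per variant.
    return tea_tasting.metrics.AggrCols(
        has_count=True,
        mean_cols=(self.column,),
    )
```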

Method `analyze_aggregates` accepts two parameters: `control` and `treatment` data as instances of class [`Aggregates`](api/aggr.md#tea_tasting.aggr.Aggregates). They contain values for statistics and columns specified in `aggr_cols`.

Method `analyze_aggregates` returns an instance of `ProportionResult`, defined earlier, with analysis result.
Method `analyze_aggregates` returns an instance of `ProportionResult`, defined earlier, with the analysis result.
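
As a concrete illustration of what can be computed from aggregates alone, here is a minimal sketch of a pooled two-sample proportion Z-test (illustrative, not necessarily the exact statistic used by the class above):

```python
import numpy as np
import scipy.stats


def proportion_ztest(
    contr_count: int, contr_mean: float,
    treat_count: int, treat_mean: float,
) -> tuple[float, float]:
    # Pooled proportion under the null hypothesis of equal rates.
    pooled = (
        (contr_count*contr_mean + treat_count*treat_mean)
        / (contr_count + treat_count)
    )
    se = np.sqrt(pooled * (1 - pooled) * (1/contr_count + 1/treat_count))
    statistic = (treat_mean - contr_mean) / se
    pvalue = 2 * scipy.stats.norm.sf(abs(statistic))  # two-sided
    return statistic, pvalue
```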

Now we can analyze the proportion of users who created at least one order during the experiment. For comparison, let's also add a metric that performs Z-test on the same column.
Now we can analyze the proportion of users who created at least one order during the experiment. For comparison, let's also add a metric that performs a Z-test on the same column.

```python
experiment_prop = tt.Experiment(
@@ -142,7 +142,7 @@ print(experiment_prop.analyze(data))

## Metrics based on granular data

Now let's define a metric that performs the Mann-Whitney U test. While it's possible to use the aggregated sum of ranks in the test, this example will use granular data for analysis.
Now let's define a metric that performs the Mann-Whitney U test. While it's possible to use the aggregated sum of ranks for the test, this example uses granular data for analysis.

The result class:

@@ -152,13 +152,13 @@ class MannWhitneyUResult(NamedTuple):
statistic: float
```

Metric that analyses granular data should be a subclass of [`MetricBaseGranular`](api/metrics/base.md#tea_tasting.metrics.base.MetricBaseGranular). `MetricBaseGranular` is a generic class with the result class as a type variable.
A metric that analyzes granular data should be a subclass of [`MetricBaseGranular`](api/metrics/base.md#tea_tasting.metrics.base.MetricBaseGranular). `MetricBaseGranular` is a generic class with the result class as a type variable.

The metric should have the following methods and properties defined:

- Method `__init__` checks and saves metric parameters.
- Property `cols` returns columns to be fetched for an analysis.
- Method `analyze_dataframes` analyzes the metric using granular data.
- Method `analyze_granular` analyzes the metric using granular data.

```python
class MannWhitneyU(tea_tasting.metrics.MetricBaseGranular[MannWhitneyUResult]):
@@ -181,14 +181,14 @@ class MannWhitneyU(tea_tasting.metrics.MetricBaseGranular[MannWhitneyUResult]):
def cols(self) -> tuple[str]:
return (self.column,)

def analyze_dataframes(
def analyze_granular(
self,
control: pd.DataFrame,
treatment: pd.DataFrame,
control: pa.Table,
treatment: pa.Table,
) -> MannWhitneyUResult:
res = scipy.stats.mannwhitneyu(
treatment[self.column],
control[self.column],
treatment[self.column].combine_chunks().to_numpy(zero_copy_only=False),
control[self.column].combine_chunks().to_numpy(zero_copy_only=False),
use_continuity=self.correction,
alternative=self.alternative,
)
@@ -200,9 +200,9 @@ class MannWhitneyU(tea_tasting.metrics.MetricBaseGranular[MannWhitneyUResult]):

Property `cols` should return a sequence of strings.

Method `analyze_dataframes` accepts two parameters: control and treatment data as Pandas DataFrames. Even with [data backend](data-backends.md) different from Pandas, **tea-tasting** will retrieve the data and transform into a Pandas DataFrame.
Method `analyze_granular` accepts two parameters: control and treatment data as PyArrow Tables. Even with a [data backend](data-backends.md) different from PyArrow, **tea-tasting** will retrieve the data and transform it into a PyArrow Table.
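
The `combine_chunks().to_numpy(zero_copy_only=False)` calls in the class above reflect that a PyArrow Table column is a ChunkedArray, for example:

```python
import pyarrow as pa

table = pa.table({"revenue": [1.0, 2.0, 3.0]})
# Flatten the chunks, then convert to NumPy, copying if zero-copy isn't possible.
revenue = table["revenue"].combine_chunks().to_numpy(zero_copy_only=False)
```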

Method `analyze_dataframes` returns an instance of `MannWhitneyUResult`, defined earlier, with analysis result.
Method `analyze_granular` returns an instance of `MannWhitneyUResult`, defined earlier, with the analysis result.

Now we can perform the Mann-Whitney U test:

@@ -237,7 +237,7 @@ print(experiment.analyze(data))
#> mwu_revenue - - - [-, -] 0.0300
```

In this case, **tea-tasting** perform two queries on experimental data:
In this case, **tea-tasting** performs two queries on the experimental data:

- With the aggregated statistics required for the analysis of metrics of type `MetricBaseAggregated`.
- With the granular data columns required for the analysis of metrics of type `MetricBaseGranular`.
@@ -249,4 +249,4 @@ Follow these recommendations when defining custom metrics:
- Use parameter and attribute names consistent with the ones that are already defined in **tea-tasting**. For example, use `pvalue` instead of `p_value` or `correction` instead of `use_continuity`.
- End confidence interval boundary names with `"_ci_lower"` and `"_ci_upper"`.
- During initialization, save parameter values in metric attributes using the same names. For example, use `self.correction = correction` instead of `self.use_continuity = correction`.
- Use globals settings as default values for standard parameters, such as `alternative` or `confidence_level`. See the [reference](api/config.md#tea_tasting.config.config_context) for the full list of standard parameters. You can also define and use your own global parameters.
- Use global settings as default values for standard parameters, such as `alternative` or `confidence_level`. See the [reference](api/config.md#tea_tasting.config.config_context) for the full list of standard parameters. You can also define and use your own global parameters.
70 changes: 43 additions & 27 deletions docs/data-backends.md
@@ -27,6 +27,7 @@ First, let's prepare a demo database:

```python
import ibis
import polars as pl
import tea_tasting as tt


@@ -35,7 +36,7 @@ con = ibis.duckdb.connect()
con.create_table("users_data", users_data)
#> DatabaseTable: memory.main.users_data
#> user int64
#> variant uint8
#> variant int64
#> sessions int64
#> orders int64
#> revenue float64
@@ -51,7 +52,7 @@ See the [Ibis documentation on how to create connections](https://ibis-project.o

## Querying experimental data

Method `con.create_table` in the example above returns an instance of Ibis Table which already can be used in the analysis of the experiment. But let's see how to use an SQL query to create Ibis Table:
Method `con.create_table` in the example above returns an Ibis Table which already can be used in the analysis of the experiment. But let's see how to use an SQL query to create an Ibis Table:

```python
data = con.sql("select * from users_data")
@@ -61,30 +62,39 @@ print(data)
#> select * from users_data
#> schema:
#> user int64
#> variant uint8
#> variant int64
#> sessions int64
#> orders int64
#> revenue float64
```

It's a very simple query. In real world, you might need to use joins, aggregations, and CTEs to get the data. You can define any SQL query supported by your data backend and use it to create Ibis Table.
It's a very simple query. In the real world, you might need to use joins, aggregations, and CTEs to get the data. You can define any SQL query supported by your data backend and use it to create an Ibis Table.
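
For instance, a more involved query could combine a CTE with a join. This is only a sketch; the `assignments` and `fact_orders` tables are hypothetical:

```python
# Hypothetical tables: assignments (one row per user) and fact_orders.
data = con.sql("""
    with user_orders as (
        select user_id, count(*) as orders, sum(amount) as revenue
        from fact_orders
        group by user_id
    )
    select a.user_id, a.variant, a.sessions,
        coalesce(o.orders, 0) as orders,
        coalesce(o.revenue, 0.0) as revenue
    from assignments as a
    left join user_orders as o on a.user_id = o.user_id
""")
```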

Keep in mind that **tea-tasting** assumes that:

- Data is grouped by randomization units, such as individual users.
- There is a column indicating variant of the A/B test (typically labeled as A, B, etc.).
- There is a column indicating the variant of the A/B test (typically labeled as A, B, etc.).
- All necessary columns for metric calculations (like the number of orders, revenue, etc.) are included in the table.

An Ibis Table is a lazy object: it doesn't fetch the data when created. You can use the Ibis DataFrame API to query the table and fetch the result:

```python
print(data.head(5).to_pandas())
#> user variant sessions orders revenue
#> 0 0 1 2 1 9.166147
#> 1 1 0 2 1 6.434079
#> 2 2 1 2 1 7.943873
#> 3 3 1 2 1 15.928675
#> 4 4 0 1 1 7.136917
with pl.Config(
float_precision=5,
tbl_cell_alignment="RIGHT",
tbl_formatting="NOTHING",
trim_decimal_zeros=False,
):
print(data.head(5).to_polars())
#> shape: (5, 5)
#> user variant sessions orders revenue
#> --- --- --- --- ---
#> i64 i64 i64 i64 f64
#> 0 1 2 1 9.16615
#> 1 0 2 1 6.43408
#> 2 1 2 1 7.94387
#> 3 1 2 1 15.92867
#> 4 0 1 1 7.13692
```

## Ibis example
@@ -104,7 +114,7 @@ print(aggr_data)
#> select * from users_data
#> schema:
#> user int64
#> variant uint8
#> variant int64
#> sessions int64
#> orders int64
#> revenue float64
@@ -122,10 +132,19 @@
`aggr_data` is another Ibis Table defined as a query over the previously defined `data`. Let's fetch the result:

```python
print(aggr_data.to_pandas())
#> variant sessions_per_user orders_per_session orders_per_user revenue_per_user
#> 0 0 1.996045 0.265726 0.530400 5.241079
#> 1 1 1.982802 0.289031 0.573091 5.730132
with pl.Config(
float_precision=5,
tbl_cell_alignment="RIGHT",
tbl_formatting="NOTHING",
trim_decimal_zeros=False,
):
print(aggr_data.to_polars())
#> shape: (2, 5)
#> variant sessions_per_user orders_per_session orders_per_user revenue_per_user
#> --- --- --- --- ---
#> i64 f64 f64 f64 f64
#> 0 1.99605 0.26573 0.53040 5.24108
#> 1 1.98280 0.28903 0.57309 5.73013
```

Internally, Ibis compiles a Table to an SQL query supported by the backend:
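
You can inspect the compiled query with `ibis.to_sql`:

```python
# Render the SQL that the DuckDB backend will run for aggr_data.
print(ibis.to_sql(aggr_data))
```
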
@@ -151,7 +170,7 @@ See [Ibis documentation](https://ibis-project.org/tutorials/getting_started) for more details.

## Experiment analysis

The example above shows how to query the metric averages. But for statistical inference it's not enough. For example, Student's t-test and Z-test also require number of rows and variance. And analysis of ratio metrics and variance reduction with CUPED require covariances.
The example above shows how to query the metric averages. But for statistical inference, that's not enough. For example, Student's t-test and Z-test also require the number of rows and the variance. Additionally, analysis of ratio metrics and variance reduction with CUPED requires covariances.
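
To see where covariances come in, here is a sketch of the delta-method variance of a ratio of two per-user means (see the delta method paper linked in the README):

```python
def delta_method_ratio_var(
    mean_x: float, mean_y: float,
    var_x: float, var_y: float,
    cov_xy: float, count: int,
) -> float:
    # First-order Taylor approximation of var(mean_x / mean_y).
    # Note the covariance term: it can't be derived from per-column stats.
    return (
        var_x / mean_y**2
        - 2 * mean_x * cov_xy / mean_y**3
        + mean_x**2 * var_y / mean_y**4
    ) / count
```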

Querying all the required statistics manually can be a daunting and error-prone task. But don't worry—**tea-tasting** does this work for you. You just need to specify the metrics:

@@ -171,9 +190,9 @@ print(result)
#> revenue_per_user 5.24 5.73 9.3% [-2.4%, 22%] 0.123
```

In the example above, **tea-tasting** fetches all the required statistics with a single query and then uses them to analyse the experiment.
In the example above, **tea-tasting** fetches all the required statistics with a single query and then uses them to analyze the experiment.

Some statistical methods, like Bootstrap, require granular data for the analysis. In this case, **tea-tasting** fetches the detailed data as well.
Some statistical methods, like bootstrap, require granular data for analysis. In this case, **tea-tasting** fetches the detailed data as well.
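
For intuition, a percentile bootstrap of the difference in means might look like this sketch (illustrative, not **tea-tasting**'s implementation):

```python
import numpy as np


def bootstrap_mean_diff_ci(
    contr, treat, n_resamples=10_000, confidence_level=0.95, seed=42,
):
    rng = np.random.default_rng(seed)
    contr, treat = np.asarray(contr), np.asarray(treat)
    diffs = np.array([
        rng.choice(treat, treat.size).mean() - rng.choice(contr, contr.size).mean()
        for _ in range(n_resamples)
    ])  # resampling with replacement
    alpha = 1 - confidence_level
    return tuple(np.quantile(diffs, [alpha / 2, 1 - alpha / 2]))
```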

## Example with CUPED
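
Before the example, a quick sketch of the CUPED adjustment itself (not **tea-tasting**'s internal code):

```python
import numpy as np


def cuped_adjust(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    # theta = cov(x, y) / var(x) minimizes the variance of the adjusted metric.
    theta = np.cov(x, y, ddof=1)[0, 1] / x.var(ddof=1)
    return y - theta * (x - x.mean())
```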

@@ -184,7 +203,7 @@ users_data_with_cov = tt.make_users_data(seed=42, covariates=True)
con.create_table("users_data_with_cov", users_data_with_cov)
#> DatabaseTable: memory.main.users_data_with_cov
#> user int64
#> variant uint8
#> variant int64
#> sessions int64
#> orders int64
#> revenue float64
@@ -215,14 +234,11 @@ print(result_with_cov)

## Polars example

An example of analysis using a Polars DataFrame as input data:
Here’s an example of how to analyze data using a Polars DataFrame:

```python
import polars as pl


polars_data = pl.from_pandas(users_data)
print(experiment.analyze(polars_data))
data_polars = pl.from_arrow(users_data)
print(experiment.analyze(data_polars))
#> metric control treatment rel_effect_size rel_effect_size_ci pvalue
#> sessions_per_user 2.00 1.98 -0.66% [-3.7%, 2.5%] 0.674
#> orders_per_session 0.266 0.289 8.8% [-0.89%, 19%] 0.0762
7 changes: 3 additions & 4 deletions docs/index.md
@@ -9,11 +9,11 @@

**tea-tasting** is a Python package for the statistical analysis of A/B tests featuring:

- Student's t-test, Z-test, Bootstrap, and quantile metrics out of the box.
- Student's t-test, Z-test, bootstrap, and quantile metrics out of the box.
- Extensible API: define and use statistical tests of your choice.
- [Delta method](https://alexdeng.github.io/public/files/kdd2018-dm.pdf) for ratio metrics.
- Variance reduction with [CUPED](https://exp-platform.com/Documents/2013-02-CUPED-ImprovingSensitivityOfControlledExperiments.pdf)/[CUPAC](https://doordash.engineering/2020/06/08/improving-experimental-power-through-control-using-predictions-as-covariate-cupac/) (also in combination with the delta method for ratio metrics).
- Confidence intervals for both absolute and percentage change.
- Variance reduction using [CUPED](https://exp-platform.com/Documents/2013-02-CUPED-ImprovingSensitivityOfControlledExperiments.pdf)/[CUPAC](https://doordash.engineering/2020/06/08/improving-experimental-power-through-control-using-predictions-as-covariate-cupac/) (which can also be combined with the delta method for ratio metrics).
- Confidence intervals for both absolute and percentage changes.
- Sample ratio mismatch check.
- Power analysis.
- Multiple hypothesis testing (family-wise error rate and false discovery rate).
@@ -56,7 +56,6 @@ Learn more in the detailed [user guide](https://tea-tasting.e10v.me/user-guide/)

## Roadmap

- Switch from Pandas DataFrames to PyArrow Tables for internal data. Make Pandas dependency optional.
- A/A tests and simulations.
- More statistical tests:
- Asymptotic and exact tests for frequency data.