Update docs #105

Merged 1 commit on Dec 14, 2024
25 changes: 22 additions & 3 deletions docs/data-backends.md
@@ -2,23 +2,25 @@

## Intro

**tea-tasting** supports a wide range of data backends such as BigQuery, ClickHouse, PostgreSQL/GreenPlum, Snowflake, Spark, and 20+ other backends supported by [Ibis](https://ibis-project.org/). Ibis is a Python package that serves as a DataFrame API to various data backends.
**tea-tasting** supports a wide range of data backends such as BigQuery, ClickHouse, DuckDB, PostgreSQL, Snowflake, Spark, and many other backends supported by [Ibis](https://github.com/ibis-project/ibis). Ibis is a DataFrame API to various data backends.

Many statistical tests, such as the Student's t-test or the Z-test, require only aggregated data for analysis. For these tests, **tea-tasting** retrieves only aggregated statistics like mean and variance instead of downloading all detailed data.

For example, if the raw experimental data are stored in ClickHouse, it's faster and more efficient to calculate counts, averages, variances, and covariances directly in ClickHouse rather than fetching granular data and performing aggregations in a Python environment.
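
The idea of aggregating inside the backend can be sketched as follows. This is only an illustration, not how **tea-tasting** is implemented: it uses the standard-library `sqlite3` module as a stand-in backend (an assumption made to keep the sketch self-contained), and the `users` table, its columns, and its values are hypothetical. Only one row of statistics per variant leaves the database.

```python
import sqlite3

# Stand-in backend: an in-memory SQLite database with a hypothetical table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (variant INTEGER, revenue REAL)")
con.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(0, 4.0), (0, 6.0), (0, 5.0), (1, 7.0), (1, 5.0), (1, 9.0)],
)

# Count, mean, and sample variance are computed by the database itself.
# Sample variance = (sum of squares - n * mean^2) / (n - 1).
query = """
    SELECT
        variant,
        COUNT(*) AS n,
        AVG(revenue) AS mean,
        (SUM(revenue * revenue) - COUNT(*) * AVG(revenue) * AVG(revenue))
            / (COUNT(*) - 1) AS var
    FROM users
    GROUP BY variant
    ORDER BY variant
"""
aggregates = con.execute(query).fetchall()

for variant, n, mean, var in aggregates:
    print(variant, n, mean, var)
```

With a real backend the query would run on the database server, so the amount of data transferred stays small regardless of how many rows the table holds.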

**tea-tasting** also accepts dataframes supported by [Narwhals](https://github.com/narwhals-dev/narwhals): cuDF, Dask, Modin, pandas, Polars, PyArrow. Narwhals is a compatibility layer between dataframe libraries.

This guide:

- Shows how to use **tea-tasting** with a data backend of your choice for the analysis of an experiment.
- Explains some internals of how **tea-tasting** uses Ibis to work with data backends.

## Demo database

This guide uses [DuckDB](https://duckdb.org/), an in-process analytical database, as an example data backend. To be able to reproduce the example code, install both **tea-tasting** and Ibis with DuckDB extra:
This guide uses [DuckDB](https://github.com/duckdb/duckdb), an in-process analytical database, and [Polars](https://github.com/pola-rs/polars) as example data backends. To be able to reproduce the example code, install **tea-tasting**, Ibis with DuckDB extra, and Polars:

```bash
pip install tea-tasting ibis-framework[duckdb]
pip install tea-tasting ibis-framework[duckdb] polars
```

First, let's prepare a demo database:
@@ -210,3 +212,20 @@ print(result_with_cov)
#> orders_per_user 0.523 0.581 11% [2.9%, 20%] 0.00733
#> revenue_per_user 5.12 5.85 14% [3.8%, 26%] 0.00675
```

## Polars example

An example of analysis using a Polars DataFrame as input data:

```python
import polars as pl


polars_data = pl.from_pandas(users_data)
print(experiment.analyze(polars_data))
#> metric control treatment rel_effect_size rel_effect_size_ci pvalue
#> sessions_per_user 2.00 1.98 -0.66% [-3.7%, 2.5%] 0.674
#> orders_per_session 0.266 0.289 8.8% [-0.89%, 19%] 0.0762
#> orders_per_user 0.530 0.573 8.0% [-2.0%, 19%] 0.118
#> revenue_per_user 5.24 5.73 9.3% [-2.4%, 22%] 0.123
```
3 changes: 1 addition & 2 deletions docs/index.md
@@ -18,7 +18,7 @@
- Power analysis.
- Multiple hypothesis testing (family-wise error rate and false discovery rate).

**tea-tasting** calculates statistics directly within data backends such as BigQuery, ClickHouse, PostgreSQL, Snowflake, Spark, and 20+ other backends supported by [Ibis](https://ibis-project.org/). This approach eliminates the need to import granular data into a Python environment, though Pandas DataFrames are also supported.
**tea-tasting** calculates statistics directly within data backends such as BigQuery, ClickHouse, DuckDB, PostgreSQL, Snowflake, Spark, and many other backends supported by [Ibis](https://github.com/ibis-project/ibis) and [Narwhals](https://github.com/narwhals-dev/narwhals). This approach eliminates the need to import granular data into a Python environment. **tea-tasting** also accepts dataframes supported by [Narwhals](https://github.com/narwhals-dev/narwhals): cuDF, Dask, Modin, pandas, Polars, PyArrow.

Check out the [blog post](https://e10v.me/tea-tasting-analysis-of-experiments/) explaining the advantages of using **tea-tasting** for the analysis of A/B tests.

@@ -56,7 +56,6 @@ Learn more in the detailed [user guide](https://tea-tasting.e10v.me/user-guide/)

## Roadmap

- Support more dataframes with [Narwhals](https://github.com/narwhals-dev/narwhals).
- A/A tests and simulations.
- More statistical tests:
    - Asymptotic and exact tests for frequency data.
5 changes: 4 additions & 1 deletion docs/user-guide.md
@@ -44,7 +44,10 @@ The [`make_users_data`](api/datasets.md#tea_tasting.datasets.make_users_data) fu
- `orders`: The total number of user's orders.
- `revenue`: The total revenue generated by the user.

**tea-tasting** can process data in the form of either a Pandas DataFrame or an Ibis Table. [Ibis](https://ibis-project.org/) is a Python package that serves as a DataFrame API to various data backends. It supports 20+ backends including BigQuery, ClickHouse, DuckDB, Polars, PostgreSQL, Snowflake, Spark etc. You can write an SQL query, [wrap](https://ibis-project.org/how-to/extending/sql#backend.sql) it as an Ibis Table and pass it to **tea-tasting**.
**tea-tasting** can process data in the form of an Ibis Table or a DataFrame supported by Narwhals:

- [Ibis](https://github.com/ibis-project/ibis) is a DataFrame API to various data backends. It supports many backends including BigQuery, ClickHouse, DuckDB, PostgreSQL, Snowflake, Spark etc. You can write an SQL query, [wrap](https://ibis-project.org/how-to/extending/sql#backend.sql) it as an Ibis Table and pass it to **tea-tasting**.
- [Narwhals](https://github.com/narwhals-dev/narwhals) is a compatibility layer between dataframe libraries. It supports cuDF, Dask, Modin, pandas, Polars, PyArrow dataframes. You can use any of these dataframes as an input to **tea-tasting**.

Many statistical tests, such as the Student's t-test or the Z-test, require only aggregated data for analysis. For these tests, **tea-tasting** retrieves only aggregated statistics like mean and variance instead of downloading all detailed data. See more details in the [guide on data backends](data-backends.md).
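
To see why aggregated statistics are enough, here is a minimal sketch of a two-sided Z-test computed only from counts, means, and variances. It is an illustration under stated assumptions, not the **tea-tasting** implementation: the function name `z_test_from_aggregates` and the input numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_from_aggregates(n_c, mean_c, var_c, n_t, mean_t, var_t):
    """Two-sided Z-test for a difference of means from summary statistics.

    Hypothetical helper for illustration; no row-level data is needed.
    """
    # Standard error of the difference between the two sample means.
    se = sqrt(var_c / n_c + var_t / n_t)
    z = (mean_t - mean_c) / se
    # Two-sided p-value under the standard normal distribution.
    pvalue = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, pvalue


# Hypothetical aggregates for control and treatment.
z, pvalue = z_test_from_aggregates(
    n_c=100, mean_c=5.0, var_c=4.0,
    n_t=100, mean_t=5.6, var_t=5.0,
)
print(round(z, 3), round(pvalue, 4))
```

The six inputs are exactly the kind of values a backend can return from a single grouped query.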
