The purpose of this issue is to track a spike to try Ibis on an existing adapter supported by dbt.
Why?
Python is a lingua franca for data. The PyData ecosystem is the driving force behind that.
Pandas is the de facto standard for data transformation in Python. It has a number of well-known limitations, especially around performance and scalability. Spark and S[now]park both solve those problems decently but, in my opinion, also fracture the ecosystem into 3+ APIs.
Ibis aims to be a Python standard across all backends, with an API that follows the principles of SQL select statements, which makes it an appealing vendor-neutral alternative. dbt should evaluate how easy it is to use within our tooling and what work would be needed before we could recommend that our users standardize on it.
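The phrase "follows principles of SQL select statements" can be made concrete with a toy sketch. The hypothetical `Query` class below is not Ibis's API or implementation; it only mimics the idea: chained `filter`/`group_by`/`aggregate` calls are recorded without running anything, then compiled into a single `SELECT`. Python's built-in `sqlite3` stands in for a backend.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Query:
    """Hypothetical deferred query: each method returns a new Query; nothing runs until compile()."""

    table: str
    predicates: tuple[str, ...] = ()
    keys: tuple[str, ...] = ()
    metrics: tuple[tuple[str, str], ...] = ()  # (alias, SQL aggregate expression)

    def filter(self, predicate: str) -> Query:
        return replace(self, predicates=self.predicates + (predicate,))

    def group_by(self, *keys: str) -> Query:
        return replace(self, keys=self.keys + keys)

    def aggregate(self, **metrics: str) -> Query:
        return replace(self, metrics=self.metrics + tuple(metrics.items()))

    def compile(self) -> str:
        # Map the recorded calls onto the clauses of one SELECT statement.
        columns = list(self.keys) + [f"{expr} AS {alias}" for alias, expr in self.metrics]
        sql = f"SELECT {', '.join(columns) or '*'} FROM {self.table}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(f"({p})" for p in self.predicates)
        if self.keys:
            sql += " GROUP BY " + ", ".join(self.keys)
        return sql


# Stand-in backend with a tiny orders table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, status TEXT)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, "completed"), (2, 10, "completed"), (3, 11, "returned")],
)

q = Query("orders").filter("status = 'completed'").group_by("customer_id").aggregate(n_orders="COUNT(*)")
print(q.compile())
# SELECT customer_id, COUNT(*) AS n_orders FROM orders WHERE (status = 'completed') GROUP BY customer_id
print(con.execute(q.compile()).fetchall())  # [(10, 2)]
```

Because every call returns a new immutable `Query`, partial expressions can be reused freely, and only the final compiled string reaches the database. Ibis's real expression system is far richer (typed columns, per-backend compilers), but the deferred-then-compile shape is the same.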
What?
Ibis supports a number of backends, including:
Spark
BigQuery
DuckDB
See acceptance criteria for details. Ideally, the outcome of this spike would cover:
A local dbt + DuckDB + Ibis demo others can run to get familiar with Ibis
Proof-of-concept code for working with Ibis in dbt Python models in either Databricks or BigQuery
Knowledge sharing on Ibis (without everyone digging through docs/code themselves)
Acceptance criteria
See the "What?" section above.
1. A quick demo others can try
I highly recommend using dbt + DuckDB + Ibis on jaffle_shop or similar. The dev should gain an understanding of DuckDB, Ibis, and how to plug them into dbt. PM/DX/SA/others at dbt should be able to replicate this and bring it to a publicly presentable state, if it isn't one already.
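A minimal sketch of what a model in the local dbt + DuckDB + Ibis demo could look like, assuming the dbt-duckdb adapter (where `dbt.ref()` returns a DuckDB relation) and a jaffle_shop-style `stg_orders` model with `order_id`, `customer_id`, and `order_date` columns. The file name `models/customer_orders.py` is hypothetical. This version pulls the upstream relation into pandas and lets Ibis execute on its default (DuckDB) backend: the simplest thing that runs locally, not necessarily the integration dbt would recommend.

```python
# models/customer_orders.py (hypothetical file name)
# Assumes dbt-duckdb, where dbt.ref() returns a duckdb.DuckDBPyRelation.


def model(dbt, session):
    # Imported here so this sketch can be loaded without Ibis installed;
    # a top-level import also works in a dbt Python model.
    import ibis

    dbt.config(materialized="table")

    # Materialize the upstream model as a pandas DataFrame, then wrap it as an Ibis table.
    orders = ibis.memtable(dbt.ref("stg_orders").df())

    # Per-customer order stats, written with Ibis instead of SQL or pandas.
    customer_orders = orders.group_by("customer_id").aggregate(
        first_order=orders.order_date.min(),
        most_recent_order=orders.order_date.max(),
        number_of_orders=orders.order_id.count(),
    )

    # execute() returns a pandas DataFrame, which dbt-duckdb accepts as a model result.
    return customer_orders.execute()
```

`session` (the DuckDB connection dbt-duckdb passes in) is unused here; a tighter integration would point Ibis at that connection instead of round-tripping through pandas.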
2. Proof-of-concept for Ibis as transformation code in dbt Python models
Pick one of the existing backends and:
a writeup (or code if possible) of how Ibis would connect via dbt to the backend
a writeup (or code if possible) of options for where Ibis would run
  ignoring dbt principles, could/should we just run Ibis locally? what do we gain by shipping the code up to the backend?
  could/should we take what Ibis compiles down to (e.g. SQL) and run that directly?
mockup (or actual if possible) code for running this on an existing backend
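One way to frame the "run Ibis locally vs. ship the code to the backend" question is by how much data crosses the wire. The sketch below uses an in-memory `sqlite3` database as a stand-in for the warehouse (an assumption for illustration; no Ibis or dbt involved) and computes the same aggregate two ways: pulling every row into Python, versus sending a `SELECT` so only the result comes back. The function names are hypothetical.

```python
from __future__ import annotations

import sqlite3
from collections import Counter


def make_warehouse(n_orders: int) -> sqlite3.Connection:
    """Stand-in 'warehouse': one orders table with n_orders rows spread over 100 customers."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
    con.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        ((i, i % 100) for i in range(n_orders)),
    )
    return con


def run_locally(con: sqlite3.Connection) -> tuple[dict[int, int], int]:
    """Fetch every row, then aggregate in Python. Returns (result, rows transferred)."""
    rows = con.execute("SELECT order_id, customer_id FROM orders").fetchall()
    counts = Counter(customer_id for _, customer_id in rows)
    return dict(counts), len(rows)


def push_down(con: sqlite3.Connection) -> tuple[dict[int, int], int]:
    """Send the aggregation as SQL; only one row per customer comes back."""
    rows = con.execute(
        "SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id"
    ).fetchall()
    return dict(rows), len(rows)


con = make_warehouse(10_000)
local_result, local_rows = run_locally(con)
pushed_result, pushed_rows = push_down(con)
assert local_result == pushed_result  # same answer either way
print(local_rows, pushed_rows)  # 10000 100
```

Running locally keeps everything in Python but moves all 10,000 rows; pushing down moves 100. Taking what Ibis compiles down to (SQL, for SQL backends) and running it in the warehouse lands on the second path.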
3. Knowledge sharing on Ibis
In addition to the above, thoughts/code on:
is Ibis good? what're your opinions?
a deep dive on Ibis's tech and architecture
authentication, security, etc.
performance -- is this actually scalable? is there some fixed overhead cost?
their codebase -- how easy is it to contribute to? does it seem maintainable?
as another OSS data package, can we learn anything from them?
anything else discovered along the way
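For the "is there some fixed overhead cost?" question, one simple approach (an assumption, not something this issue prescribes) is to time the same workload at several input sizes and fit seconds = fixed + per_row × n by least squares; the intercept estimates the fixed per-call overhead. The helper names are hypothetical, and Python's `sqlite3` stands in for a real backend.

```python
from __future__ import annotations

import sqlite3
import time
from typing import Callable, Sequence


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_overhead(
    workload: Callable[[int], object], sizes: Sequence[int], repeats: int = 5
) -> tuple[float, float]:
    """Time workload(n) per size (best of `repeats`), then fit seconds = fixed + per_row * n."""
    times = []
    for n in sizes:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            workload(n)
            best = min(best, time.perf_counter() - start)
        times.append(best)
    return fit_line([float(n) for n in sizes], times)


# Stand-in workload: aggregate the first n rows of a pre-built SQLite table.
# Swap in the Ibis/dbt call you actually want to profile.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", ((i,) for i in range(200_000)))


def sum_first_n(n: int):
    return con.execute("SELECT SUM(x) FROM t WHERE rowid <= ?", (n,)).fetchone()


fixed, per_row = estimate_overhead(sum_first_n, [25_000, 50_000, 100_000, 200_000])
print(f"fixed ~ {fixed * 1e6:.1f} us per call, per row ~ {per_row * 1e9:.2f} ns")
```

Timing noise can push the intercept slightly negative when the fixed cost is tiny; more repeats or a wider spread of sizes helps. Taking the best of several runs, rather than the mean, filters out one-off pauses from the OS or garbage collector.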
github-actions bot changed the title from "[Spike] Ibis on an existing adapter" to "[CT-1225] [Spike] Ibis on an existing adapter" on Sep 21, 2022.