[CT-1225] [Spike] Ibis on an existing adapter #5904

Closed
lostmygithubaccount opened this issue Sep 21, 2022 · 2 comments

@lostmygithubaccount
Contributor

Spike to try out Ibis on an existing adapter

The purpose of this issue is to track a spike to try Ibis on an existing adapter supported by dbt.

Why?

Python is a lingua franca for data. The PyData ecosystem is the driving force behind that.

Pandas is the de facto standard for data transformation in Python. It has a number of well-known limitations, especially around performance and scalability. Spark and S[now]park both solve those problems decently but, in my opinion, also fracture the ecosystem into 3+ APIs.

Ibis, which aims to be a Python standard across all backends and follows the principles of SQL select statements, is an appealing vendor-neutral alternative. dbt should evaluate how easy it is to use within our tooling and what work would be needed before we could recommend that our users standardize on it.

What?

Ibis supports a number of backends, including:

  • Spark
  • BigQuery
  • DuckDB

See acceptance criteria for details. Ideally, the outcome of this spike would cover:

  1. A local dbt + DuckDB + Ibis demo others can run to get familiar
  2. Proof-of-concept code for working with Ibis in dbt Python models in either Databricks or BigQuery
  3. Knowledge sharing on Ibis (without everyone digging through docs/code themselves)

Additional reading

Acceptance criteria

See the "What?" section above.

1. A quick demo others can try

I highly recommend using dbt + DuckDB + Ibis on jaffle_shop or similar. The dev should gain an understanding of DuckDB, Ibis, and how to plug those into dbt. PM/DX/SA/others in dbt should be able to replicate this and take it into a publicly presentable state, if not already.

2. Proof-of-concept for Ibis as transformation code in dbt Python models

Pick one of the existing backends and:

  • a writeup (or code if possible) of how Ibis would connect via dbt to the backend
  • a writeup (or code if possible) of options for where Ibis would run
    • ignoring dbt principles, could/should we just run Ibis locally? what do we gain by shipping up the code to the backend?
    • could/should we take what Ibis compiles down to?
  • mockup (or actual if possible) code for running this on an existing backend

3. Knowledge sharing on Ibis

In addition to the above, thoughts/code on:

  • is Ibis good? what're your opinions?
  • a deep dive on Ibis's tech and architecture
    • authentication, security, etc.
    • performance -- is this actually scalable? is there some fixed overhead cost?
    • their codebase -- how easy is it to contribute to? does it seem maintainable?
      • as another OSS data package, can we learn anything from them?
  • anything else discovered along the way
@github-actions github-actions bot changed the title [Spike] Ibis on an existing adapter [CT-1225] [Spike] Ibis on an existing adapter Sep 21, 2022
@lostmygithubaccount lostmygithubaccount self-assigned this Nov 1, 2022
@lostmygithubaccount
Contributor Author

see #6296

@sfc-gh-twhite

> see #6296

Following the chain, this is pretty cool! Awesome stuff.

In case anyone lands here looking for an integration of dbt and ibis, you might be interested in dbt-ibis.
