Replies: 1 comment 3 replies
I work on the fal tool, and I think this is great work toward letting other tools build on top of dbt. Really appreciate the effort! I have a question regarding the decision not to expose the DAG:
What this means is that someone using the SDK would not be able to modify the DAG to be run, right? dbt would just have the dbt project itself (the refs inside a model, etc.) as the reference for how to build said DAG. If so, I would agree this is the right approach. I wonder about adding "dynamic" nodes to the DAG, perhaps just by adding models with refs (and adding extra refs to existing models) and then letting dbt build and run the resulting DAG.
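For concreteness, here is a rough sketch of what I mean by refs defining the DAG (the model and column names are made up, and this assumes a dbt Python model on an adapter that supports them): the `dbt.ref()` calls are what declare the node's dependencies, dbt derives the graph from them at parse time, and there is no imperative API for adding edges.

```python
# models/enriched_orders.py — hypothetical dbt Python model.
# Dependencies are declared via dbt.ref(); dbt parses these calls and
# builds the DAG from them, rather than exposing the graph for mutation.
def model(dbt, session):
    orders = dbt.ref("stg_orders")        # adds edge: stg_orders -> enriched_orders
    customers = dbt.ref("stg_customers")  # adds edge: stg_customers -> enriched_orders
    return orders.join(customers, orders["customer_id"] == customers["id"])
```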
Hi all,
Historically, `dbt-core` has supported two ways of providing input: files containing "dbt code," and CLI commands to execute them. Today, these are its exclusive officially supported APIs.

Over the past few months, the Core team has been working on a programmatic (Python) entry point for `dbt-core` that looks & feels like the CLI. It will be possible to call top-level dbt commands from a Python program, and pass in parameters, at parity with what you can do with dbt on the command line. We'll be releasing the first cut of this API for programmatic invocations in v1.5 next month.

Through that process, we've discussed what a second programmatic interface to dbt beyond the CLI could feel like—something that goes beyond what the CLI is capable of.
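For illustration, here is roughly what such a programmatic invocation looks like with the `dbtRunner` entry point that dbt-core 1.5 ended up shipping (a hedged sketch: the selector is hypothetical, and exact result shapes may differ between versions).

```python
# Sketch of a programmatic invocation at parity with the CLI.
# Assumes dbt-core >= 1.5; the selector "my_model+" is hypothetical.
from dbt.cli.main import dbtRunner, dbtRunnerResult

dbt = dbtRunner()

# Same arguments you would type after `dbt` on the command line.
res: dbtRunnerResult = dbt.invoke(["run", "--select", "my_model+"])

if res.success:
    for r in res.result:  # per-node results for `dbt run`
        print(f"{r.node.name}: {r.status}")
else:
    raise res.exception or RuntimeError("dbt invocation did not succeed")
```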
Why now? Why not before? There are tangible risks in evolving dbt-core from an opinionated workflow into a more-flexible library — we risk exposing the wrong interfaces, or removing guardrails that have been put in place for good reason. We haven’t seen a demonstrated need for it either, apart from requests from “superusers” in the community.
A few reasons emerged from our discussion about why now is the right time to start working on a second programmatic interface for `dbt-core`:

- Metadata about dbt invocations is currently available via artifacts, such as `manifest.json` or `run_results.json`, and via real-time information in structured logging—these aren't flexible enough to support future growth. Reading a hundred-MB JSON file at the end of a run is inherently less scalable than incrementally yielding a few KB when a node finishes running.
- There is functionality that would be useful to the large majority of `dbt-core` users², and that cannot be written in "dbt code"—it requires more power than what's currently available via packages.

First, we're interested in exposing more flexible inputs for documented & contracted interfaces, which could support providing them as Python objects, in addition to source (SQL/Python) and config (YAML) file contents. This would include:
We’re interested in unbundling large monolithic objects, such as “manifest” and “runtime configuration,” in favor of smaller composable pieces where possible. For example, yielding each node’s result as it finishes running.
Over time, we’re interested in potentially exposing new interfaces, as polymorphic Python objects:
As part of this effort, the major interface we do not plan to expose in its entirety is the DAG. There is more power in declaratively defining what nodes dbt is running, versus imperatively telling dbt (or another tool) how to run them. This is a similar pattern to frameworks like React, which allow users to define their components but manage how those components are run.
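To make the distinction concrete: in the declarative pattern, a caller describes which nodes should run using node selection syntax, and dbt derives the execution graph and order from the project's refs, rather than offering an API for mutating the graph. A minimal sketch (the selectors below are hypothetical):

```python
# Sketch: declare *what* to run via selection syntax; dbt decides *how*,
# using the DAG it built from the project's refs. Selectors are hypothetical.
from dbt.cli.main import dbtRunner

dbt = dbtRunner()

# "stg_orders and everything downstream of it, plus anything tagged nightly."
dbt.invoke(["run", "--select", "stg_orders+", "tag:nightly"])
```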
Ask from you, the reader:
Thanks, Kshitij (with help from @jtcohen6)
Footnotes
1. See Who Let the DAGs out by Tim Leonard of Flyte as a representative example that has me both impressed and horrified. ↩
2. Broad strokes, I'm thinking about whether the functionality is useful to 80%-90% of dbt users. ↩