Document dbtRunner (programmatic invocation) #3118

Merged
merged 11 commits into from
Apr 13, 2023
6 changes: 5 additions & 1 deletion website/dbt-versions.js
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
exports.versions = [
{
version: "1.5",
EOLDate: "2024-04-26",
EOLDate: "2024-04-27",
isPrerelease: true,
},
{
@@ -27,6 +27,10 @@ exports.versions = [
]

exports.versionedPages = [
{
"page": "docs/reference/programmatic-invocations",
"firstVersion": "1.5",
},
{
"page": "docs/collaborate/publish/model-contracts",
"firstVersion": "1.5",
Original file line number Diff line number Diff line change
@@ -75,5 +75,5 @@ More to come!
- [Model access](model-access)
- [Model versions](model-versions)

### dbt-core Python API
- Auto-generated documentation ([#2674](https://github.com/dbt-labs/docs.getdbt.com/issues/2674)) for dbt-core CLI & Python API for programmatic invocations
### New CLI, commands, Python API for programmatic invocations
- [Programmatic invocations](programmatic-invocations)
8 changes: 7 additions & 1 deletion website/docs/reference/events-logging.md
@@ -190,4 +190,10 @@ The `EVENT_HISTORY` object has been deprecated and removed in dbt Core v1.4+

Older versions of `dbt-core` made available a full history of events fired during an invocation, in the form of an `EVENT_HISTORY` object.

The Python interface into events is significantly less mature than the structured logging interface. For all use cases, we recommend parsing JSON-formatted logs.
<VersionBlock firstVersion="1.5">
Member

TIL, neat!


When [invoking dbt programmatically](programmatic-invocations#registering-callbacks), it is possible to register a callback on dbt's `EventManager`. This allows access to structured events as Python objects, to enable custom logging and integration with other systems.

</VersionBlock>

The Python interface into events is significantly less mature than the structured logging interface. For all standard use cases, we recommend parsing JSON-formatted logs.
105 changes: 105 additions & 0 deletions website/docs/reference/programmatic-invocations.md
@@ -0,0 +1,105 @@
---
title: "Programmatic invocations"
---

In v1.5, dbt-core added support for programmatic invocations. The intent is to expose the existing dbt CLI via a Python entry point, such that top-level commands are callable from within a Python script or application.

The entry point is a `dbtRunner` class, which allows you to `invoke` the same commands as on the CLI.

```python
from dbt.cli.main import dbtRunner, dbtRunnerResult

# initialize
dbt = dbtRunner()

# create CLI args as a list of strings
cli_args = ["run", "--select", "tag:my_tag"]

# run the command
res: dbtRunnerResult = dbt.invoke(cli_args)

# inspect the results
for r in res.result:
    print(f"{r.node.name}: {r.status}")
```
dbeatty10 marked this conversation as resolved.

## `dbtRunnerResult`

Each command returns a `dbtRunnerResult` object, which has three attributes:
- `success` (bool): Whether the command succeeded.
- `result`: If the command completed (successfully or with handled errors), its result(s). Return type varies by command.
- `exception`: If the dbt invocation encountered an unhandled error and did not complete, the exception it encountered.

There is a 1:1 correspondence between [CLI exit codes](reference/exit-codes) and the `dbtRunnerResult` returned by a programmatic invocation:
Member

Let's put this in a table?

Contributor

@aranke I think the table will be covered by the link.

i.e., [CLI exit codes](reference/exit-codes) will link to this page which has the following table:

[image: screenshot of the exit codes table]

Collaborator Author

I like the idea of tabling it up here as well!

Member

Let's copy the same table here; most people aren't going to click on links.

Member

Wait, it's a different table – here the columns should be: scenario, success, exit_code, result, exception.


| Scenario | CLI Exit Code | `success` | `result` | `exception` |
Member

Nit: numbers should be right-aligned using the |---:| header.

Collaborator Author

TIL!

|---------------------------------------------------------------------------------------------|--------------:|-----------|-------------------|-------------|
| Invocation completed without error | 0 | `True` | varies by command | `None` |
| Invocation completed with at least one handled error (e.g. test failure, model build error) | 1 | `False` | varies by command | `None` |
| Unhandled error. Invocation did not complete, and returns no results. | 2 | `False` | `None` | Exception |
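
The mapping in the table can be sketched as a small helper. This is an illustration only, not part of dbt-core: `exit_code_for` is a hypothetical function, and `StandInResult` is a stand-in that carries the three documented attributes so the snippet runs without dbt installed. In a real script you would pass the object returned by `dbtRunner().invoke(...)` instead.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StandInResult:
    """Stand-in with the three attributes documented for dbtRunnerResult."""
    success: bool
    result: Any = None
    exception: Optional[BaseException] = None


def exit_code_for(res) -> int:
    """Hypothetical helper: derive the CLI-style exit code from a result."""
    if res.exception is not None:
        return 2  # unhandled error, invocation did not complete
    return 0 if res.success else 1  # 1 = completed with handled errors


print(exit_code_for(StandInResult(success=True, result=[])))
print(exit_code_for(StandInResult(success=False, result=[])))
print(exit_code_for(StandInResult(success=False, exception=RuntimeError("boom"))))
```

Checking `exception` first keeps the "did not complete" case separate from ordinary failures, where `result` is still available for inspection.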

## Commitments & Caveats

From dbt Core v1.5 onward, we are making an ongoing commitment to providing a Python entry point at functional parity with dbt-core's CLI. We reserve the right to change the underlying implementation used to achieve that goal. We expect that the current implementation will unlock real use cases, in the short & medium term, while we work on a set of stable, long-term interfaces that will ultimately replace it.
Member

I like this language, ty.


In particular, the objects returned by each command in `dbtRunnerResult.result` are not fully contracted, and therefore liable to change. Some of the returned objects are partially documented, because they overlap in part with the contents of [dbt artifacts](dbt-artifacts). As Python objects, they contain many more fields and methods than what's available in the serialized JSON artifacts. These additional fields and methods should be considered **internal and liable to change in future versions of dbt-core.**
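
One way to limit exposure to those internal fields is to read only the few attributes you need and tolerate their absence. The sketch below is an assumption about how a caller might do that, not a dbt-core API: `summarize` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins shaped like the `r.node.name` / `r.status` entries used in the first example.

```python
from types import SimpleNamespace


def summarize(results):
    """Hypothetical helper: keep only node name and status, tolerating missing fields."""
    summary = []
    for r in results or []:
        node = getattr(r, "node", None)
        name = getattr(node, "name", "<unknown>")
        status = getattr(r, "status", None)
        summary.append((name, str(status)))
    return summary


# stand-in objects; real result entries carry many more fields
fake_results = [
    SimpleNamespace(node=SimpleNamespace(name="my_model"), status="success"),
    SimpleNamespace(status="error"),  # no `node` attribute at all
]
print(summarize(fake_results))
```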

## Advanced usage patterns

:::caution
The syntax and support for these patterns are liable to change in future versions of `dbt-core`.
:::

The goal of `dbtRunner` is to offer parity with CLI workflows, within a programmatic environment. There are a few advanced usage patterns that extend what's possible with the CLI.

### Reusing objects

Pass pre-constructed objects into `dbtRunner`, to avoid recreating those objects by reading files from disk. Currently, the only object supported is the `Manifest` (project contents).

```python
from dbt.cli.main import dbtRunner, dbtRunnerResult
from dbt.contracts.graph.manifest import Manifest

# use 'parse' command to load a Manifest
res: dbtRunnerResult = dbtRunner().invoke(["parse"])
manifest: Manifest = res.result

# introspect manifest
# e.g. assert every public model has a description
for node in manifest.nodes.values():
    if node.resource_type == "model" and node.access == "public":
        assert node.description != "", f"{node.name} is missing a description"

# reuse this manifest in subsequent commands to skip parsing
dbt = dbtRunner(manifest=manifest)
cli_args = ["run", "--select", "tag:my_tag"]
res = dbt.invoke(cli_args)
```
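
The description check above stops at the first failing model. A variant that collects every offender could look like the following sketch. `undocumented_public_models` is a hypothetical helper, and the `SimpleNamespace` objects stand in for `manifest.nodes.values()` so the snippet runs without a dbt project.

```python
from types import SimpleNamespace


def undocumented_public_models(nodes):
    """Hypothetical helper: names of public models with an empty description."""
    return sorted(
        node.name
        for node in nodes
        # `access` is only read when the node is a model (short-circuit `and`)
        if node.resource_type == "model"
        and node.access == "public"
        and not node.description
    )


# stand-ins for manifest nodes; real nodes carry many more fields
fake_nodes = [
    SimpleNamespace(name="orders", resource_type="model", access="public", description=""),
    SimpleNamespace(name="customers", resource_type="model", access="public", description="One row per customer"),
    SimpleNamespace(name="stg_orders", resource_type="model", access="protected", description=""),
    SimpleNamespace(name="not_null_orders_id", resource_type="test", description=""),
]
print(undocumented_public_models(fake_nodes))
```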

### Registering callbacks

Register `callbacks` on dbt's `EventManager`, to access structured events and enable custom logging. The current behavior of callbacks is to block subsequent steps from proceeding; this functionality is not guaranteed in future versions.

```python
from dbt.cli.main import dbtRunner
from dbt.events.base_types import EventMsg

def print_version_callback(event: EventMsg):
    if event.info.name == "MainReportVersion":
        print(f"We are thrilled to be running dbt{event.data.version}")

dbt = dbtRunner(callbacks=[print_version_callback])
dbt.invoke(["list"])
```
dbeatty10 marked this conversation as resolved.
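
A callback does not have to be a plain function; any callable that accepts the event should work the same way. The sketch below counts events by name. `EventCounter` is a hypothetical class, the only field it reads is `event.info.name` (as in the example above), and the events fed to it here are stand-ins, with `ExampleEvent` being a made-up name.

```python
from collections import Counter
from types import SimpleNamespace


class EventCounter:
    """Hypothetical callback object: counts events by `event.info.name`."""

    def __init__(self):
        self.counts = Counter()

    def __call__(self, event):
        self.counts[event.info.name] += 1


counter = EventCounter()

# stand-in events; with dbt installed you would instead pass
# `callbacks=[counter]` to dbtRunner and let dbt fire the events
for name in ["MainReportVersion", "ExampleEvent", "ExampleEvent"]:
    counter(SimpleNamespace(info=SimpleNamespace(name=name)))

print(counter.counts.most_common(1))
```

Because callbacks currently block subsequent steps, keeping the work inside `__call__` this small avoids slowing down the invocation.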

### Overriding parameters

Pass in parameters as keyword arguments, instead of a list of CLI-style strings. At present, dbt will not do any validation or type coercion on your inputs. The subcommand must still be specified as the first positional argument, in a list.

```python
from dbt.cli.main import dbtRunner
dbt = dbtRunner()

# these are equivalent
dbt.invoke(["--fail-fast", "run", "--select", "tag:my_tag"])
dbt.invoke(["run"], select=["tag:my_tag"], fail_fast=True)
```
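
Since keyword arguments are not validated or coerced, a caller may want a cheap check of their own before invoking. The following is a minimal sketch under stated assumptions: `check_kwargs` and `EXPECTED_TYPES` are hypothetical, and the expected types are inferred only from the example above (`select` as a list, `fail_fast` as a bool), not taken from dbt-core.

```python
# assumption: these expected types are illustrative, not taken from dbt-core
EXPECTED_TYPES = {"select": list, "fail_fast": bool}


def check_kwargs(**kwargs):
    """Hypothetical pre-flight check: raise TypeError on an unexpected type."""
    for key, value in kwargs.items():
        expected = EXPECTED_TYPES.get(key)
        if expected is not None and not isinstance(value, expected):
            raise TypeError(
                f"{key} should be {expected.__name__}, got {type(value).__name__}"
            )
    return kwargs


print(check_kwargs(select=["tag:my_tag"], fail_fast=True))
```

The checked dictionary could then be unpacked into the call, for example `dbt.invoke(["run"], **check_kwargs(select=["tag:my_tag"], fail_fast=True))`.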
1 change: 1 addition & 0 deletions website/sidebars.js
@@ -613,6 +613,7 @@ const sidebarSettings = {
"reference/events-logging",
"reference/exit-codes",
"reference/parsing",
"reference/programmatic-invocations",
],
},
{