[Feature] Be able to --favor-state for sources #9599

Open
3 tasks done
dbeatty10 opened this issue Feb 19, 2024 · 5 comments
Labels
enhancement New feature or request state Stateful selection (state:modified, defer)

Comments

@dbeatty10
Contributor

dbeatty10 commented Feb 19, 2024

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

See "Anything else" below for example code.

It seems like there should be a way for --favor-state to apply to sources in addition to models, possibly via an --include-sources flag. To preserve current behavior, it would be opt-in with a default of false.

Alternatively, we could consider the current behavior a bug and make it opt-out with a default of true. Or we could skip creating a new flag and simply change the behavior to include sources whenever --favor-state is passed.

The decision above could apply to --state as well.
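To make the three options concrete, here is a minimal Python sketch of the decision each one implies. It is not dbt's implementation: the function `use_state_for_source`, the `SourcePolicy` enum, and the `include_sources` parameter are hypothetical names that only mirror the proposed --include-sources flag, and the sketch models the --favor-state case only.

```python
from enum import Enum
from typing import Optional


class SourcePolicy(Enum):
    """Hypothetical names for the three options discussed above."""
    OPT_IN = "opt_in"    # new flag, default False (current behavior preserved)
    OPT_OUT = "opt_out"  # new flag, default True (current behavior treated as a bug)
    ALWAYS = "always"    # no new flag; --favor-state always covers sources


def use_state_for_source(
    favor_state: bool,
    policy: SourcePolicy,
    include_sources: Optional[bool] = None,
) -> bool:
    """Return True if a source should resolve to the relation recorded in --state.

    include_sources stands in for the proposed flag; None means "not passed".
    """
    if not favor_state:
        # This sketch only covers the --favor-state case.
        return False
    if policy is SourcePolicy.ALWAYS:
        return True
    default = policy is SourcePolicy.OPT_OUT
    return default if include_sources is None else include_sources
```

Under `OPT_IN`, existing invocations keep resolving sources against the current target unless the flag is passed explicitly; under `OPT_OUT` and `ALWAYS`, they would start resolving sources from state.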

Describe alternatives you've considered

No response

Who will this benefit?

I tried out using --defer --favor-state --state while replying to #9550 (comment), and I discovered that it didn't apply to sources.

Are you interested in contributing this feature?

No response

Anything else?

models/_sources.yml

sources:
  - name: my_source
    database: db
    schema: "{{ target.schema }}"
    tables:
      - name: source_x

models/model_j.sql

-- model_j: {{ this }}
-- depends on:
--   nothing

select 1 as id

models/model_k.sql

-- model_k: {{ this }}
-- depends on:
--   model_a: {{ ref("model_j") }}
--   source_x: {{ source("my_source", "source_x") }}

select 1 as id

Assuming I have targets named dev and prod and I run the following:

dbt compile --target prod --target-path prod-run-artifacts
dbt compile --select model_k  --defer --favor-state --state prod-run-artifacts --target dev

I'd expect to get this:

-- model_k: "db"."feature_2"."model_k"
-- depends on:
--   model_a: "db"."prod"."model_j"
--   source_x: "db"."prod"."source_x"

select 1 as id

But I actually get this:

-- model_k: "db"."feature_2"."model_k"
-- depends on:
--   model_a: "db"."prod"."model_j"
--   source_x: "db"."feature_2"."source_x"

select 1 as id
@dbeatty10 dbeatty10 added enhancement New feature or request triage state Stateful selection (state:modified, defer) labels Feb 19, 2024
@dbeatty10
Contributor Author

dbeatty10 commented Feb 24, 2024

@graciegoheen and I discussed this last week, and we were wondering:

  • How does --favor-state currently treat each of the non-model resource types that can be referenced (snapshots, seeds, and sources)?

Answer

Seeds and snapshots both utilize the state that is favored -- only sources do not.

My opinion

I think there should be a way for sources to also favor the state that was given (either as the default behavior or via an opt-in flag).

Reprex

This model gives us a peek at each of these:

models/model_k.sql

-- model_k: {{ this }}
-- depends on:
--   model_a: {{ ref("model_j") }}
--   seed_x: {{ ref("seed_x") }}
--   snapshot_x: {{ ref("snapshot_x") }}
--   source_x: {{ source("my_source", "source_x") }}

select 1 as id

Create some state and then favor it:

dbt compile --target prod --target-path prod-run-artifacts
dbt compile --select model_k  --defer --favor-state --state prod-run-artifacts --target dev

Output:

-- model_k: "db"."feature_456"."model_k"
-- depends on:
--   model_a: "db"."prod"."model_j"
--   seed_x: "db"."prod"."seed_x"
--   snapshot_x: "db"."prod"."snapshot_x"
--   source_x: "db"."feature_456"."source_x"

select 1 as id

So seeds and snapshots both utilize the state that is favored -- only sources do not.
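The compiled output above can also be checked mechanically. The following is a small sketch, assuming the header-comment convention used in this reprex (one indented `label: "database"."schema"."identifier"` line per dependency); the helper `not_deferred` and the `DEP_LINE` pattern are illustrative names, not part of dbt.

```python
import re
from typing import List

# Compiled output copied from the reprex above.
COMPILED = '''\
-- model_k: "db"."feature_456"."model_k"
-- depends on:
--   model_a: "db"."prod"."model_j"
--   seed_x: "db"."prod"."seed_x"
--   snapshot_x: "db"."prod"."snapshot_x"
--   source_x: "db"."feature_456"."source_x"

select 1 as id
'''

# Matches indented dependency comments such as:  --   seed_x: "db"."prod"."seed_x"
DEP_LINE = re.compile(r'^--\s{2,}(\w+):\s+"([^"]+)"\."([^"]+)"\."([^"]+)"\s*$')


def not_deferred(compiled_sql: str, favored_schema: str) -> List[str]:
    """Return labels of dependencies whose schema differs from favored_schema."""
    missed = []
    for line in compiled_sql.splitlines():
        match = DEP_LINE.match(line)
        if match and match.group(3) != favored_schema:
            missed.append(match.group(1))
    return missed


print(not_deferred(COMPILED, "prod"))  # ['source_x']
```

A check like this could run in CI against compiled SQL to flag any dependency that did not resolve to the favored schema.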

@dbeatty10
Contributor Author

dbeatty10 commented Feb 26, 2024

Backstory

From convo with @jtcohen6:

Currently, deferral only works on non-ephemeral, "refable" nodes. This does not include the source node type at this time:

Node | Exists in database | Refable | Supports model versions
Source
Seed
Snapshot - timestamp
Snapshot - check
Model - incremental - append
Model - incremental - delete+insert
Model - incremental - merge
Model - incremental - insert_overwrite
Model - incremental - custom strategy
Model - table
Model - materialized view
Model - view
Model - ephemeral
Analysis

New behavior

Creating a definition of "deferable" that includes "refable" + sources would feel like more consistent behavior. Alternatively, we could consider adding sources to the refable list directly as long as we fully understand and accept those implications.

We wouldn't want to jump all the way to using "exists in database" as the criterion, because exported saved queries and the results of data tests can both also exist in the database, but not in a way that can be referenced in the dbt DAG.
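As a rough illustration of the "refable + sources" definition, here is a minimal Python sketch. It is an assumption-laden model rather than dbt-core code: `is_deferrable`, the `REFABLE` set, the resource-type strings, and the `include_sources` switch are all hypothetical.

```python
# Resource types that can be referenced with ref(), per the discussion above.
# The string values are illustrative, not taken from dbt-core.
REFABLE = {"model", "seed", "snapshot"}


def is_deferrable(
    resource_type: str,
    materialized: str = "",
    include_sources: bool = False,
) -> bool:
    """Return True if a node of this type could be resolved from --state.

    With include_sources=False this mirrors the current rule
    (non-ephemeral, refable nodes only); with True it adds sources.
    """
    if resource_type == "source":
        return include_sources
    if resource_type not in REFABLE:
        # Data tests, saved queries, and analyses may exist in the database
        # or the project, but cannot be referenced in the DAG.
        return False
    return materialized != "ephemeral"
```

Keying the rule on resource type rather than on "exists in database" keeps data tests and exported saved queries out of scope, which is the concern raised above.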

Opt-in

If there is any way this behavior change could either be "breaking" or an unpleasant surprise for someone's workflow, we'd want to utilize an opt-in mechanism for this behavior.

@dbeatty10 dbeatty10 removed the triage label Feb 26, 2024
@graciegoheen
Contributor

Is this related to #8727?

@dbeatty10
Contributor Author

Is this related to #8727?

I haven't looked at the implementation of state:modified / state:new to determine one way or the other.

@dbeatty10
Contributor Author

@katieclaiborne-duet ran into this with a unit test during a slim CI build. The model being unit tested relies upon a source table that contains environment-aware logic similar to the code example in this issue.
