Clone sources #9550

dbrtly · 2024-02-08T04:40:36Z

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

`dbt clone --target ci_env --select +modified --resource-type source --state target=prod`

I want to clone sources when they exist upstream of the modified models.

Describe alternatives you've considered

Clone less specifically (for example, the whole raw layer)
Hardcode the project in source configuration (not officially supported, IAM ramifications)

Who will this benefit?

For model testing, we want to validate against data in the prod GCP project from the test GCP project.

Are you interested in contributing this feature?

Yep

Anything else?

I might need help with the testing.


dbeatty10 · 2024-02-13T20:16:35Z

Thanks for opening this @dbrtly !

Can you share more details about the use-case(s) you are trying to solve for?

Maybe you have a PR that made code changes to a model, and you're trying to check if it produces the same data output or not?

dbrtly · 2024-02-14T00:37:43Z

Yes, exactly. Currently, we purge BigQuery, then arrange the environment with:

* clone state:modified
* clone +state:modified --resource-type table
* clone +state:modified --resource-type incremental
* run clone +state:modified --resource-type view
* seed +state:modified

But that still misses sources; a dbt command for them would work like the others. A command that simplified all of that would be even better:

`dbt clone --target test --ci-arrange --state target=prod`

Thanks, Daniel


dbeatty10 · 2024-02-14T04:34:06Z

What is the end goal of the cloning step for sources? Is it to guarantee both environments are using the same exact input copy of the data? Is it to "freeze" the source data so that it can't change while running CI?

For continuous integration (CI) use cases, we recommend cloning incremental models as the first step of your CI job (only for warehouses that support zero-copy cloning). After that, we recommend deferring to the production environment (rather than cloning).
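A minimal sketch of that two-step flow, written as Python that only assembles the two dbt invocations as argument lists (it does not run them). The helper name `ci_commands` and the artifact directory `prod-artifacts` are hypothetical, and the selector string is an assumption about how "modified incremental models that already exist in production" might be selected; adjust it to your project.

```python
import shlex


def ci_commands(state_dir: str) -> list[list[str]]:
    """Build the two dbt invocations for a CI job: clone first, then a deferred build."""
    clone = [
        "dbt", "clone",
        # Intersection of three selectors: modified models and their descendants,
        # materialized as incremental, and already present in the state manifest.
        "--select", "state:modified+,config.materialized:incremental,state:old",
        "--state", state_dir,
    ]
    build = [
        "dbt", "build",
        "--select", "state:modified+",
        # Unbuilt upstream nodes resolve to the environment described by state_dir.
        "--defer",
        "--state", state_dir,
    ]
    return [clone, build]


if __name__ == "__main__":
    # "prod-artifacts" is a placeholder for wherever the production manifest lives.
    for cmd in ci_commands("prod-artifacts"):
        print(shlex.join(cmd))
```

Note that neither command touches sources: they are only ever read, which is the point made in the rest of this comment.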

Is there some reason that using --defer doesn't work for you?

Because of where sources sit in the DAG, they are "off limits" for creating database objects -- they are read-only references to data rather than being editable.

dbrtly · 2024-02-16T07:54:42Z

We have found --defer to be buggy. It mostly works, but when it stops working, most of the team is unsure how to debug it.

I end up dropping everything else to do an emergency debug and fix. It hurts the credibility of the automated tests. Our GitHub notifications scream about the mysterious failed tests. It's tiring for me.

dbeatty10 · 2024-02-16T17:23:19Z

Summary

dbt clone is restricted only to nodes within the DAG that dbt actually builds.

Since dbt only references sources and doesn't build them, it would be inconsistent (and potentially problematic) for us to clone them. So I'm going to close this issue as "not planned".

Follow-up questions about `--defer`

@dbrtly based on your experience, do you think there are bugs with --defer that we can reproduce and fix within dbt-core?

Or is its behavior unintuitive because it relies heavily on which objects do (or don't) exist within your current environment? (See below for explanations from our documentation about --defer.)

If it's truly a bug, would you be willing to open up bug reports as those occur? I'm not seeing anything outstanding here that looks like what you are describing.

Behavior of `--defer`

Here's the section of the documentation that explains some of the tricky bits:

When the --defer flag is provided, dbt will resolve ref calls differently depending on two criteria:

1. Is the referenced node included in the model selection criteria of the current run?
2. Does the referenced node exist as a database object in the current environment?

If the answer to both is no—a node is not included and it does not exist as a database object in the current environment—references to it will use the other namespace instead, provided by the state manifest.

Ephemeral models are never deferred, since they serve as "passthroughs" for other ref calls.
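The rule quoted above can be sketched as a small decision function. This is a toy model of the documented behavior, not dbt-core's implementation; the function name `resolve_ref` and the return labels `"current"` and `"state"` are made up for illustration.

```python
def resolve_ref(selected: bool, exists_in_current_env: bool, ephemeral: bool = False) -> str:
    """Return which namespace a ref() would point at when --defer is set.

    "current" means the current environment; "state" means the other
    namespace provided by the state manifest.
    """
    if ephemeral:
        # Ephemeral models are never deferred.
        return "current"
    if not selected and not exists_in_current_env:
        # Both answers are "no": fall back to the state manifest's namespace.
        return "state"
    return "current"


if __name__ == "__main__":
    # Print the full truth table for non-ephemeral nodes.
    for selected in (True, False):
        for exists in (True, False):
            print(f"selected={selected}, exists={exists} -> {resolve_ref(selected, exists)}")
```

The second input is why results can differ between runs: the same command resolves a ref differently depending on whether a leftover object happens to exist in the current environment.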

dbrtly · 2024-02-18T05:46:49Z

The developers and management have typically classed the test automation as broken whenever “it worked in dev”.

The tough edges with the defer arguments have also been related to sources. We test our models in a different database than prod but sometimes the sources are in the same database and sometimes not. Getting precise config for all the sources has been a journey.

There have also been permission issues with the service account in the test database having access to sources (particularly external sources). Cloning everything is relatively fast, cheap and easier as a brute force high-level validation that the tests are ready to delegate to the automation.

dbeatty10 · 2024-02-19T18:17:03Z

Thanks for sharing more information about the situations you've run into @dbrtly 🧠

Even if it were possible to clone sources, you'd still need to sort out any permissions issues.

Neither of your situations sounds like a bug with --defer, but please do raise them if you run into any in the future.

dbrtly added the enhancement (New feature or request) and triage labels Feb 8, 2024

dbeatty10 transferred this issue from dbt-labs/dbt-bigquery Feb 10, 2024

dbeatty10 added the clone (related to the dbt clone command) label Feb 12, 2024

dbeatty10 added awaiting_response and removed triage labels Feb 13, 2024

github-actions bot added triage and removed awaiting_response labels Feb 14, 2024

dbeatty10 mentioned this issue Feb 14, 2024

Call out which resource types are cloned or not dbt-labs/docs.getdbt.com#4908

Open

1 task

dbeatty10 added awaiting_response and removed triage labels Feb 14, 2024

github-actions bot added triage and removed awaiting_response labels Feb 16, 2024

dbeatty10 closed this as not planned (won't fix, can't repro, duplicate, stale) Feb 16, 2024

dbeatty10 added wontfix (Not a bug or out of scope for dbt-core) and removed triage labels Feb 16, 2024

dbeatty10 mentioned this issue Feb 19, 2024

[Feature] Be able to --favor-state for sources #9599

Open

3 tasks
