
Clone sources #9550

Closed
3 tasks done
dbrtly opened this issue Feb 8, 2024 · 7 comments
Labels
clone: related to the dbt clone command
enhancement: New feature or request
wontfix: Not a bug or out of scope for dbt-core

Comments

@dbrtly
Contributor

dbrtly commented Feb 8, 2024

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

dbt clone \
  --target ci_env \
  --select +modified \
  --resource-type source \
  --state target=prod

I want to clone sources when they exist upstream of the modified models.
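To make the request concrete, here is a small Python sketch of the selection the proposed command would need: walk the DAG upward from the modified models and keep only the source nodes. The graph, node IDs, and `upstream_sources` helper are hypothetical; this is an illustration of the idea, not dbt's selector implementation.

```python
from collections import deque

# Hypothetical DAG: each node ID maps to the node IDs it reads from (its parents).
# IDs loosely follow dbt's unique_id style ("model.<project>.<name>",
# "source.<project>.<source_name>.<table_name>").
PARENTS = {
    "model.proj.stg_orders": ["source.proj.raw.orders"],
    "model.proj.stg_customers": ["source.proj.raw.customers"],
    "model.proj.fct_orders": ["model.proj.stg_orders", "model.proj.stg_customers"],
    "model.proj.dim_products": ["source.proj.raw.products"],
}


def upstream_sources(modified, parents):
    """Return the sorted source IDs reachable by walking upstream from `modified`."""
    seen = set()
    queue = deque(modified)
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    # Only sources are of interest; upstream models are walked through but dropped.
    return sorted(n for n in seen if n.startswith("source."))


print(upstream_sources(["model.proj.fct_orders"], PARENTS))
# → ['source.proj.raw.customers', 'source.proj.raw.orders']
```

Note that `dim_products` and its source are not touched, which is the point of the request: clone only the raw tables the modified models actually depend on.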

Describe alternatives you've considered

Clone less selectively (for example, the entire raw layer)
Hardcode the project in the source configuration (not officially supported; IAM ramifications)

Who will this benefit?

For model testing, we want to validate against data in the prod GCP project from the test GCP project.

Are you interested in contributing this feature?

Yep

Anything else?

I might need help with the testing.

@dbrtly added the enhancement and triage labels Feb 8, 2024
@dbeatty10 transferred this issue from dbt-labs/dbt-bigquery Feb 10, 2024
@dbeatty10 added the clone label Feb 12, 2024
@dbeatty10
Contributor

Thanks for opening this @dbrtly !

Can you share more details about the use-case(s) you are trying to solve for?

Maybe you have a PR that made code changes to a model, and you're trying to check if it produces the same data output or not?

@dbrtly
Contributor Author

dbrtly commented Feb 14, 2024 via email

@dbeatty10
Contributor

What is the end goal of the cloning step for sources? Is it to guarantee both environments are using the same exact input copy of the data? Is it to "freeze" the source data so that it can't change while running CI?

For continuous integration (CI) use cases, we recommend cloning incremental models as the first step of your CI job (only for warehouses that support zero-copy cloning). After that, we recommend deferring to the production environment (rather than cloning).
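As a rough illustration of that recommendation, the Python sketch below models which nodes each CI step would touch. In dbt terms this might correspond to something like `dbt clone --select state:modified+,config.materialized:incremental,state:old --state <prod-artifacts>` followed by `dbt build --select state:modified+ --defer --state <prod-artifacts>`, where `<prod-artifacts>` is a placeholder for the directory holding the production manifest; check the selector against the dbt docs for your version. The node IDs, metadata, and `ci_plan` helper are hypothetical.

```python
# Hypothetical per-node metadata for one CI run.
# "in_prod" stands in for "the node already exists in the production state"
# (roughly what dbt's state:old selector expresses).
NODES = {
    "model.proj.stg_orders": {"materialized": "view", "in_prod": True},
    "model.proj.fct_orders": {"materialized": "incremental", "in_prod": True},
    "model.proj.fct_returns": {"materialized": "incremental", "in_prod": False},
    "model.proj.dim_customers": {"materialized": "table", "in_prod": True},
}


def ci_plan(modified_plus, nodes):
    """Split a CI run into a clone step and a build step (simplified model)."""
    # Step 1: clone only incremental models that already exist in prod, so the
    # CI build runs them incrementally on top of prod data instead of from scratch.
    to_clone = sorted(
        n for n in modified_plus
        if nodes[n]["materialized"] == "incremental" and nodes[n]["in_prod"]
    )
    # Step 2: build everything modified or downstream; with --defer, refs to
    # unselected nodes that don't exist in the CI schema resolve to prod.
    to_build = sorted(modified_plus)
    return to_clone, to_build


clone, build = ci_plan(
    {"model.proj.stg_orders", "model.proj.fct_orders", "model.proj.fct_returns"}, NODES
)
print(clone)  # → ['model.proj.fct_orders']
print(build)
```

`fct_returns` is not cloned because it has no prod counterpart yet, and `dim_customers` is neither cloned nor built because it is outside the modified set; refs to it would be deferred.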

Is there some reason that using --defer doesn't work for you?

Because of where sources sit in the DAG, they are "off limits" for creating database objects -- they are read-only references to data rather than being editable.

@dbrtly
Contributor Author

dbrtly commented Feb 16, 2024

We have found --defer to be buggy. It mostly works, but when it stops working, most of the team is unsure how to debug it.

I end up dropping everything else to do an emergency debug and fix. It undermines the credibility of our automated tests. Our GitHub notifications scream about the mysterious failed tests. It's tiring for me.

@dbeatty10
Contributor

Summary

dbt clone is restricted to nodes within the DAG that dbt actually builds.

Since dbt only references sources and doesn't build them, it would be inconsistent (and potentially problematic) for us to clone them. So I'm going to close this issue as "not planned".
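A minimal Python sketch of that restriction: filter a selection down to resource types that dbt materializes. The exact set of cloneable types used here (`model`, `seed`, `snapshot`) is an assumption for illustration, as is the `clone_candidates` helper; the point is only that sources fall outside it.

```python
# Assumption: dbt clone only considers resource types that dbt itself materializes.
# Sources (and tests) are referenced or executed, not built as relations, so they drop out.
CLONEABLE_TYPES = {"model", "seed", "snapshot"}


def clone_candidates(selected):
    """Keep only selected node IDs whose resource type is in CLONEABLE_TYPES."""
    return [n for n in selected if n.split(".", 1)[0] in CLONEABLE_TYPES]


selected = [
    "source.proj.raw.orders",
    "model.proj.stg_orders",
    "seed.proj.country_codes",
    "test.proj.not_null_stg_orders_id",
]
print(clone_candidates(selected))
# → ['model.proj.stg_orders', 'seed.proj.country_codes']
```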

Follow-up questions about --defer

@dbrtly based on your experience, do you think there are bugs with --defer that we can reproduce and fix within dbt-core?

Or is its behavior unintuitive because it relies heavily on which objects do (or don't) exist within your current environment? (See below for explanations from our documentation about --defer.)

If it's truly a bug, would you be willing to open up bug reports as those occur? I'm not seeing anything outstanding here that looks like what you are describing.

Behavior of --defer

Here's the section of the documentation that explains some of the tricky bits:

When the --defer flag is provided, dbt will resolve ref calls differently depending on two criteria:

  1. Is the referenced node included in the model selection criteria of the current run?
  2. Does the referenced node exist as a database object in the current environment?

If the answer to both is no—a node is not included and it does not exist as a database object in the current environment—references to it will use the other namespace instead, provided by the state manifest.

Ephemeral models are never deferred, since they serve as "passthroughs" for other ref calls.
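The two criteria above can be sketched as a small Python decision function. This is a simplified model of the documented behavior, not dbt's implementation; the relation names, the `resolve_ref` helper, and the string returned for ephemeral models are hypothetical.

```python
def resolve_ref(node, *, selected, exists_in_current, ephemeral, current, state):
    """Return the relation a ref() to `node` points at when --defer is set (simplified)."""
    if node in ephemeral:
        # Ephemeral models are inlined as CTEs, so they are never deferred.
        return f"(inlined CTE for {node})"
    if node not in selected and node not in exists_in_current:
        # Not selected AND not built in this environment: use the state manifest.
        return state[node]
    # Otherwise the ref resolves to the current environment.
    return current[node]


# Hypothetical relations for one model in the CI and prod environments.
CURRENT = {"stg_orders": "ci_project.dbt_ci.stg_orders"}
STATE = {"stg_orders": "prod_project.analytics.stg_orders"}
common = dict(ephemeral=set(), current=CURRENT, state=STATE)

# Not selected, not built in CI: deferred to prod.
print(resolve_ref("stg_orders", selected=set(), exists_in_current=set(), **common))
# → prod_project.analytics.stg_orders

# Not selected, but a (possibly stale) copy exists in CI: the CI copy wins.
print(resolve_ref("stg_orders", selected=set(), exists_in_current={"stg_orders"}, **common))
# → ci_project.dbt_ci.stg_orders
```

The second call is the case that tends to surprise people: a leftover object in the CI schema takes precedence over prod. dbt's `--favor-state` flag skips the second criterion so unselected nodes resolve to the state manifest even when they exist in the current target; the sketch does not model that flag.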

@dbeatty10 closed this as not planned Feb 16, 2024
@dbeatty10 added the wontfix label and removed the triage label Feb 16, 2024
@dbrtly
Contributor Author

dbrtly commented Feb 18, 2024

Developers and management have typically classed the test automation as broken when “it worked in dev”.

The rough edges with the defer arguments have also been related to sources. We test our models in a different database than prod, but sometimes the sources are in the same database and sometimes not. Getting precise config for every source has been a journey.
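A small Python sketch of why source placement varies between environments: when a source does not set `database`, dbt falls back to the target's database (the GCP project on BigQuery), and `schema` falls back to the source name. The flat dict layout, project names, and `source_relation` helper are hypothetical simplifications of dbt's source properties.

```python
def source_relation(source, target):
    """Render a source table's fully qualified name (simplified).

    database falls back to the target's database (on BigQuery, the GCP project),
    schema falls back to the source name, identifier falls back to the table name.
    """
    database = source.get("database") or target["database"]
    schema = source.get("schema") or source["name"]
    identifier = source.get("identifier") or source["table"]
    return f"{database}.{schema}.{identifier}"


# Hypothetical CI target pointing at the test project.
ci_target = {"database": "test-project"}

pinned = {"name": "raw", "database": "prod-project", "table": "orders"}
unpinned = {"name": "raw", "table": "events"}

print(source_relation(pinned, ci_target))    # → prod-project.raw.orders
print(source_relation(unpinned, ci_target))  # → test-project.raw.events
```

In the same CI run, one source reads from prod and the other from the test project, depending only on whether `database` was set.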

There have also been permission issues with whether the service account in the test database can access sources (particularly external sources). Cloning everything is relatively fast, cheap, and easy: a brute-force, high-level validation that the tests are ready to delegate to the automation.

@dbeatty10
Contributor

Thanks for sharing more information about the situations you've run into @dbrtly 🧠

Even if it were possible to clone sources, you'd still need to sort out any permissions issues.

Neither of these situations sounds like a bug in --defer, but please do raise any bugs you run into in the future.
