Clone sources #9550
Comments
Thanks for opening this @dbrtly! Can you share more details about the use case(s) you are trying to solve for? Maybe you have a PR that made code changes to a model, and you're trying to check whether it produces the same data output?
Yes, exactly.
Currently, we purge BigQuery, then arrange the environment with:
* clone state:modified
* clone +state:modified --resource-type table
* clone +state:modified --resource-type incremental
* clone +state:modified --resource-type view
* seed +state:modified
But that still misses sources; a dbt command for them would work like the others.
A command that simplified all that would be even better:
`dbt clone --target test --ci-arrange --state target=prod`
Thanks,
Daniel
What is the end goal of the cloning step for sources? Is it to guarantee that both environments use exactly the same input copy of the data? Is it to "freeze" the source data so that it can't change while CI is running? For continuous integration (CI) use cases, we recommend cloning incremental models as the first step of your CI job (only for warehouses that support zero-copy cloning). After that, we recommend deferring to the production environment (rather than cloning). Is there some reason that using
Because of where sources sit in the DAG, they are "off limits" for creating database objects -- they are read-only references to data rather than being editable.
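A minimal sketch of the "clone incrementals first, then defer" recommendation above, written as a small Python helper that only builds the two command lines and does not run dbt. The function name `ci_commands`, the target name `ci`, and the artifacts directory `prod-run-artifacts` are placeholders chosen for illustration, not anything from this thread. The selectors (`state:modified+`, `config.materialized:incremental`, `state:old`) and the `--defer` / `--state` flags are existing dbt features, but the exact selection is an assumption and should be checked against your own project.

```python
import shlex


def ci_commands(state_dir: str, target: str = "ci") -> list[list[str]]:
    """Build the dbt invocations for a 'clone incrementals, then defer' CI job.

    state_dir is a placeholder path to the production run artifacts
    (the directory holding the manifest to compare against).
    """
    clone = [
        "dbt", "clone",
        "--target", target,
        # Comma means intersection: modified (or downstream of modified),
        # AND materialized as incremental, AND already present in the prod state.
        "--select", "state:modified+,config.materialized:incremental,state:old",
        "--state", state_dir,
    ]
    build = [
        "dbt", "build",
        "--target", target,
        "--select", "state:modified+",
        # Unselected upstream nodes resolve to the production objects.
        "--defer",
        "--state", state_dir,
    ]
    return [clone, build]


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for argv in ci_commands("prod-run-artifacts"):
        print(shlex.join(argv))
```

Returning argument lists rather than running anything keeps the sketch easy to inspect; a CI script could pass each list to `subprocess.run` once the selection has been verified.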
We have found I end up dropping everything else to do an emergency debug and fix. It hurts the credibility of the automated tests. Our GitHub notifications scream about the mysterious failed tests. It's tiring for me.
Summary
Since dbt only references sources and doesn't build them, it would be inconsistent (and potentially problematic) for us to clone them. So I'm going to close this issue as "not planned". Follow-up questions about
Developers and management have typically classed the test automation as broken whenever "it worked in dev". The rough edges with the defer arguments have also been related to sources. We test our models in a different database than prod, but the sources are sometimes in the same database and sometimes not. Getting precise config for all the sources has been a journey. There have also been permission issues with the service account in the test database having access to sources (particularly external sources). Cloning everything is relatively fast, cheap, and easier as a brute-force, high-level validation that the tests are ready to delegate to the automation.
Thanks for sharing more information about the situations you've run into @dbrtly 🧠 Even if it were possible to clone sources, you'd still need to sort out any permissions issues. Neither of your situations sounds like a bug with
Is this your first time submitting a feature request?
Describe the feature
`dbt clone --target ci_env --select +modified --resource-type source --state target=prod`
I want to clone sources when they exist upstream of the modified models.
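As a rough illustration of which sources this request is about: `dbt clone` does not clone sources, but the sources upstream of modified models can at least be listed with `dbt ls`. The sketch below only builds that command line; the helper name `upstream_sources_command` and the default target `ci_env` are placeholders, and whether `+state:modified` combined with `--resource-type source` matches the intended set is an assumption to verify in your project.

```python
import shlex


def upstream_sources_command(state_dir: str, target: str = "ci_env") -> list[str]:
    """Build a `dbt ls` call listing sources upstream of modified nodes.

    This only lists candidate sources; it does not clone or create anything.
    """
    return [
        "dbt", "ls",
        "--target", target,
        # Leading "+" adds all ancestors of the modified nodes.
        "--select", "+state:modified",
        # Keep only source nodes from that selection.
        "--resource-type", "source",
        "--state", state_dir,
        # Print node names, one per line.
        "--output", "name",
    ]


if __name__ == "__main__":
    print(shlex.join(upstream_sources_command("prod-run-artifacts")))
```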
Describe alternatives you've considered
Clone less specifically (for example, all of the raw layer)
Hardcode the project in source configuration (not officially supported, IAM ramifications)
Who will this benefit?
For model testing, we want to validate against data in the prod GCP project from the test GCP project.
Are you interested in contributing this feature?
Yep
Anything else?
I might need help with the testing.