Give dbt basic workflow capabilities #1842

Closed
jrandrews opened this issue Oct 19, 2019 · 6 comments
Labels
enhancement New feature or request more_information_needed stale Issues that have gone stale

Comments

@jrandrews

jrandrews commented Oct 19, 2019

Describe the feature

We need a way to more flexibly call different chains of dbt commands in different orders and schedules for a single dbt project while capturing the steps of these workflows along with the parameters, relationships between steps, etc. It would also be helpful to be able to store the configuration for these workflows and steps in the dbt project itself, so changes to the configuration for the orchestration can be versioned, controlled, managed, and deployed using git-based tools in the same way that everything else in dbt is.

Describe alternatives you've considered

We have been using bash scripts and Docker to capture this along with other enterprise workflow management software. We have also seen other dbt users use Airflow, Luigi, etc. All of these add significant overhead and complexity.

Additional context

Should not be database-specific.

We do need a way to more flexibly call different chains of dbt commands in potentially a different order for any given dbt project.
It would be helpful for developers on a dbt project to be able to see the project's chains of dbt commands clearly in git/AZDO, and to control, review, update, and test those chains using the same CI/CD process that we use for dbt models, macros, and tests. The chain of commands, models, tests, and selectors can affect how a developer writes additional models and tests, so developers need to understand the flow of what is going on in a project at any given time. On some projects we might want to run something like the following. Example chain of commands:

dbt clean
dbt deps
dbt run-operation {some-macro} --args {arg1}
dbt run-operation {some-other-macro} --args {arg2}
dbt seed
dbt source snapshot-freshness
dbt test --models source:*
dbt run --models tag:hourly
dbt test
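
A minimal sketch of what "running a chain" could mean, assuming the steps run in order and the chain stops at the first command that exits non-zero. This is not an existing dbt feature: `CHAIN`, `run_command`, and `run_chain` are hypothetical names, and the two `run-operation` steps are left out because their macro names and arguments are placeholders in the example above.

```python
import shlex
import subprocess
from typing import Callable, Iterable, List

# Hypothetical chain, taken from the example above minus the
# placeholder run-operation steps.
CHAIN = [
    "dbt clean",
    "dbt deps",
    "dbt seed",
    "dbt source snapshot-freshness",
    "dbt test --models source:*",
    "dbt run --models tag:hourly",
    "dbt test",
]


def run_command(command: str) -> int:
    """Run one command without a shell and return its exit code."""
    # No shell is involved, so "source:*" is passed to dbt literally.
    return subprocess.run(shlex.split(command)).returncode


def run_chain(
    commands: Iterable[str],
    runner: Callable[[str], int] = run_command,
) -> List[str]:
    """Run commands in order and stop at the first non-zero exit code.

    Returns the commands that completed successfully.
    """
    completed = []
    for command in commands:
        if runner(command) != 0:
            break
        completed.append(command)
    return completed
```

The `runner` argument is injectable so the chain logic can be exercised in CI without a warehouse connection.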

It's likely that over time each project will have its own divergent set of dbt commands, tags, parameters, etc.
We also need a way to call different dbt commands on different schedules. The most common case is calling dbt snapshot (perhaps along with a few tests) more often than other dbt commands. It would also be helpful to call dbt seed less often, perhaps only when a change to a seed file is detected (although it's a pretty low-cost operation).

Who will this benefit?

Developers and analytics users who will be able to clearly see the dbt workflow job chains and parameters right alongside their dbt models and code. Architects who can then worry less about having to build up other job orchestration infrastructure because dbt does not have these capabilities built in.

@jrandrews jrandrews added enhancement New feature or request triage labels Oct 19, 2019
@drewbanin drewbanin removed the triage label Oct 23, 2019
@drewbanin
Contributor

Thanks for the detailed writeup @jrandrews! I'm super into this idea. I think we have an open issue from a while back which gets at this same idea -- I'm going to close that one in favor of this one, as this one is definitely more actionable.

This isn't currently prioritized, but I'd like to add it for a patch release in the 0.15.x line if possible!

@jrandrews
Author

Some further comments after office hours today. When I originally opened the issue, I worded it in a way that implied each dbt project would have only one set of commands, or one workflow, that it would use to run. This is not true.

It's likely that a given dbt project would have many different potential workflows associated with it, not just one. One workflow for a project might do a couple of run-operations and then call dbt snapshot. Another might do dbt run -m my_model+ and then dbt test -m my_model+. Another might just run some source tests, etc., etc. So we would need the ability to have different workflows, likely assign names to them, and then issue a command like dbt workflow -w my_workflow where -w tells the workflow command which workflow to run.
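
To make the named-workflow idea concrete, here is a rough sketch of how a `-w` flag could select one of several workflows. Everything in it is an assumption for illustration: dbt has no `workflow` command, and the `WORKFLOWS` mapping, the workflow names, and `some_macro` are hypothetical. In a real design the mapping would presumably live in a versioned file inside the dbt project.

```python
import argparse
from typing import Dict, List, Sequence

# Hypothetical named workflows, mirroring the three examples above.
WORKFLOWS: Dict[str, List[str]] = {
    "snapshot_only": [
        "dbt run-operation some_macro",  # placeholder macro name
        "dbt snapshot",
    ],
    "my_workflow": [
        "dbt run -m my_model+",
        "dbt test -m my_model+",
    ],
    "source_tests": [
        "dbt test --models source:*",
    ],
}


def resolve_workflow(
    argv: Sequence[str],
    workflows: Dict[str, List[str]] = WORKFLOWS,
) -> List[str]:
    """Parse `-w <name>` and return that workflow's commands in order.

    argparse exits with an error if the name is not a known workflow.
    """
    parser = argparse.ArgumentParser(prog="workflow")
    parser.add_argument(
        "-w", "--workflow", required=True, choices=sorted(workflows)
    )
    args = parser.parse_args(list(argv))
    return workflows[args.workflow]
```

Calling `resolve_workflow(["-w", "my_workflow"])` returns the two `my_model+` commands, which a runner could then execute one after another.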

@jrandrews
Author

Here is an example of a workflow of ours (image (6), not reproduced here) that I would like to encode in the dbt project instead of having to maintain it separately (e.g. by editing dbt Cloud job settings).

@marshalljonj

Hi, it would be useful to have a way to force the -m flag to be used with dbt run. We don't want any developer to be able to redeploy all of our models, whether accidentally or deliberately, as this would
a) cause a disruption to service
b) be computationally expensive, especially as we use materialised tables in some of our models
So requiring an explicit list of models to run would help mitigate this issue.
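
Until something like this exists in dbt itself, one workaround is a thin wrapper that inspects the arguments before handing them to dbt. The sketch below is an assumption about how such a guard might look, not a dbt feature: `has_model_selection` and `guard_dbt_run` are hypothetical helpers, and the check only covers the `-m` / `--models` flags mentioned in this thread.

```python
from typing import List, Sequence

# Selection flags mentioned in this thread.
SELECTION_FLAGS = ("-m", "--models")


def has_model_selection(args: Sequence[str]) -> bool:
    """Return True if the args contain -m/--models followed by a value."""
    for i, arg in enumerate(args):
        # Form: --models=my_model+
        if arg.startswith("--models=") and arg != "--models=":
            return True
        # Form: -m my_model+  or  --models my_model+
        if arg in SELECTION_FLAGS:
            following = args[i + 1] if i + 1 < len(args) else ""
            if following and not following.startswith("-"):
                return True
    return False


def guard_dbt_run(args: Sequence[str]) -> List[str]:
    """Reject `run` without a model selection; pass other commands through.

    `args` is the argument list that would follow `dbt` on the command line.
    """
    if args and args[0] == "run" and not has_model_selection(args[1:]):
        raise ValueError("this project requires -m/--models with dbt run")
    return list(args)
```

A wrapper script would call `guard_dbt_run(sys.argv[1:])` and only invoke dbt when no error is raised.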

@beckjake
Contributor

There are two things I left out of #2671 because they are workflow-specific. I'll note them here:

  • we should add a workflow_id to that artifact metadata
  • workflows should produce an index.json file which points to all the generated artifacts from its steps
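
As a purely illustrative sketch of those two points, the snippet below writes an `index.json` that carries a `workflow_id` and lists the artifacts produced by each step. The file layout, the key names other than `workflow_id`, and the `write_workflow_index` helper are all assumptions; nothing here reflects an agreed artifact schema.

```python
import json
import uuid
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Tuple


def write_workflow_index(
    target_dir: str,
    steps: Iterable[Tuple[str, List[str]]],
    workflow_id: Optional[str] = None,
) -> Dict:
    """Write index.json into target_dir and return its contents.

    `steps` is a sequence of (command, artifact_paths) pairs, one per
    workflow step. A random workflow_id is generated if none is given.
    """
    index = {
        "workflow_id": workflow_id or str(uuid.uuid4()),
        "steps": [
            {"command": command, "artifacts": list(artifacts)}
            for command, artifacts in steps
        ],
    }
    (Path(target_dir) / "index.json").write_text(json.dumps(index, indent=2))
    return index
```

The same `workflow_id` could then be stamped into each step's artifact metadata so that the artifacts and the index can be joined later.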

@github-actions
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Dec 14, 2021

5 participants