[ADAP-433] [Feature] Granular execution project #647

Closed
3 tasks done
VasiliiSurov opened this issue Apr 5, 2023 · 9 comments

@VasiliiSurov

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

With Google introducing the new BigQuery editions, we could make dbt runs more cost-efficient if we could set the execution project at the model level, somewhat like how it's implemented in the Snowflake adapter.

Some models are CPU-bound and will cost less if they are executed under the on-demand pricing model; others could be IO-bound and will cost less if executed on Standard/Enterprise Edition.
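To illustrate the trade-off, here is a minimal sketch that compares the two pricing models for a single job. It assumes on-demand jobs are charged by bytes billed and edition jobs by slot time, and it ignores reservation baselines, autoscaling increments, and minimum charges. The names `JobStats` and `cheaper_pricing_model` are hypothetical, and the rates are parameters that the caller must supply from the current price list.

```python
from dataclasses import dataclass

TIB = 1024 ** 4          # bytes in one tebibyte
MS_PER_HOUR = 3_600_000  # milliseconds in one hour


@dataclass(frozen=True)
class JobStats:
    """Per-job usage figures (hypothetical container, not a dbt or BigQuery type)."""
    bytes_billed: int  # bytes billed for the job
    slot_ms: int       # slot-milliseconds consumed by the job


def cheaper_pricing_model(stats: JobStats,
                          usd_per_tib: float,
                          usd_per_slot_hour: float) -> str:
    """Return 'on_demand' or 'editions', whichever is cheaper for this job.

    Simplification: cost is linear in bytes (on-demand) or slot time (editions).
    """
    on_demand_cost = stats.bytes_billed / TIB * usd_per_tib
    editions_cost = stats.slot_ms / MS_PER_HOUR * usd_per_slot_hour
    return "on_demand" if on_demand_cost <= editions_cost else "editions"
```

A model that scans little data but burns many slot-hours comes out cheaper on demand, while a model that scans many TiB with little compute comes out cheaper on an edition.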

This feature should work as a new model configuration that can be set on a project / folder / model level.
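As a rough sketch of the intended precedence (model over folder over project, falling back to the connection default): the config key `execution_project` at these levels and the function `resolve_execution_project` are assumptions made for illustration, not existing dbt-bigquery behavior.

```python
from typing import Mapping, Optional


def resolve_execution_project(
    model_config: Mapping[str, str],
    folder_config: Mapping[str, str],
    project_config: Mapping[str, str],
    connection_default: Optional[str] = None,
) -> Optional[str]:
    """Pick the most specific 'execution_project' setting that is present.

    Hypothetical helper: the most specific level wins, and the value from
    the connection (profile) is used when no level sets the key.
    """
    for level in (model_config, folder_config, project_config):
        value = level.get("execution_project")
        if value:
            return value
    return connection_default
```

For example, a model with no setting of its own, inside a folder configured with `execution_project: "editions-proj"`, would resolve to `"editions-proj"` regardless of the project-level value.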

Describe alternatives you've considered

There is no dbt-native support for this. The only workaround is to artificially split the dbt DAG into slices and use external orchestration with multiple targets.

Who will this benefit?

All users of the dbt-bigquery adapter who can invest time in advanced cost management

Are you interested in contributing this feature?

Yes, if provided with some guidance.

Anything else?

No response

@VasiliiSurov VasiliiSurov added the enhancement and triage labels Apr 5, 2023
@github-actions github-actions bot changed the title [Feature] Granular execution project [ADAP-433] [Feature] Granular execution project Apr 5, 2023
@dbeatty10
Contributor

Thanks for making this feature suggestion @VasiliiSurov !

Are you thinking this would be similar to the snowflake_warehouse model configuration?

I read briefly about BigQuery editions, but I couldn't determine how to specify the "execution project" differently from the "dataset project" in BigQuery. Do you know how? e.g., can you explain what the API call, Python connection parameters, or SQL session parameters would look like?

@VasiliiSurov
Author

@dbeatty10
Yes, I would expect this new configuration parameter to work like snowflake_warehouse. In the BigQuery implementation, it would extend execution_project from the Connection:

execution_project: Optional[str] = None

As I see it, a single connection will not be enough: we will need one connection per execution_project used in the dbt DAG that is currently running. The thread limit could either apply to the total across all projects, for consistency, or apply per execution project, which could make dbt a bit more performant.
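A minimal sketch of the "one connection per execution project" idea, assuming clients are created lazily and reused across threads. `ConnectionRegistry` and its methods are hypothetical names, not part of dbt-bigquery; the factory is injected so the sketch does not depend on any particular client library.

```python
import threading
from typing import Callable, Dict, Generic, List, TypeVar

C = TypeVar("C")


class ConnectionRegistry(Generic[C]):
    """Keeps one client per execution project (hypothetical helper)."""

    def __init__(self, factory: Callable[[str], C]) -> None:
        self._factory = factory          # builds a client for a given project
        self._clients: Dict[str, C] = {}
        self._lock = threading.Lock()    # dbt runs models on several threads

    def get(self, execution_project: str) -> C:
        """Return the cached client for the project, creating it on first use."""
        with self._lock:
            if execution_project not in self._clients:
                self._clients[execution_project] = self._factory(execution_project)
            return self._clients[execution_project]

    def projects(self) -> List[str]:
        """List the execution projects that currently have a client."""
        with self._lock:
            return sorted(self._clients)
```

With the google-cloud-bigquery library, the factory could be something like `lambda p: bigquery.Client(project=p)`, since the client's `project` is the one its jobs run under. How this would fit into dbt's own connection manager and thread accounting is left open here.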

@dbeatty10
Contributor

Thanks for connecting the dots for me @VasiliiSurov !

In Snowflake, we can do something like this:

use warehouse my_purple_tshirt

Do you know if we can do something similar in BigQuery?

SET @@dataset_project_id = 'MyProject';

If not, then the comments here and here will be especially relevant.

Either way, it sounds like you are asking for the same or a similar thing as #343 (which itself is similar to databricks/dbt-databricks#59).

@VasiliiSurov
Author

@dbeatty10

Looks like that one is very much like what I'm asking for. As far as I know, you can define the "compute" project only at the connection level, and once defined it can't be overridden at the job level.

That's why I was proposing to have a set of connections instead of a single one in current implementation.

@jtcohen6
Contributor

jtcohen6 commented Apr 7, 2023

> That's why I was proposing to have a set of connections instead of a single one in current implementation.

I'd like this, too :) It's going to be a much bigger lift, but something that I'd like to see us building over the medium term.

@Fleid
Contributor

Fleid commented Apr 17, 2023

If BQ's doing it, it feels more like a trend than just a Snowflake thing. So should we consider moving the choice of "compute environment" to the model level across the board, rather than the connection level?

Intuitively, if most platforms would allow the switch to be made in session, like Snowflake with 'USE WAREHOUSE', that would be an easy pull. But for something like BQ, if that means spinning up and down heterogeneous connections as we move through the DAG (or keeping them alive and using them as we go I guess), that sounds much more interesting.

@jtcohen6 do you think that's a core thing, or a BQ thing? Personally I would start here first, and generalize if that makes sense later. We have enough top down initiatives in flight anyway.

@VasiliiSurov I'm thinking that the next step is to spike it on our side to map the territory. From there, we may use your help, but that could be a complicated one.

@github-actions
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jul 26, 2023
@github-actions
Contributor

github-actions bot commented Aug 3, 2023

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Aug 3, 2023
