[ADAP-433] [Feature] Granular execution project #647

VasiliiSurov · 2023-04-05T14:31:05Z

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

With Google introducing the new BigQuery editions, we could make dbt runs more cost-efficient if we could manage the execution project at the model level, somewhat like how it's implemented in the Snowflake adapter.

Some models are CPU-bound and will cost less if they are executed under the on-demand pricing model; others could be IO-bound and will cost less if executed on the Standard/Enterprise edition.
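The following minimal Python sketch makes the trade-off above concrete. It is not part of dbt-bigquery, and it compares the two pricing models for a single model's query. It assumes a simplified billing model in which on-demand charges per TiB scanned and editions charge per slot-hour consumed. It ignores autoscaling minimums, commitments, and free tiers. The rates are caller-supplied placeholders, not Google's published prices.

```python
from dataclasses import dataclass


@dataclass
class QueryProfile:
    """Observed resource usage of one model's query (e.g. from job statistics)."""
    tib_scanned: float  # data processed, in TiB
    slot_hours: float   # compute consumed, in slot-hours


def cheaper_pricing_model(
    profile: QueryProfile,
    on_demand_per_tib: float,
    edition_per_slot_hour: float,
) -> str:
    """Return "on_demand" or "edition", whichever is cheaper for this query.

    Simplified model: on-demand cost scales with bytes scanned, edition cost
    scales with slot-hours. Both rates are assumptions supplied by the caller.
    """
    on_demand_cost = profile.tib_scanned * on_demand_per_tib
    edition_cost = profile.slot_hours * edition_per_slot_hour
    return "on_demand" if on_demand_cost <= edition_cost else "edition"
```

Take made-up rates of 5.0 per TiB and 0.05 per slot-hour. A compute-heavy query that scans 0.01 TiB but uses 200 slot-hours comes out cheaper on demand. A scan-heavy query over 10 TiB that uses 5 slot-hours comes out cheaper on an edition.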

This feature should work as a new model configuration that can be set at the project, folder, or model level.
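Here is a minimal sketch of how such a setting could be resolved. It assumes a hypothetical `execution_project` config that follows dbt's usual rule that the most specific setting wins: model over folder over project. The function and its inputs are illustrative only, not an existing dbt API.

```python
from typing import Mapping


def resolve_execution_project(
    model_path: str,
    default_project: str,
    folder_configs: Mapping[str, str],
    model_configs: Mapping[str, str],
) -> str:
    """Pick the (hypothetical) execution project for a model.

    Precedence: model-level config, then the deepest matching folder config,
    then the project-wide default (e.g. the profile's execution_project).
    """
    if model_path in model_configs:
        return model_configs[model_path]
    folders = model_path.split("/")[:-1]  # folders containing the model
    # Walk from the deepest folder up to the top-level folder.
    for depth in range(len(folders), 0, -1):
        folder = "/".join(folders[:depth])
        if folder in folder_configs:
            return folder_configs[folder]
    return default_project
```

For example, with folder settings `{"marts": "p-a", "marts/finance": "p-b"}`, a model at `marts/finance/revenue` resolves to `p-b`. A model under `marts/sales` resolves to `p-a`, and anything under `staging` falls back to the default.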

Describe alternatives you've considered

There is no dbt-native support; the only option is to artificially split the dbt DAG into slices and use external orchestration with multiple targets.
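The workaround described above can be sketched as follows. The hypothetical helper takes each model's parents and a per-model target (e.g. one dbt target per execution project). It returns an ordered list of slices, each containing models for a single target, so that every model runs after all of its parents. An external orchestrator could then launch each slice as a separate dbt invocation against that target. The sketch assumes every parent is itself a key in `parents`, with sources excluded.

```python
from typing import Dict, List, Set, Tuple


def slice_dag_by_target(
    parents: Dict[str, Set[str]],
    target_of: Dict[str, str],
) -> List[Tuple[str, List[str]]]:
    """Split a model DAG into ordered (target, models) slices.

    Each slice holds models for one target; every model's parents appear in an
    earlier slice or earlier within the same slice.
    """
    done: Set[str] = set()
    remaining = set(parents)
    slices: List[Tuple[str, List[str]]] = []
    while remaining:
        ready = sorted(m for m in remaining if parents[m] <= done)
        if not ready:
            raise ValueError("cycle detected in model DAG")
        target = target_of[ready[0]]
        batch: List[str] = []
        # Greedily absorb newly unblocked models on the same target.
        while True:
            same_target = sorted(
                m for m in remaining
                if target_of[m] == target and parents[m] <= done
            )
            if not same_target:
                break
            for m in same_target:
                batch.append(m)
                done.add(m)
                remaining.discard(m)
        slices.append((target, batch))
    return slices
```

Before switching targets, the greedy strategy keeps absorbing newly unblocked models on the current target. This tends to reduce the number of slices, but it is not guaranteed to find the minimum.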

Who will this benefit?

All users of the dbt-bigquery adapter who can invest time in advanced cost management

Are you interested in contributing this feature?

Yes, if provided with some guidance.

Anything else?

No response

dbeatty10 · 2023-04-06T15:19:05Z

Thanks for making this feature suggestion @VasiliiSurov!

Are you thinking this would be similar to the snowflake_warehouse model configuration?

I read briefly about BigQuery editions, but I couldn't determine how to specify the "execution project" separately from the "dataset project" in BigQuery. Do you know how? For example, can you explain what the API call, Python connection parameters, or SQL session parameters would look like?

VasiliiSurov · 2023-04-06T16:15:01Z

@dbeatty10
Yes, I would expect this new configuration parameter to work like snowflake_warehouse. In the BigQuery implementation, it would extend the existing execution_project setting of the connection:

dbt-bigquery/dbt/adapters/bigquery/connections.py, line 111 in ea08334:

execution_project: Optional[str] = None

The way I see it, we will need more than a single connection: one connection per execution_project used in the dbt DAG that is currently running. We could keep the total number of threads shared across all projects for consistency, or set it per execution project, which could make dbt somewhat more performant.
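Below is a minimal sketch of the connection-per-execution-project idea. It covers both threading options mentioned above. The class is hypothetical and not part of dbt-bigquery. The client factory is injected, so the sketch runs without credentials.

```python
import threading
from contextlib import contextmanager
from typing import Callable, Dict, Generic, Iterator, Optional, Tuple, TypeVar

C = TypeVar("C")  # client type, e.g. google.cloud.bigquery.Client


class ExecutionProjectPool(Generic[C]):
    """Hypothetical pool holding one lazily created client per execution project.

    If threads_per_project is None, all projects share one limit of total_threads;
    otherwise each project gets its own limit and total_threads is not used.
    """

    def __init__(
        self,
        factory: Callable[[str], C],
        total_threads: int,
        threads_per_project: Optional[int] = None,
    ) -> None:
        self._factory = factory
        self._threads_per_project = threads_per_project
        self._shared_slots = threading.BoundedSemaphore(total_threads)
        self._clients: Dict[str, C] = {}
        self._slots: Dict[str, threading.BoundedSemaphore] = {}
        self._lock = threading.Lock()

    def _get(self, project: str) -> Tuple[C, threading.BoundedSemaphore]:
        # The lock prevents two threads from creating a client for the same project.
        with self._lock:
            if project not in self._clients:
                self._clients[project] = self._factory(project)
                if self._threads_per_project is not None:
                    self._slots[project] = threading.BoundedSemaphore(
                        self._threads_per_project
                    )
            return self._clients[project], self._slots.get(project, self._shared_slots)

    @contextmanager
    def acquire(self, project: str) -> Iterator[C]:
        """Yield the client for `project` while holding one thread slot."""
        client, slots = self._get(project)
        with slots:  # blocks until a slot is free
            yield client
```

With google-cloud-bigquery installed, the factory could be `lambda project: bigquery.Client(project=project)`. Each model would then run inside `with pool.acquire(project) as client:` via `client.query(sql).result()`. Clients are created lazily, so only the execution projects actually referenced in the running DAG get a connection.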

dbeatty10 · 2023-04-06T17:31:49Z

Thanks for connecting the dots for me @VasiliiSurov!

In Snowflake, we can do something like this:

use warehouse my_purple_tshirt

Do you know if we can do something similar in BigQuery?

SET @@dataset_project_id = 'MyProject';

If not, then the comments here and here will be especially relevant.

Either way, it sounds like you are asking for something similar to (or the same as) #343, which is itself similar to databricks/dbt-databricks#59.

VasiliiSurov · 2023-04-06T19:06:08Z

@dbeatty10

Looks like that one is very much like what I'm asking for. As far as I know, you can only define the "compute" project at the connection level, and once defined it can't be overridden at the job level.

That's why I was proposing to have a set of connections instead of the single one in the current implementation.

jtcohen6 · 2023-04-07T13:23:38Z

> That's why I was proposing to have a set of connections instead of a single one in current implementation.

I'd like this, too :) It's going to be a much bigger lift, but something that I'd like to see us building over the medium term.

Fleid · 2023-04-17T13:38:19Z

If BQ is doing it too, it feels more like a trend than a Snowflake-only thing. So should we consider moving the choice of "compute environment" to the model level across the board, rather than keeping it at the connection level?

Intuitively, if most platforms allowed the switch to be made in-session, like Snowflake with 'USE WAREHOUSE', that would be an easy pull. But for something like BQ, if that means spinning heterogeneous connections up and down as we move through the DAG (or, I guess, keeping them alive and using them as we go), that sounds much more interesting.

@jtcohen6 do you think that's a core thing, or a BQ thing? Personally, I would start here and generalize later if that makes sense. We have enough top-down initiatives in flight anyway.

@VasiliiSurov I'm thinking the next step is to spike it on our side to map the territory. From there, we may need your help, but this could be a complicated one.

github-actions · 2023-07-26T01:58:58Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions · 2023-08-03T01:52:07Z

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

VasiliiSurov added enhancement New feature or request triage labels Apr 5, 2023

github-actions bot changed the title from "[Feature] Granular execution project" to "[ADAP-433] [Feature] Granular execution project" Apr 5, 2023

dbeatty10 assigned Fleid and dbeatty10 Apr 6, 2023

dbeatty10 added awaiting_response and removed triage labels Apr 6, 2023

github-actions bot added triage and removed awaiting_response labels Apr 6, 2023

dbeatty10 added awaiting_response and removed triage labels Apr 6, 2023

github-actions bot added triage and removed awaiting_response labels Apr 6, 2023

Fleid mentioned this issue Apr 7, 2023

[ADAP-427] Add support for custom containers while using dataproc serverless #642

Fleid added awaiting_response and removed triage labels Apr 17, 2023

github-actions bot added the Stale label Jul 26, 2023

github-actions bot closed this as not planned Aug 3, 2023
