
Support different BQ project id for Spark workload and destination dataset #2346

Open
margoteli opened this issue Dec 4, 2024 · 0 comments

What feature would you like to be added?

I am using the SparkKubernetesOperator to load data from S3 to BQ. I would like to create the ingestion job in BQ project A, but write the data to a table in BQ project B.
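
For reference, a minimal PySpark sketch of the desired split, assuming the job writes through the spark-bigquery-connector; the project ids, bucket, and table names are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-to-bq").getOrCreate()

# Hypothetical S3 source path.
df = spark.read.parquet("s3a://my-bucket/input/")

(
    df.write.format("bigquery")
    # parentProject: the project that runs (and is billed for) the BQ job.
    .option("parentProject", "project-a")
    # The destination table lives in a different project.
    .option("table", "project-b.my_dataset.my_table")
    .mode("append")
    .save()
)
```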

Why is this needed?

This is needed to have control over BQ slot allocation for the Spark job via project A, instead of sharing resources with other workloads in project B. However, the data must remain in project B.

Describe the solution you would like

When creating the Spark job, instead of using the project id from the destination table, add a feature that takes the project id from one of the following (see the sketch after this list):
(1) the environment
(2) the service account
(3) a configuration passed when the operator is called
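
A minimal sketch of that resolution order; `resolve_job_project` and its precedence are illustrative assumptions, not an existing API:

```python
import os

import google.auth


def resolve_job_project(operator_project=None):
    """Hypothetical resolution order for the BQ job's project id (sketch only)."""
    # (1) the environment
    project = os.environ.get("GOOGLE_CLOUD_PROJECT")
    if project:
        return project
    # (2) the service account: google.auth.default() returns the project
    #     associated with the application default credentials
    _, project = google.auth.default()
    if project:
        return project
    # (3) a configuration value passed when the operator is called
    return operator_project
```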

Describe alternatives you have considered

  • Specify the GOOGLE_CLOUD_PROJECT env variable (docs); see the snippet after this list
  • Redesigning the job to write to the same project (unfortunately it must run separately from the destination table's project)
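
For context, a small sketch of how that environment variable influences the default project picked up by the Google auth libraries, assuming application default credentials are available ("project-a" is a placeholder):

```python
import os

import google.auth

# GOOGLE_CLOUD_PROJECT overrides the project inferred from the credentials,
# so the BQ job can run in project A while the table stays in project B.
os.environ["GOOGLE_CLOUD_PROJECT"] = "project-a"  # placeholder id

credentials, project = google.auth.default()
print(project)  # -> "project-a"
```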

Additional context

No response

Love this feature?

Give it a 👍. We prioritize the features with the most 👍.
