
Support different BQ project id for Spark workload and destination dataset #2346

Open
margoteli opened this issue Dec 4, 2024 · 0 comments

What feature would you like to be added?

I am using the SparkKubernetesOperator to load data from S3 to BQ. I would like to create the ingestion job in BQ project A, but write the data to a table in BQ project B.
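
For reference, a minimal PySpark sketch of the desired split, assuming the job writes through the spark-bigquery-connector; the project ids, bucket, and table names are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-to-bq").getOrCreate()

# Hypothetical S3 source path.
df = spark.read.parquet("s3a://my-bucket/input/")

(
    df.write.format("bigquery")
    # parentProject: the project that runs (and is billed for) the BQ job.
    .option("parentProject", "project-a")
    # The destination table lives in a different project.
    .option("table", "project-b.my_dataset.my_table")
    .mode("append")
    .save()
)
```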

Why is this needed?

This is needed to have control over BQ slot allocation for the Spark job via project A, instead of sharing resources with other workloads in project B. However, the data must remain in project B.

Describe the solution you would like

When creating the Spark job, instead of using the project id from the destination table, add a feature that takes the project id from one of the following (see the sketch after this list):
(1) the environment
(2) the service account
(3) a configuration passed when the operator is called
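
A minimal sketch of that resolution order; `resolve_job_project` and its precedence are illustrative assumptions, not an existing API:

```python
import os

import google.auth


def resolve_job_project(operator_project=None):
    """Hypothetical resolution order for the BQ job's project id (sketch only)."""
    # (1) the environment
    project = os.environ.get("GOOGLE_CLOUD_PROJECT")
    if project:
        return project
    # (2) the service account: google.auth.default() returns the project
    #     associated with the application default credentials
    _, project = google.auth.default()
    if project:
        return project
    # (3) a configuration value passed when the operator is called
    return operator_project
```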

Describe alternatives you have considered

  • Specify the GOOGLE_CLOUD_PROJECT env variable (docs); see the snippet after this list
  • Redesigning the job to write to the same project (unfortunately it must run separately from the destination table's project)
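
For context, a small sketch of how that environment variable influences the default project picked up by the Google auth libraries, assuming application default credentials are available ("project-a" is a placeholder):

```python
import os

import google.auth

# GOOGLE_CLOUD_PROJECT overrides the project inferred from the credentials,
# so the BQ job can run in project A while the table stays in project B.
os.environ["GOOGLE_CLOUD_PROJECT"] = "project-a"  # placeholder id

credentials, project = google.auth.default()
print(project)  # -> "project-a"
```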

Additional context

No response

Love this feature?

Give it a 👍. We prioritize the features with the most 👍.
