[ADAP-427] Add support for custom containers while using dataproc serverless #642

Open
3 tasks done
bveber opened this issue Apr 1, 2023 · 5 comments
Labels
enhancement New feature or request help_wanted Extra attention is needed

Comments

@bveber

bveber commented Apr 1, 2023

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

I would like to use a custom container for my Dataproc Serverless jobs.

Describe alternatives you've considered

Dataproc Serverless does not offer a way to install extra packages in its standard runtime. The only other way I could add unsupported dependencies is by using a cluster.

Who will this benefit?

Users creating python models with additional dependencies not supported by the standard dataproc serverless runtime.

Are you interested in contributing this feature?

I have a working fork but haven't run the test suite yet.

Anything else?

No response

@bveber bveber added enhancement New feature or request triage labels Apr 1, 2023
@github-actions github-actions bot changed the title Add support for custom containers while using dataproc serverless [ADAP-427] Add support for custom containers while using dataproc serverless Apr 1, 2023
@Fleid
Contributor

Fleid commented Apr 5, 2023

Hey @bveber, excuse my lack of expertise on the topic, but what kind of changes are required on dbt-bigquery to make this happen?

Is that a new parameter in the profile that points to a container image, wired to the call dbt makes to dataproc?

@bveber
Author

bveber commented Apr 5, 2023

Hi @Fleid, you are correct. The custom container can be defined in the runtime_config that is passed to the dataproc call. It expects an image that exists in Container Registry or Artifact Registry with a naming format like {hostname}/{project-id}/{image}:{tag}. It's a pretty minor code change and you can see this commit on my fork for reference.
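
As a rough illustration of the change described above, here is a minimal sketch that builds a Dataproc Serverless batch request body with an optional custom image. It assumes the JSON field names of the Dataproc Batches REST resource (`pysparkBatch.mainPythonFileUri`, `runtimeConfig.version`, `runtimeConfig.containerImage`); the helper name `build_batch_payload`, the regex, and the example bucket, project and image names are hypothetical and are not taken from dbt-bigquery or the linked fork. The regex only checks the `{hostname}/{project-id}/{image}:{tag}` shape, allowing a repository path inside `{image}` for Artifact Registry.

```python
import re

# Hypothetical check for the {hostname}/{project-id}/{image}:{tag} shape.
# The image part may contain "/" so Artifact Registry paths
# (e.g. REGION-docker.pkg.dev/PROJECT/REPO/IMAGE:TAG) also match.
IMAGE_PATTERN = re.compile(
    r"^(?P<hostname>[a-z0-9.-]+)"        # e.g. gcr.io or us-docker.pkg.dev
    r"/(?P<project_id>[a-z][a-z0-9-]*)"  # GCP project id
    r"/(?P<image>[a-z0-9._/-]+)"         # image name, optionally with a repo path
    r":(?P<tag>[A-Za-z0-9_.-]+)$"        # an explicit tag is required here
)


def build_batch_payload(main_python_file_uri, runtime_version, container_image=None):
    """Return a dict shaped like a Dataproc Batches API request body (assumed field names)."""
    runtime_config = {"version": runtime_version}
    if container_image is not None:
        if not IMAGE_PATTERN.match(container_image):
            raise ValueError(
                f"container image {container_image!r} does not look like "
                "{hostname}/{project-id}/{image}:{tag}"
            )
        # Only set the custom image when one is configured, so the
        # default runtime image is used otherwise.
        runtime_config["containerImage"] = container_image
    return {
        "pysparkBatch": {"mainPythonFileUri": main_python_file_uri},
        "runtimeConfig": runtime_config,
    }


if __name__ == "__main__":
    payload = build_batch_payload(
        "gs://my-bucket/my_model.py",           # hypothetical bucket/path
        "1.1",                                  # hypothetical runtime version
        "gcr.io/my-project/spark-custom:1.0",   # hypothetical image
    )
    print(payload["runtimeConfig"])
```

Keeping the image optional means existing users who don't set it get the same request as before.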

@Fleid
Contributor

Fleid commented Apr 7, 2023

Sweet! It looks like you would be able to contribute a PR here?
I'm going to mark this 'help_wanted' in the hope you can ;)

@Fleid Fleid added help_wanted Extra attention is needed and removed triage labels Apr 7, 2023
@Fleid Fleid self-assigned this Apr 7, 2023
@bveber
Author

bveber commented Apr 7, 2023

I'm happy to raise a PR for this.

I do have another related feature request for the dataproc serverless integration, specifically the ability to pass in custom Spark properties. Currently the number of executor instances is hard-coded and it would be nice to be able to define this in the model config as well as any other relevant properties.

Should I open a new issue to track that change or should I try to implement it in my upcoming PR?
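
For the Spark properties request, one possible shape is to merge user-supplied properties from the model config over the adapter's defaults. This is only a sketch: the config key it would come from, the helper `merge_spark_properties`, and the default value `"2"` for `spark.executor.instances` are hypothetical stand-ins for the hard-coded setting mentioned above. It assumes `runtimeConfig.properties` is a string-to-string map, so values are converted to strings (booleans become lowercase `"true"`/`"false"` as Spark expects).

```python
# Hypothetical adapter defaults; stands in for the hard-coded executor count.
DEFAULT_SPARK_PROPERTIES = {"spark.executor.instances": "2"}


def _to_property_string(value):
    # Spark reads booleans as "true"/"false", not Python's "True"/"False".
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def merge_spark_properties(defaults, overrides):
    """Return a new dict: defaults first, then user overrides (user wins)."""
    merged = dict(defaults)  # copy so the shared defaults are never mutated
    for key, value in (overrides or {}).items():
        if not isinstance(key, str) or not key:
            raise ValueError(f"invalid Spark property name: {key!r}")
        merged[key] = _to_property_string(value)
    return merged


if __name__ == "__main__":
    # e.g. values a user might set in a (hypothetical) model config
    user_props = {"spark.executor.instances": 10, "spark.dynamicAllocation.enabled": True}
    print(merge_spark_properties(DEFAULT_SPARK_PROPERTIES, user_props))
```

Letting user values win keeps today's behavior when nothing is configured, while removing the need to hard-code the executor count.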

@Fleid
Contributor

Fleid commented Apr 7, 2023

Excellent point. My intuition is that it should be a separate issue/PR, since it's a standalone change and could slow down the custom-container work.

I say "slow down" because there are parallel requests for custom properties that we may want to align on, like #647 or #578.
