FR: GCP Component for Arbitrary BQ Op without export job #2640

Closed
dhodun opened this issue Nov 21, 2019 · 9 comments
Labels: area/components, help wanted, lifecycle/stale

Comments

@dhodun
Contributor

dhodun commented Nov 21, 2019

Feature request for a more generic BQ op component that doesn't run an export job every time. Use cases include intermediate steps in BQ, as well as AutoML Tables pipelines, where AutoML Tables expects entire tables in BQ and does not support a SELECT statement. It would be nice to hand AutoML Tables the temp table, or to create a separate table, rather than pulling data out to GCS.

Current component:
https://github.com/kubeflow/pipelines/blob/master/components/gcp/bigquery/query/README.md

@parthmishra
Contributor

I think once #2606 is fixed, you should be able to omit the GCS path, in which case no extract job is run.

@dhodun
Contributor Author

dhodun commented Nov 22, 2019

Makes sense, I did try that. The component doc should then be updated to indicate that the component isn't solely for the BQ query + extract job use case.

@parthmishra
Contributor

More generally speaking, once #2616 is released, customizing the query job should be doable and should support the use cases you described. That being said, I do wonder if there's a balance between a component that can do everything and components that each do a limited set of actions.

@dhodun
Contributor Author

dhodun commented Nov 25, 2019

In general I think it's clearer and more readable to have bigquery_query_op(...) and bigquery_query_to_gcs_op(...) or bigquery_query_to_export_op(...).

BQ to GCS is likely the most common use case anyway, so there is an argument for leaving it as is. That said, users may want more options in the future, such as export to other formats like JSON, and those could overload a single op.

@Ark-kun
Contributor

Ark-kun commented Nov 25, 2019

In general I think it's clearer and more readable to have bigquery_query_op(...) and bigquery_query_to_gcs_op(...) or bigquery_query_to_export_op(...).

+1
I favor making separate components when inputs/outputs are different.

Also bigquery_query_to_table.

Ark-kun added the area/components and help wanted labels Nov 26, 2019
@NikeNano
Member

NikeNano commented Jun 1, 2020

I will look into this.

/assign

@NikeNano
Member

NikeNano commented Jun 3, 2020

@Ark-kun as I understand it, we would like to have the following three components:

  • bigquery_query_to_gcs_op -- The query is run and the results are written to a separate table, which is then extracted to GCS.
  • bigquery_query_op -- The query is run and the results are not written to a new table.
  • bigquery_query_to_table -- The query is run and the results are written to a separate table.

Is this correct?
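
A minimal sketch of how the three proposed components might differ in the jobs they submit. It makes no BigQuery or KFP calls. The `JobStep` type, the parameter names (`table_id`, `gcs_uri`), and the return values are hypothetical and only serve to make the input/output differences visible; the function names follow the ones proposed in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class JobStep:
    """One job a component would submit (illustrative, not a real API type)."""
    kind: str                          # "query" or "extract"
    destination: Optional[str] = None  # table ID or GCS URI, if any


def bigquery_query_op(query: str) -> List[JobStep]:
    # Run the query only; no destination table and no extract job.
    return [JobStep(kind="query")]


def bigquery_query_to_table(query: str, table_id: str) -> List[JobStep]:
    # Run the query and write the results to a named destination table.
    return [JobStep(kind="query", destination=table_id)]


def bigquery_query_to_gcs_op(query: str, table_id: str, gcs_uri: str) -> List[JobStep]:
    # Same as query-to-table, followed by an extract job from that table to GCS.
    return bigquery_query_to_table(query, table_id) + [
        JobStep(kind="extract", destination=gcs_uri)
    ]


if __name__ == "__main__":
    # Placeholder identifiers, not real resources.
    for step in bigquery_query_to_gcs_op(
        "SELECT 1", "my_project.my_dataset.my_table", "gs://my-bucket/out.csv"
    ):
        print(step)
```

Written this way, each component has a distinct signature, which is the argument above for splitting components when inputs/outputs differ.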

@stale

stale bot commented Sep 2, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the lifecycle/stale label Sep 2, 2020
@stale

stale bot commented Sep 11, 2020

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

stale bot closed this as completed Sep 11, 2020
magdalenakuhn17 pushed a commit to magdalenakuhn17/pipelines that referenced this issue Oct 22, 2023
lightgbm version defined as 3.3.2, but runtime YAML was not updated and only supports major version 2. https://github.com/kserve/kserve/blob/master/python/lgbserver/setup.py#L41

Signed-off-by: alexagriffith <agriffith50@bloomberg.net>