Add dataproc component yaml files #956

hongye-sun · 2019-03-11T20:22:30Z

hongye-sun · 2019-03-11T21:04:54Z

/retest

components/gcp/dataproc/create_cluster/component.yaml

hongye-sun · 2019-03-11T23:14:58Z

/retest

components/gcp/dataproc/delete_cluster/component.yaml

animeshsingh

Hi @hongye-sun - I am trying to understand how these component-description YAML files are used in the overall pipelines system.

hongye-sun · 2019-03-12T03:10:30Z

@animeshsingh We want to use these YAML files to share components across pipelines. Basically, a pipeline author should be able to load a component from its YAML file. The descriptions in the YAML serve as documentation for the loaded component. Here is an example of how to use it in a notebook: https://github.com/kubeflow/pipelines/tree/master/components/gcp/bigquery/query.

It's still at an early stage, and the YAML format is going to change in the future. E.g., it will be extended to support DAGs and other types of resources.
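To make the "descriptions serve as documentation" point concrete, here is a minimal sketch in plain Python. It does not use the kfp loader. The dict only mirrors the sections mentioned in this thread (inputs/outputs metadata plus an implementation section), and every name, type, and image in it is a hypothetical placeholder rather than the contents of the files in this PR.

```python
# Hypothetical component description; the field values are illustrative
# assumptions, not copied from the component.yaml files added in this PR.
component_spec = {
    "name": "dataproc_create_cluster",
    "description": "Creates a Dataproc cluster.",
    "inputs": [
        {"name": "project_id", "type": "String",
         "description": "GCP project that owns the cluster."},
        {"name": "region", "type": "String",
         "description": "Dataproc region to use."},
    ],
    "outputs": [{"name": "cluster_name", "type": "String"}],
    # Placeholder image reference; the real implementation section may differ.
    "implementation": {"container": {"image": "example.invalid/image:tag"}},
}


def render_doc(spec: dict) -> str:
    """Build a docstring-like summary from the name, description, and inputs."""
    lines = [spec["name"], "", spec.get("description", ""), "", "Inputs:"]
    for inp in spec.get("inputs", []):
        lines.append(
            f"  {inp['name']} ({inp.get('type', 'Any')}): {inp.get('description', '')}"
        )
    return "\n".join(lines)


print(render_doc(component_spec))
```

A loader can attach text like this to the generated op, so that help() in a notebook shows the same descriptions that live in the YAML.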

animeshsingh · 2019-03-12T03:16:10Z

"It's still in early state and format in the yaml are going to be changed in the future. E.g. it will be extended to support DAG and other types of resources." - if we support DAGs here, wouldn't it start moving into the same territory as Argo YAML?

hongye-sun · 2019-03-12T03:26:09Z

True. We are likely to replace the implementation section in the YAML with an Argo spec and will keep the inputs and outputs metadata for documentation and type information. Ideally, the load-component API should be able to load any compiled pipeline YAML as a DAG component.

components/gcp/dataproc/create_cluster/component.yaml

components/gcp/dataproc/submit_hadoop_job/component.yaml

gaoning777 · 2019-03-12T22:27:42Z

/lgtm

Ark-kun · 2019-03-12T23:13:25Z

"Hi @hongye-sun - i am trying to understand how are these YAML files for components description being used in the overall pipelines system?"

The component.yaml files are needed for efficient component sharing. Currently, many pipeline authors just copy and paste code between pipeline files, which is an error-prone anti-pattern. It's much easier to write train_op = kfp.components.load_component_from_url("https://..../component.yaml") to load the component and immediately use it to compose a pipeline.
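A minimal sketch of that one-liner wrapped in a helper, assuming the kfp SDK is installed. The URL is left elided exactly as above, and the helper name load_shared_op and its loader parameter are hypothetical, added only so the loading step can be swapped out in tests.

```python
from typing import Callable, Optional

# Elided in this thread; substitute the raw URL of a real component.yaml.
COMPONENT_URL = "https://..../component.yaml"


def load_shared_op(url: str,
                   loader: Optional[Callable[[str], Callable]] = None) -> Callable:
    """Return a task factory for the component described at `url`.

    `loader` is a hypothetical injection point; by default the kfp loader
    named in the comment above is used.
    """
    if loader is None:
        # Imported lazily so this module can be imported without kfp installed.
        from kfp import components
        loader = components.load_component_from_url
    return loader(url)


# Usage (needs kfp and a reachable URL, so it is not executed here):
#   train_op = load_shared_op(COMPONENT_URL)
#   Inside a pipeline function, calling train_op(...) with the inputs declared
#   in the YAML adds a step to the pipeline.
```

The point of the pattern is that the pipeline file holds only a URL and a call, so a fix to the shared component.yaml reaches every pipeline that loads it instead of being copied by hand.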

hongye-sun · 2019-03-12T23:15:27Z

/approve

k8s-ci-robot · 2019-03-12T23:15:30Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hongye-sun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~components/OWNERS~~ [hongye-sun]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Ark-kun · 2019-03-12T23:18:06Z

"We are likely to"

@hongye-sun AFAIK, we have a policy of not disclosing any future plans that are not part of our roadmap document, especially when the plans are not finalized and have no planning CUJs or ETAs. It would be best to edit your comment to remove any potential planning information that is not part of the roadmap. Previously, Pascal was very strict about this.

Thanks.

Add dataproc component yaml files

e5b0081

k8s-ci-robot requested review from Ark-kun and gaoning777 March 11, 2019 20:22

k8s-ci-robot added the size/XL label Mar 11, 2019

hongye-sun assigned Ark-kun and gaoning777 Mar 11, 2019

Ark-kun reviewed Mar 11, 2019

components/gcp/dataproc/create_cluster/component.yaml (resolved)

Ark-kun reviewed Mar 12, 2019

components/gcp/dataproc/delete_cluster/component.yaml (resolved)

animeshsingh reviewed Mar 12, 2019

gaoning777 reviewed Mar 12, 2019

components/gcp/dataproc/create_cluster/component.yaml (outdated, resolved)

gaoning777 reviewed Mar 12, 2019

components/gcp/dataproc/create_cluster/component.yaml (outdated, resolved)

gaoning777 reviewed Mar 12, 2019

components/gcp/dataproc/create_cluster/component.yaml (resolved)

Update license to 2019

dd33826

gaoning777 reviewed Mar 12, 2019

components/gcp/dataproc/submit_hadoop_job/component.yaml (resolved)

Remove unused parameter

5730882

k8s-ci-robot added the lgtm label Mar 12, 2019

k8s-ci-robot added the approved label Mar 12, 2019

k8s-ci-robot merged commit 5868158 into kubeflow:master Mar 12, 2019

cheyang pushed a commit to alibaba/pipelines that referenced this pull request Mar 28, 2019

Add dataproc component yaml files (kubeflow#956)

ee1d9e9

* Add dataproc component yaml files * Update license to 2019 * Remove unused parameter

Linchin pushed a commit to Linchin/pipelines that referenced this pull request Apr 11, 2023

Add universal training image to private ECR (kubeflow#956)

50c91a1

magdalenakuhn17 pushed a commit to magdalenakuhn17/pipelines that referenced this pull request Oct 22, 2023

Create PRESENTATIONS.md (kubeflow#956)

a9413f9

* Create PRESENTATIONS.md * hyperlink from main README
