-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance pipeline TFX taxi sample to support on-prem cluster #749
Enhance pipeline TFX taxi sample to support on-prem cluster #749
Conversation
/assign @IronPan |
/assign @gaoning777 @Ark-kun |
1d6deab
to
abf32ac
Compare
abf32ac
to
6b33f16
Compare
/test kubeflow-pipeline-e2e-test |
Thanks for the PR! Overall, the direction to support on-prem is great. However, I'm not in favour of having two versions of the pipeline, one for GCP and one using NFS, as it increases maintenance cost. Ideally, we should have one pipeline, with a user-configurable option of whether to store artifacts on GCP or NFS. Would it be feasible to explore this approach instead? |
echoed @neuromage |
@neuromage good point, let me check with @jinchihe for how we can move forward. |
@neuromage @gaoning777 @gyliu513 In fact, we discussed the solution in the original issue #721, as @swiftdiaries mentioned in #721, it's better to have separate example code. The onprem wouldn't be using gcp secrets so having another tfx_taxi_cab_onprem.py makes sense. And we'd be needing volumes, so the code would look very different. |
OK thanks. Let's conclude the discussion on #801 and hold on this PR till then. |
Hello @jinchihe, I was testing your PR today, trying to run the whole example completely on-prem. Although everything looks good, I'm getting an error in the last step of the pipeline, the
After digging in a bit, looking into the container image, I understand that the container image for the deploy part, which in your code is Can you point me to the image you used to run the sample successfully? Thank you, |
6b33f16
to
e159250
Compare
Thanks for the tests @cvenets. The problem has been fixed in #755, due to the RP is proposed to hold on, the image version is not updated. |
e159250
to
81d5c93
Compare
81d5c93
to
7a47e60
Compare
Hello @jinchihe, thank you for updating the pipeline code and image. Unfortunately, when trying to run the updated pipeline the first step (validation) now breaks with the following log. This didn't happen with the previous pipeline code, all steps finished successfully except for the deployment part. Any thoughts? I see that I'm running the updated image now:
Thank you, |
Seems Now I think it's better to merge the Personally I strongly suggest to merge the PR, since so many people pay attention to the PR and ask questions about running the sample on prem (local storage) by Slack and Email. :-) |
Sorry for the interruptions. There was a release yesterday and releases currently change the SHA hashes in the component paths.
It would be great. |
@Ark-kun Thanks. here's what I'm thinking. Please review and comment. Thanks a lot! Due to there are some differences for gcp and on-prem cluster for
So here's what I'm thinking: add new function in
And compile with
|
Hi @jinchihe,
Thank you for integrating with the work merged at #926 and I'm glad you find I have not used components extensively, what is the easiest way to use a vop = dsl.VolumeOp(...)
validation = dataflow_tf_data_validation_op(...).add_pvolumes({pvc_mount_path: vop.volume})
preprocess = dataflow_tf_transform_op(...).add_pvolumes({pvc_mount_path: validation.pvolume})
# or preprocess = dataflow_tf_transform_op(...).add_pvolumes({pvc_mount_path: vop.volume})
# since in this sample all the dependencies are resolved using parameters
... I'm preparing a WIP PR so you can test. |
What about creating the volume, but then using |
@Ark-kun Thanks, This method |
78c5c65
to
e86d246
Compare
/assign @Ark-kun I just updated the PR as following:
Any more comments please let me know. Thanks a lot! |
e86d246
to
69dac3f
Compare
69dac3f
to
57a7237
Compare
/lgtm Sorry for the delays! |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Ark-kun The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
@jinchihe Hi jinchi, I was testing the TFX demo on prem today. The pipeline workflow worked fine, but the web ui has some error, I can't open the artifacts ui. I find some js error: 'Uncaught (in promise) Error: Unsupported storage path: /mnt/a07b8215-8c1a-11e9-a2ff-525400ed33aa/tfx-taxi-cab-classification-pipeline-example-q7rtq-2449399348/data/roc.csv'. |
@cdyangzhenyu I think the Pipeline Web UI does not support get the artifacts from local path, but just from gs:// and s3:// etc... May be support this later. Thanks. |
* Fix kfserving ingressgateway typo * Add troube shooting for IngressNotConfigured error
* Add doc for break feature for loop * Fix typos Co-authored-by: Tommy Li <Tommy.chaoping.li@ibm.com>
fixes: #721
Enhance pipeline TFX taxi sample to support on-prem cluster
This change is