refactor(components): De-hardcoded local output paths. #580
/test kubeflow-pipeline-sample-test
d5a541e to f2efab5
/test kubeflow-pipeline-build-image
/test kubeflow-pipeline-sample-test
908ce6e to a6adf38
/test kubeflow-pipeline-sample-test
/test kubeflow-pipeline-e2e-test
@@ -60,6 +61,14 @@ def parse_arguments():
type=int,
default=32,
help='Batch size used in prediction.')
parser.add_argument('--prediction-results-uri-pattern-output-path',
I probably missed some context here. Why is this needed? Does it add a burden on component authors to add each output as a command-line argument?
The industry is moving away from Docker towards open and standardized container support, and so are our users and customers.
We have received bug reports from users that our pipelines fail on non-Docker Kubernetes clusters.
While making Argo work on a non-Docker cluster is trivial, it's much harder with Pipelines.
One particular problem is with local artifact paths. There are specific system-imposed requirements for where artifacts can and cannot be stored. Moreover, these requirements depend on the Argo executor being used in the cluster.
De-hardcoding the local artifact paths seems to be the only option for keeping the pipelines portable.
Usually the programs are already written without any paths hard-coded.
If the program did not follow best practices, the author only needs to add one line to the program to resolve the problem. This seems to be a small price when the alternative is a component that only works on some clusters.
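A minimal sketch of what that one-line change could look like, assuming a component that records a results URI pattern as an output. Only the `--prediction-results-uri-pattern-output-path` flag and the batch-size help text come from this PR's diff; the `--batch-size` flag name, the default path, the help text, and the `write_output` helper are hypothetical. The sketch uses the standard-library `pathlib` rather than `pathlib2`.

```python
import argparse
from pathlib import Path


def parse_arguments(argv=None):
    parser = argparse.ArgumentParser(description='Hypothetical prediction component.')
    # Flag name assumed; only the type, default, and help appear in the diff.
    parser.add_argument('--batch-size', type=int, default=32,
                        help='Batch size used in prediction.')
    # The added argument: the system supplies the local output path.
    # The default and help text below are assumptions for illustration.
    parser.add_argument('--prediction-results-uri-pattern-output-path',
                        type=str, default='/output.txt',
                        help='Local path where the results URI pattern is written.')
    return parser.parse_args(argv)


def write_output(output_path, value):
    """Write an output value to the path chosen by the caller."""
    path = Path(output_path)
    # The system-provided directory may not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(value)
```

The program body then calls `write_output(args.prediction_results_uri_pattern_output_path, ...)` instead of opening a fixed path, so the same code can run under executors that place artifacts in different locations.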
Are you planning to close this issue? Since you did not try to resolve hongye's concern, should we close it?
The industry is moving away from Docker towards open and standardized container support, and so are our users and customers. Moreover, we've received reports that our component code is hard to test. Indeed, most of the components lack test cases, and testing them is complicated by the fact that they try to write outputs to unconfigurable global locations, which are usually off-limits for test code. Making the output paths configurable makes the component code much easier to test.
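To illustrate the testing point, here is a hedged sketch of a unit test that becomes possible once the output path is a parameter. The `save_results_uri` function and the URI value are hypothetical stand-ins for real component code, not something taken from this PR.

```python
import tempfile
from pathlib import Path


def save_results_uri(results_uri, output_path):
    """Hypothetical component step: write the results URI to a caller-chosen path."""
    out = Path(output_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(results_uri)


def test_save_results_uri():
    # The test picks a throwaway directory instead of a fixed global location.
    with tempfile.TemporaryDirectory() as tmp:
        output_path = Path(tmp) / 'outputs' / 'results_uri.txt'
        save_results_uri('gs://example-bucket/predictions-*', str(output_path))
        assert output_path.read_text() == 'gs://example-bucket/predictions-*'
```

With a hard-coded output path, the same test would need write access to that fixed location or would have to patch file operations.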
@Ark-kun One quick question: does it mean that moving forward, components should not utilize |
No. Components should still output data by writing it to local paths, unless they output 100GB of data. But components should not hardcode those output paths in the code. The local paths should be given by the system. Does this make sense?
/retest Looks like we need to fix this flakiness
/retest
/retest
/retest
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: Bobgy
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* Components - De-hardcoded local output paths.
* pip install pathlib2
* Added component.yaml changes
* The Dataflow components have been deleted
Preparation for the future storage system.