SDK - Components refactoring #2865
This change is a pure refactoring of the implementation of component task creation. For pipelines compiled using the DSL compiler (the `compile()` function or the command-line program) nothing should change.

The main goal of the refactoring is to change the way component instantiation can be customized.

Previously, the flow was like this:

`ComponentSpec` + arguments --> `TaskSpec` --resolving+transform--> `ContainerOp`

This PR changes it to a more direct path:

`ComponentSpec` + arguments --constructor--> `ContainerOp`
or
`ComponentSpec` + arguments --constructor--> `TaskSpec`
or
`ComponentSpec` + arguments --constructor--> `SomeCustomTask`

The original approach, where the flow always passes through `TaskSpec`, had some issues since `TaskSpec` only accepts string arguments (and two other reference classes). This made it harder to handle custom types of arguments like `PipelineParam` or `Channel`.

Low-level refactoring changes:

* Resolving of command-line argument placeholders has been extracted into a function usable by different task constructors.
* Changed `_components._created_task_transformation_handler` to `_components._container_task_constructor`. Previously, the handler was receiving a `TaskSpec` instance. Now it receives `ComponentSpec` + arguments [+ `ComponentReference`].
* Moved the `ContainerOp` construction handler setup to the `kfp.dsl.Pipeline` context class as planned.
* Extracted `TaskSpec` creation to `_components._create_task_spec_from_component_and_arguments`.
* Refactored `_dsl_bridge.create_container_op_from_task` to `_components._resolve_command_line_and_paths`, which returns `_ResolvedCommandLineAndPaths`.
* Renamed `_dsl_bridge._create_container_op_from_resolved_task` to `_dsl_bridge._create_container_op_from_component_and_arguments`.
* The signature of `_components._resolve_graph_task` was changed and it now returns `_ResolvedGraphTask` instead of a modified `TaskSpec`.

Some of the component tests still expect `ContainerOp` and its attributes. These tests will be changed later.
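To make the new flow concrete, here is a minimal, self-contained sketch of a swappable task constructor. It does not use the real `kfp` classes: `MiniComponentSpec`, `MiniTaskSpec`, `MiniContainerTask`, `resolve_command_line`, `create_task` and `use_task_constructor` are all hypothetical stand-ins invented for illustration, and the dict-based `{"inputValue": ...}` placeholder is a simplification. The point is only to show the shape of the idea: the constructor receives the component spec plus the raw arguments directly, and a shared helper resolves command-line placeholders.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Mapping


@dataclass
class MiniComponentSpec:
    """Hypothetical, heavily simplified stand-in for a component spec."""
    name: str
    image: str
    command: List[Any]  # plain strings or {"inputValue": <input name>} dicts


@dataclass
class MiniTaskSpec:
    """Hypothetical declarative task that only stores string arguments."""
    component_name: str
    arguments: Dict[str, str]


@dataclass
class MiniContainerTask:
    """Hypothetical stand-in for a ContainerOp-like object."""
    image: str
    command: List[str]


def resolve_command_line(spec: MiniComponentSpec,
                         arguments: Mapping[str, Any]) -> List[str]:
    """Shared helper: replace input placeholders with argument values."""
    resolved = []
    for part in spec.command:
        if isinstance(part, dict) and "inputValue" in part:
            resolved.append(str(arguments[part["inputValue"]]))
        else:
            resolved.append(str(part))
    return resolved


def task_spec_constructor(spec, arguments):
    """Builds the declarative task; arguments are stringified."""
    return MiniTaskSpec(spec.name, {k: str(v) for k, v in arguments.items()})


def container_task_constructor(spec, arguments):
    """Builds a container task directly, reusing the shared resolver."""
    return MiniContainerTask(spec.image, resolve_command_line(spec, arguments))


# Module-level hook (assumption: mirrors the idea of a replaceable constructor).
_task_constructor: Callable[[MiniComponentSpec, Mapping[str, Any]], Any] = (
    task_spec_constructor
)


def create_task(spec, arguments):
    """Instantiates a component with whichever constructor is installed."""
    return _task_constructor(spec, arguments)


@contextmanager
def use_task_constructor(constructor):
    """Temporarily installs a constructor, similar in spirit to a pipeline context."""
    global _task_constructor
    previous = _task_constructor
    _task_constructor = constructor
    try:
        yield
    finally:
        _task_constructor = previous
```

Outside the context manager, `create_task` returns the declarative `MiniTaskSpec`; inside `with use_task_constructor(container_task_constructor):` the same call yields a `MiniContainerTask`, without ever passing through the string-only task structure.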
Thanks for the very informative and detailed PR description. Before jumping into the code review, I wish to ask some questions for my own learning purpose.
Thanks!
I do not want to add any top-level kfp imports in this file to prevent circular references.
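The file this reply refers to is not shown in the thread, so the following is only a generic sketch of the technique it alludes to: moving an import from module level into the function that needs it, so the dependency is loaded at call time rather than at import time. The standard-library `json` module stands in for the package that would otherwise create the cycle, and `dumps_compact` is a hypothetical name.

```python
def dumps_compact(obj):
    """Serializes obj to compact JSON using a function-level import."""
    # Deferred import: resolved when the function runs, not when this
    # module is first imported, which avoids import-time cycles.
    import json

    return json.dumps(obj, separators=(",", ":"))
```

The trade-off is that an import error surfaces only when the function is first called, not when the module is loaded.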
Great questions! Sorry for the long text ahead >_< TL/DR version: Long version: TaskSpec was supposed to play an important role in Pipelines SDK with multiple runners. It's a portable declarative structure that describes a task and can be used in graph components to represent a pipeline task. Here are the original scenario flows:
The component subsystem was kind of stand-alone. The user passes data to components creating tasks, the tasks form a graph component that can be published or converted to Argo/Airflow or submitted for execution on Argo/Docker/Kubernetes/etc. I wanted to add the backend entrypoint that would accept These scenarios are still valid and I hope more of them see the light some day. However at that time I had to integrate this portable component library with the existing KFP SDK. Mind you, there was a great opposition to the very idea of the components and the
I had plans to simplify and rewrite the compiler on top of TaskSpec and ComponentSpec, but there was some FUD and the new compiler was never released.
There was one problem with the Think of some other framework like TFX: So, if
There is also an issue that we might want to perform type check - the type of I thought a lot about this issue and I contemplated different variants (e.g. installing custom system-specific argument converters that check the type and convert Pure environment (or when creating graph components, etc): KFP system: TFX system: So, the |
/retest
@numerology The tests are now green.
Thanks. Will take a look later today.
Thanks. I can walk you through the code changes if you want, so that it's easier to review (a big chunk of the changes is just moving one big function to another file and extracting one function).
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Ark-kun
/retest
* SDK - Components refactoring
* Adapted the _python_op tests
* Fixed linter failure
  I do not want to add any top-level kfp imports in this file to prevent circular references.
* Added docstrings
* Fixed the return type forward reference