WithItems Support #1868

kevinbache · 2019-08-16T20:18:21Z

This PR adds for loop support to the KFP DSL.

Users instantiate loops like so:

from kfp import dsl

@dsl.pipeline(name='my-pipeline', description='A pipeline with a loop.')
def pipeline(my_pipe_param=10):
    loop_args = [{'a': 1, 'b': 2}, {'a': 10, 'b': 20}]
    with dsl.ParallelFor(loop_args) as item:
        op1 = dsl.ContainerOp(
            name="my-in-cop",
            image="library/bash:4.4.23",
            command=["sh", "-c"],
            arguments=["echo op1 %s %s" % (item.a, my_pipe_param)],
        )

Loops currently support multiple operations within the loop body and nested loops. They don't yet support using the output of another operation as the input to the loop.
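For readers unfamiliar with the Argo feature named in the PR title: Argo Workflows lets a step or DAG task carry a `withItems` list, and Argo runs the task once per item, substituting `{{item}}` or `{{item.<key>}}` placeholders. The sketch below builds such a task as a plain Python dict for the `loop_args` shown above. It does not call the KFP compiler; `make_with_items_task` and the template name are hypothetical, and the dict the compiler actually emits may be shaped differently.

```python
def make_with_items_task(name, template, loop_args, keys):
    """Build an Argo-style DAG task dict that fans out over loop_args.

    Hypothetical helper for illustration only; it is not part of kfp.
    """
    return {
        'name': name,
        'template': template,
        'arguments': {
            # One parameter per key, filled from the current loop item.
            'parameters': [
                {'name': key, 'value': '{{item.%s}}' % key} for key in keys
            ],
        },
        # Argo expands the task once for each entry in this list.
        'withItems': loop_args,
    }


loop_args = [{'a': 1, 'b': 2}, {'a': 10, 'b': 20}]
task = make_with_items_task('my-in-cop', 'my-in-cop', loop_args, ['a', 'b'])
```

With two entries in `loop_args`, Argo would schedule two instances of the task, one with `a=1, b=2` and one with `a=10, b=20`.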


kevinbache · 2019-08-16T21:32:15Z

/retest

Ark-kun · 2019-08-16T21:57:14Z

sdk/python/kfp/compiler/compiler.py

@@ -677,6 +800,9 @@ def compile(self, pipeline_func, package_path, type_check=True):
      yaml.Dumper.ignore_aliases = lambda *args : True
      yaml_text = yaml.dump(workflow, default_flow_style=False)

+      if package_path is None:


What are the use cases where the YAML text is needed?
The Compiler()._compile method returns the workflow dict which seems more useful.

i was using it for visual debugging while i was developing. i figured why not leave the option for anyone else who ends up working on the compiler. what do you guys think?

sdk/python/kfp/dsl/__init__.py

sdk/python/kfp/dsl/_metadata.py

sdk/python/kfp/dsl/_pipeline_param.py

Ark-kun · 2019-08-16T22:05:53Z

sdk/python/tests/compiler/compiler_tests.py

@@ -591,6 +591,12 @@ def some_pipeline():
      if container:
        self.assertEqual(template['retryStrategy']['limit'], 5)

+  def test_withitem_basic(self):


Is it possible to test the actual behavior of the feature instead of just comparing the YAML text?

It would be great if the tests did not start failing when some unrelated part (e.g. pipeline name or metadata) changes.

i agree, it'd be nice to have an e2e test as well, but either way, we'll want to include some unit tests too and this is the pattern the repo uses.

Unit tests are important and we should have them. However we can usually do better than just comparing YAML output of the whole pipeline (although, checking for the loop compilation behavior might be non-trivial).
As an example of unit tests that test the feature behavior, see
test_init_container
test_op_transformers
test_set_display_name

I agree, less brittle tests would be nice, though the full YAML comparison is more thorough and that is most of how we currently test the compiler.
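One way to picture the less brittle style discussed in this thread: instead of diffing the whole YAML file, a test can pull out only the loop-related parts of the workflow dict and assert on those, so a renamed pipeline or extra metadata does not break it. This is a minimal sketch, not code from the PR. `find_loop_tasks` is a hypothetical helper, and `WORKFLOW` is a hand-written stand-in whose task and template names are assumptions rather than real compiler output.

```python
import unittest


def find_loop_tasks(workflow):
    """Return every DAG task in the workflow dict that carries a withItems list."""
    tasks = []
    for template in workflow['spec']['templates']:
        for task in template.get('dag', {}).get('tasks', []):
            if 'withItems' in task:
                tasks.append(task)
    return tasks


# Hand-written stand-in for a compiled workflow (assumed shape and names).
WORKFLOW = {
    'metadata': {'generateName': 'my-pipeline-'},
    'spec': {
        'templates': [
            {'name': 'my-in-cop', 'container': {'image': 'library/bash:4.4.23'}},
            {
                'name': 'my-pipeline',
                'dag': {
                    'tasks': [
                        {
                            'name': 'for-loop-1',
                            'template': 'for-loop-1',
                            'withItems': [{'a': 1, 'b': 2}, {'a': 10, 'b': 20}],
                        },
                    ],
                },
            },
        ],
    },
}


class LoopCompilationTest(unittest.TestCase):
    def test_loop_items_are_emitted(self):
        # Only the loop behavior is checked; metadata changes are ignored.
        loop_tasks = find_loop_tasks(WORKFLOW)
        self.assertEqual(len(loop_tasks), 1)
        self.assertEqual(
            loop_tasks[0]['withItems'],
            [{'a': 1, 'b': 2}, {'a': 10, 'b': 20}],
        )
```

The trade-off raised above still applies: a targeted assertion like this survives unrelated changes, but it checks less than a full YAML comparison does.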

sdk/python/tests/compiler/compiler_withitems_test.py

Ark-kun · 2019-08-16T22:08:09Z

sdk/python/tests/compiler/compiler_withitems_test.py

+
+
+# @dsl.pipeline(name='my-pipeline')
+# def pipeline(my_pipe_param=10):


Is this test failing?

shouldn't be

Should there be commented out code here?

sdk/python/tests/compiler/testdata/withitem_basic.py

sdk/python/tests/compiler/testdata/withitem_nested.yaml

Ark-kun · 2019-08-16T22:34:02Z

sdk/python/kfp/compiler/compiler.py

+    return op_name_to_op
+
+  def _fill_loop_args(self, new_root):
+    """Traverses through graph, plucking up loop_args vars from ops groups and depositing pointers to them on the


Can you elaborate on this a bit more?

you're right, this is a bit vague.

Ark-kun · 2019-08-16T22:35:43Z

Thank you for this great work!
Is this PR finished or still WIP?

kevinbache · 2019-08-16T23:47:22Z

/retest

sdk/python/kfp/dsl/_for_loop.py

sdk/python/kfp/dsl/_ops_group.py

sdk/python/kfp/compiler/compiler.py

kevinbache · 2019-08-21T17:58:26Z

/ping

hongye-sun

Could you add a basic example for this feature? It doesn't have to be in the same PR.

hongye-sun · 2019-08-21T18:11:41Z

sdk/python/tests/compiler/compiler_withitems_test.py

+    )
+
+
+# @dsl.pipeline(name='my-pipeline')


hongye-sun · 2019-08-21T20:28:49Z

/lgtm

kevinbache · 2019-08-23T20:35:59Z

/assign @IronPan

…items

Ark-kun · 2019-08-23T21:26:45Z

/lgtm
/approve

Ark-kun · 2019-08-23T21:34:05Z

git checkout origin/master .gitignore

kevinbache · 2019-08-24T00:39:47Z

/assign @neuromage

neuromage · 2019-08-24T02:54:33Z

/lgtm
/approve

k8s-ci-robot · 2019-08-24T02:54:39Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Ark-kun, neuromage

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [neuromage]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

kevinbache · 2019-08-27T18:52:14Z

closes #1481

gaoning777 · 2019-08-28T16:21:11Z

This is great. BTW, have you tested the withItems support with recursion, for example by creating a ParallelFor loop inside a recursive function? AFAIK, this is a common case: the outer recursion controls when to stop an HP (hyperparameter) run, and the inner ParallelFor runs some parameters in parallel.

kevinbache added 6 commits August 9, 2019 10:54

hacking

67676d5

hacking 2

4bd2826

moved withitems to opsgroup

eb66d8c

basic loop test working

1cf7e4f

fixed nested loop bug, added tests

d43d938

cleanup

9442213

k8s-ci-robot requested review from Ark-kun and hongye-sun August 16, 2019 20:18

k8s-ci-robot added the size/XL label Aug 16, 2019

kevinbache changed the title W WithItems Support Aug 16, 2019

kevinbache added 3 commits August 16, 2019 13:26

gitignore; compiler tests

bf6f1a9

Merge branch 'master' into withitems

136eb7e

cleanup

5a8b2c5

kevinbache added 2 commits August 16, 2019 14:42

tests fixup

8df5652

removed format strings

d8987d8