Use Runtimes and Executors to serve NVT Workflows #320
Conversation
The workflow output format should now be the same for both TF and Torch, so we can condense their respective sub-classes into the base `WorkflowRunner`, which only leaves the HugeCTR sub-class to deal with later (when we have HugeCTR support). Move `_convert_to_np` to the HugeCTR `WorkflowRunner`.
Force-pushed from 6ac2f00 to 9789068
@@ -44,12 +68,17 @@ def transform(self, graph: Graph, transformable: Transformable):
        Graph of nodes container operator chains for data manipulation.
    transformable : Transformable
        Input data to transform in graph.
    convert: bool
What's the scenario where we'd want to pass `convert=False`?
The split between `convert` and `transform` allows you to do the conversion ahead of time manually and then pass `convert=False` in order to avoid doing it every time the transform runs, as a small optimization. We put this in so that it was possible to maintain the previous behavior, where the conversion only happens once, while also offering the convenience of doing it automatically in cases where you're not worried about the perf implications.
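The convert/transform split described above can be sketched roughly as follows. This is a hypothetical illustration of the pattern, not the actual `WorkflowRunner` API; the `Runner` class, its methods, and the doubling "transform" are invented for the example.

```python
# Hypothetical sketch of the convert/transform split (not the real API).
class Runner:
    """Transforms dicts of tensors; conversion to the internal format can
    either happen once up front or automatically on every transform call."""

    def convert(self, tensors):
        # One-time conversion into the internal format (lists, here).
        return {name: list(values) for name, values in tensors.items()}

    def transform(self, tensors, convert=True):
        if convert:
            # Convenient default: convert on every call.
            tensors = self.convert(tensors)
        # Placeholder transformation on the converted tensors.
        return {name: [v * 2 for v in values] for name, values in tensors.items()}


runner = Runner()
raw = {"a": (1, 2, 3)}

# Option 1: let transform() convert automatically each time.
out1 = runner.transform(raw)

# Option 2: convert once ahead of time, then skip it per call.
converted = runner.convert(raw)
out2 = runner.transform(converted, convert=False)
assert out1 == out2  # both {'a': [2, 4, 6]}
```

Option 2 preserves the old convert-once behavior for callers who run the transform in a hot loop.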
return formatted_tensors
Note that pretty much everything in this file from here down is just getting moved from other places. There's definitely more refactoring we want to do on the code below, but moving it here more clearly communicates that the usage of this code is isolated to this single file and can be freely refactored without causing issues elsewhere.
raise TypeError(f"Unknown type: {type(data)}")


def _convert_format(tensors, target_format):
A next step along the road of this refactor is to replace `DictArray` with `TensorTable` and get rid of the manual conversions below. I think we'll still need a function like `_convert_format`, but the actual conversions it uses should be the dataframe/`TensorTable` conversions from Merlin Core.
tensors = self._standardize_formats(workflow_node, upstream_outputs)

transform_input = _concat_tensors(tensors)
# TODO: In order to replace the line above with the line below, we first have to replace
This is one difference from `LocalExecutor._execute_node`. The other seems to be the second line calling `_merge_addl_root_columns` instead of `_append_addl_root_columns`. Is that something that we might be able to unify with `LocalExecutor` eventually too?
Yeah, we're aiming to address those things in the next PR in this series of changes. We split the PRs here because fixing the TODO above ended up requiring some deeper changes that run through Core and several of the other libraries.
LOG = logging.getLogger("merlin-systems")


class NVTabularServingExecutor(LocalExecutor):
Is the idea that we're working toward unifying the functionality required here into `LocalExecutor` entirely?
As much as possible, yes. The ideal end state would be getting rid of this executor entirely 👍🏻
This reworks the existing NVTabular Workflow serving to use some of the newer concepts we've introduced, while continuing to support the optimized C++ implementations of NVT ops for serving:

- A sub-class of `LocalExecutor` that contains the custom code for running Workflows at serving time (which involves some additional conversion between data formats)
- A `Runtime` that contains the appropriate operator implementation substitutions
- An `OpTable` abstraction in order to encapsulate logic about when ops should be substituted

Depends on NVIDIA-Merlin/core#279