
Use Runtimes and Executors to serve NVT Workflows #320

Closed

Conversation

karlhigley (Contributor) commented Apr 7, 2023

This reworks the existing NVTabular Workflow serving to use some of the newer concepts we've introduced, while continuing to support the optimized C++ implementations of NVT ops for serving:

  • It creates a sub-class of LocalExecutor that contains the custom code for running Workflows at serving time (which involves some additional conversion between data formats)
  • It creates a sub-class of Runtime that contains the appropriate operator implementation substitutions
  • It creates a small OpTable abstraction to encapsulate the logic for deciding when ops should be substituted
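A rough sketch of the pattern these bullets describe might look like the following. All class and method names here are illustrative stand-ins, not the actual merlin-systems API:

```python
# Hypothetical sketch of the Executor/OpTable substitution pattern; these are
# illustrative stand-ins, not the real merlin-systems classes.


class LocalExecutor:
    """Base executor: runs a chain of ops over the input data."""

    def transform(self, ops, data):
        for op in ops:
            data = op(data)
        return data


class OpTable:
    """Encapsulates the logic for substituting ops at serving time."""

    def __init__(self, substitutions):
        # Maps an op to its optimized serving-time implementation.
        self._substitutions = dict(substitutions)

    def replace(self, op):
        # Fall back to the original op when no substitute is registered.
        return self._substitutions.get(op, op)


class NVTabularServingExecutor(LocalExecutor):
    """Serving-time executor that swaps in substitute implementations."""

    def __init__(self, op_table):
        self.op_table = op_table

    def transform(self, ops, data):
        # Apply the substitutions before delegating to the base executor.
        ops = [self.op_table.replace(op) for op in ops]
        return super().transform(ops, data)
```

Keeping the substitution decision inside `OpTable` means the executor subclass stays thin: it only rewrites the op list and defers execution to the base class.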

Depends on NVIDIA-Merlin/core#279

The workflow output format should now be the same for both TF and Torch, so we can condense their respective sub-classes into the base `WorkflowRunner`, which leaves only the HugeCTR sub-class to deal with later (when we have HugeCTR support).
Move `_convert_to_np` to the HugeCTR `WorkflowRunner`
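The condensation described above might look roughly like this. The names and the conversion logic are placeholders (the real `_convert_to_np` converts to numpy arrays), intended only to show the shape of the refactor:

```python
# Illustrative sketch of collapsing framework-specific runners into the base
# class; names and conversion logic are placeholders, not the real code.


class WorkflowRunner:
    """Base runner: TF and Torch now share the same output format."""

    def run(self, workflow, inputs):
        outputs = workflow.transform(inputs)
        return self._format_outputs(outputs)

    def _format_outputs(self, outputs):
        # Common output format; no per-framework handling needed.
        return outputs


class HugeCTRWorkflowRunner(WorkflowRunner):
    """Only HugeCTR still needs its own conversion step."""

    def _format_outputs(self, outputs):
        return {name: self._convert_to_np(col) for name, col in outputs.items()}

    def _convert_to_np(self, column):
        # Stand-in for the real numpy conversion.
        return list(column)
```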
@karlhigley karlhigley self-assigned this Apr 7, 2023
@karlhigley karlhigley added the chore Maintenance for the repository label Apr 7, 2023
@karlhigley karlhigley added this to the Merlin 23.04 milestone Apr 7, 2023
github-actions bot commented Apr 7, 2023

Documentation preview

https://nvidia-merlin.github.io/systems/review/pr-320

@karlhigley karlhigley force-pushed the refactor/runtimes-executors branch from 6ac2f00 to 9789068 Compare April 10, 2023 19:46
@karlhigley karlhigley marked this pull request as ready for review April 10, 2023 22:42
@@ -44,12 +68,17 @@ def transform(self, graph: Graph, transformable: Transformable):
Graph of nodes containing operator chains for data manipulation.
transformable : Transformable
Input data to transform in graph.
convert: bool
Member:

what's the scenario where we'd want to pass convert=False?

karlhigley (Contributor, Author):

The split between convert and transform lets you do the conversion ahead of time manually and then pass convert=False so the conversion doesn't run on every transform call, as a small optimization. We added this to preserve the previous behavior, where the conversion only happened once, while also offering the convenience of automatic conversion in cases where you're not worried about the perf implications.
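The optimization described here can be sketched as follows. This is a toy signature (the real `transform` takes a Graph and a Transformable), with `to_internal_format` standing in for the real tensor-format conversion:

```python
# Toy sketch of the convert/transform split; `to_internal_format` stands in
# for the real tensor-format conversion.


def to_internal_format(data):
    # Pretend conversion: normalize every column to a list.
    return {name: list(values) for name, values in data.items()}


def transform(ops, data, convert=True):
    # `convert=True` is the convenient default; pass `convert=False` when the
    # caller has already converted the data once, ahead of time.
    if convert:
        data = to_internal_format(data)
    for op in ops:
        data = op(data)
    return data
```

A hot serving loop would call `to_internal_format` once up front and then `transform(..., convert=False)` on every request, skipping the repeated conversion cost.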


return formatted_tensors


karlhigley (Contributor, Author):

Note that pretty much everything in this file from here down is just getting moved from other places. There's definitely more refactoring we want to do on the code below, but moving it here more clearly communicates that the usage of this code is isolated to this single file and can be freely refactored without causing issues elsewhere.

raise TypeError(f"Unknown type: {type(data)}")


def _convert_format(tensors, target_format):
karlhigley (Contributor, Author):

A next step along the road of this refactor is to replace DictArray with TensorTable and get rid of the manual conversions below. I think we'll still need a function like _convert_format, but the actual conversions it uses should be the dataframe/TensorTable conversions from Merlin Core.

tensors = self._standardize_formats(workflow_node, upstream_outputs)

transform_input = _concat_tensors(tensors)
# TODO: In order to replace the line above with the line below, we first have to replace
Member:

This is one difference from the LocalExecutor._execute_node. The other seems to be the second line calling _merge_addl_root_columns instead of _append_addl_root_columns. Is that something that we might be able to unify with LocalExecutor eventually too?

karlhigley (Contributor, Author):

Yeah, we're aiming to address those things in the next PR in this series of changes. We split the PRs here because fixing the TODO above ended up requiring some deeper changes that run through Core and several of the other libraries.

LOG = logging.getLogger("merlin-systems")


class NVTabularServingExecutor(LocalExecutor):
Member:

Is the idea that we're working toward unifying the functionality required here into LocalExecutor entirely?

karlhigley (Contributor, Author):

As much as possible, yes. The ideal end state would be getting rid of this executor entirely 👍🏻

@karlhigley karlhigley modified the milestones: Merlin 23.04, Merlin 23.05 Apr 25, 2023
@karlhigley karlhigley modified the milestones: Merlin 23.05, Merlin 23.06 May 4, 2023
@karlhigley karlhigley closed this Apr 4, 2024
Labels: chore (Maintenance for the repository), clean up
2 participants