From 91038b3269671671882506698b81087b81f5d555 Mon Sep 17 00:00:00 2001
From: Rebecca Szper
Date: Tue, 12 Jul 2022 22:10:27 +0000
Subject: [PATCH] Updated docs based on comments

---
 .../en/documentation/sdks/python-machine-learning.md | 11 ++++++-----
 .../www/site/content/en/documentation/sdks/python.md |  2 +-
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/website/www/site/content/en/documentation/sdks/python-machine-learning.md b/website/www/site/content/en/documentation/sdks/python-machine-learning.md
index 948796644d952..c2a8660db90e9 100644
--- a/website/www/site/content/en/documentation/sdks/python-machine-learning.md
+++ b/website/www/site/content/en/documentation/sdks/python-machine-learning.md
@@ -22,7 +22,7 @@ You can use Apache Beam with the RunInference API to use machine learning (ML) m
 
 ## Why use the RunInference API?
 
-RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API allows you to find the input that determined the prediction without returning to the full input data.
+RunInference leverages existing Apache Beam concepts, such as the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built-in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
 
 ### BatchElements PTransform
 
@@ -34,12 +34,12 @@ For more information, see the [`BatchElements` transform documentation](https://
 
 ### Shared helper class
 
-Instead of loading a model for each thread in a worker, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
 [`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
 
 ### Multi-model pipelines
 
-The RunInference API allows you to build complex multi-model pipelines with minimum effort. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
 
 ### Prediction results
 
@@ -99,7 +99,7 @@ with pipeline as p:
 with pipeline as p:
    data = p | 'Read' >> beam.ReadFromSource('a_source')
    model_a_predictions = data | RunInference(ModelHandlerA)
-   model_b_predictions = data | RunInference(ModelHandlerB)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(ModelHandlerB)
 ```
 
 ### Use a key handler
@@ -182,4 +182,5 @@ the same size. Depending on the language model and encoding technique, this opti
 ## Related links
 
 * [RunInference transforms](/documentation/transforms/python/elementwise/runinference)
-* [RunInference API pipeline examples](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference)
\ No newline at end of file
+* [RunInference API pipeline examples](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference)
+* [apache_beam.ml.inference package](/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference)
\ No newline at end of file
diff --git a/website/www/site/content/en/documentation/sdks/python.md b/website/www/site/content/en/documentation/sdks/python.md
index 5b4536d9fa5c3..6e62083307a7a 100644
--- a/website/www/site/content/en/documentation/sdks/python.md
+++ b/website/www/site/content/en/documentation/sdks/python.md
@@ -48,7 +48,7 @@ language-specific implementation guidance.
 
 ## Using Beam Python SDK in your ML pipelines
 
-To use the Beam Python SDK with your machine learning pipelines, you can either use the RunInference API or TensorFlow.
+To use the Beam Python SDK with your machine learning pipelines, use the RunInference API for PyTorch and scikit-learn models. If you're using a TensorFlow model, you can use the library from `tfx_bsl`. Further TensorFlow integrations are planned.
 
 You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation. For more information, see [Machine Learning](/documentation/sdks/python-machine-learning).
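For context, the third hunk above changes the multi-model example from two models reading the same input in parallel to model B consuming model A's post-processed output. Below is a minimal runnable sketch of that sequential pattern, assuming the scikit-learn model handler from `apache_beam.ml.inference`; the model paths and the body of `some_post_processing` are hypothetical placeholders, since the patch leaves them abstract.

```python
# Minimal sketch of the sequential multi-model pattern from the patch.
# Assumptions: two pickled scikit-learn models exist at the hypothetical
# paths below, and some_post_processing stands in for real logic.
import apache_beam as beam
import numpy as np
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy

model_handler_a = SklearnModelHandlerNumpy(model_uri='/tmp/model_a.pkl')  # hypothetical path
model_handler_b = SklearnModelHandlerNumpy(model_uri='/tmp/model_b.pkl')  # hypothetical path

def some_post_processing(prediction_result):
    # PredictionResult carries both the input example and the inference;
    # here model A's prediction becomes model B's input feature vector.
    return np.asarray([prediction_result.inference])

with beam.Pipeline() as pipeline:
    data = pipeline | 'Create' >> beam.Create(
        [np.array([1.0, 2.0]), np.array([3.0, 4.0])])
    model_a_predictions = data | 'RunInferenceA' >> RunInference(model_handler_a)
    model_b_predictions = (
        model_a_predictions
        | 'PostProcess' >> beam.Map(some_post_processing)
        | 'RunInferenceB' >> RunInference(model_handler_b))
```

The intermediate `beam.Map` is what makes the chaining work: RunInference emits `PredictionResult` objects, so a post-processing step is needed to extract or reshape model A's output into whatever input type model B's handler expects.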