Update RunInference documentation #22250
Conversation
@@ -48,6 +48,11 @@ language-specific implementation guidance.

## Using Beam Python SDK in your ML pipelines

To use the Beam Python SDK with your machine learning pipelines, you can either use the RunInference API or TensorFlow.
change: use the RunInference API or TensorFlow
to: use the RunInference API for PyTorch and Sklearn models. If using a TensorFlow model, you can make use of the library from tfx_bsl. Further integrations for TensorFlow are planned.
Updated, but I'm not sure if we'll need to update this again based on the email thread?
Added the RunInference examples -> #22254
Good work so far! Left some feedback on changes.
Let me know when to take another pass, especially with the example snippets updated.
@yeandy I believe all updates are made based on your comments, except I haven't updated the examples yet.
Thanks for the updates! A few more :)
To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
These little chunks of code below here seem out of place.
@yeandy How do you want to handle this?
Would it make sense to reword it to something like this, and keep the (refactored) code block?

To import models, you need to wrap them around a `ModelHandler` object. The `ModelHandler` you import will depend on the framework and type of data structure that contains the inputs. See the following examples on which ones you may want to import.
from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
I made some updates. Take a look and let me know if we need more changes.
Thanks. By the way, the imports I originally wrote had some typos, so I fixed them:
from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
Updated to fix the typos
@rszper A last minute addition on the batching issue.
Because of max depth recursion error while pickling
Inference snippets
Codecov Report

@@            Coverage Diff             @@
##           master   #22250      +/-   ##
==========================================
+ Coverage   74.25%   83.54%    +9.28%
==========================================
  Files         702      474      -228
  Lines       92999    65934    -27065
==========================================
+ Hits        69058    55085    -13973
+ Misses      22674    10849    -11825
+ Partials     1267        0     -1267
==========================================

Flags with carried forward coverage won't be shown.
R: @tvalentyn @pabloem we did the review on the docs and the snippets. Would you be able to do a final review and merge the PR?
@tvalentyn @pabloem This one too please #22069. The tests are queued, but it should be fine since it's only the .md file that was modified.
### Shared helper class

Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
How about:

Using the `Shared` class within RunInference implementation allows us to load the model only once per process and share it with all DoFn instances created in that process. This reduces the memory consumption and model loading time.
Updated
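The load-once behavior this comment describes can be illustrated with a stdlib-only sketch. The real implementation is `apache_beam.utils.shared.Shared`; the class below (its name and its `acquire` signature) is a simplified stand-in for illustration, not Beam's API:

```python
import threading

class SharedSketch:
    """Simplified stand-in for apache_beam.utils.shared.Shared.

    The first caller builds the (expensive) model; every later caller
    in the same process gets the same cached instance back.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._obj = None

    def acquire(self, constructor_fn):
        with self._lock:
            if self._obj is None:
                self._obj = constructor_fn()
            return self._obj

load_count = []

def load_model():
    # Stands in for an expensive model load (e.g. reading weights from disk).
    load_count.append(1)
    return {"weights": [0.1, 0.2]}

shared = SharedSketch()
model_a = shared.acquire(load_model)  # builds the model
model_b = shared.acquire(load_model)  # reuses the cached instance
# model_a is model_b, and load_model ran exactly once
```

Because every DoFn instance in the process goes through the same `acquire` call, the model is materialized once per worker process rather than once per thread.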
Where `model_handler` is the model handler setup code.

To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
To import models, you need to wrap them around a `ModelHandler` object

Consider instead:

To import models, you need to configure a `ModelHandler` object that will wrap the underlying model
Updated
Disable batching by overriding the `batch_elements_kwargs` function in your ModelHandler and setting the maximum batch size (`max_batch_size`) to one: `max_batch_size=1`. For more information, see
[BatchElements PTransforms](/documentation/sdks/python-machine-learning/#batchelements-ptransform).
How about we also link apache_beam/examples/inference/pytorch_language_modeling.py as an example that does this?
Added
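The override under discussion is small enough to sketch directly. In a real pipeline the class would subclass one of Beam's model handlers (the linked pytorch_language_modeling.py example does this); a plain class is used below so the snippet runs standalone, and the class name is illustrative:

```python
# In real code this would subclass a Beam handler such as
# apache_beam.ml.inference.pytorch_inference.PytorchModelHandlerTensor;
# a plain class is used here so the sketch is self-contained.
class PytorchNoBatchModelHandler:
    def batch_elements_kwargs(self):
        # These kwargs are passed through to the BatchElements transform;
        # capping max_batch_size at 1 effectively disables batching.
        return {"max_batch_size": 1}

handler = PytorchNoBatchModelHandler()
print(handler.batch_elements_kwargs())  # {'max_batch_size': 1}
```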
@@ -171,7 +171,7 @@ In some cases, the `PredictionResults` output might not include the correct pred

The RunInference API currently expects outputs to be an `Iterable[Any]`. Example return types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.

- To work with the current RunInference implementation, you can create a wrapper class that overrides the `model(input)` call. In PyTorch, for example, your wrapper would override the `forward()` function and return an output with the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+ To work with the current RunInference implementation, you can create a wrapper class that overrides the `model(input)` call. In PyTorch, for example, your wrapper would override the `forward()` function and return an output with the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49) and our [Bert language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py).
these are the same links, looks like not the change we intended to make?
my last comment referred to disable batching section
Oops. This should be fixed now.
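The reshaping such a wrapper performs can be sketched without PyTorch: the underlying model returns one dict of batched outputs, and the wrapper's `forward()` re-zips it into one dict per batch element. In real code the wrapper subclasses `torch.nn.Module` and the values are tensors; plain lists stand in for tensors here, and the names `ModelWrapperSketch` and `fake_model` are illustrative:

```python
class ModelWrapperSketch:
    """Illustrates the forward() override described above.

    Real code would subclass torch.nn.Module and wrap e.g. a
    HuggingFace model; lists stand in for tensors in this sketch.
    """

    def __init__(self, model):
        self._model = model

    def forward(self, **kwargs):
        output = self._model(**kwargs)  # one dict of batched outputs
        # Re-zip {key: [v1, v2, ...]} into [{key: v1}, {key: v2}, ...]
        # so RunInference can pair one dict with each input element.
        return [dict(zip(output.keys(), values))
                for values in zip(*output.values())]

def fake_model(**kwargs):
    # Stand-in for a model returning batched outputs for two inputs.
    return {"logits": [[0.1, 0.9], [0.8, 0.2]]}

wrapped = ModelWrapperSketch(fake_model)
result = wrapped.forward(input_ids=[[1, 2], [3, 4]])
# result == [{'logits': [0.1, 0.9]}, {'logits': [0.8, 0.2]}]
```

Returning a list with one dict per element is what lets RunInference zip predictions with inputs correctly instead of iterating over the dictionary keys.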
Co-authored-by: Anand Inguva <34158215+AnandInguva@users.noreply.github.com>
Co-authored-by: Andy Ye <andyye333@gmail.com>
Co-authored-by: Anand Inguva <anandinguva98@gmail.com>
Co-authored-by: Anand Inguva <anandinguva@google.com>
Adding documentation for the RunInference API.