Update RunInference documentation #22250

Merged Jul 15, 2022 · 52 commits
Changes shown from 24 commits.

Commits:
c7b0d99  starting RunInference documentation drafts (rszper, Jun 29, 2022)
eb89211  starting RunInference documentation drafts (rszper, Jul 11, 2022)
1a29d9e  RunInference documentation updates (rszper, Jul 11, 2022)
2be4739  RunInference documentation updates (rszper, Jul 11, 2022)
f5a0f0c  Skipping notebook because it doesn't exist (rszper, Jul 11, 2022)
1d404b6  Added troubleshooting section (rszper, Jul 11, 2022)
2d954de  Updated examples in snippets (rszper, Jul 12, 2022)
245dc49  Updated test (rszper, Jul 12, 2022)
99b1308  Updated troubleshooting section (rszper, Jul 12, 2022)
b0f430a  Updated troubleshooting section (rszper, Jul 12, 2022)
b02fd09  Moved batching elements content out of the RunInference transform page (rszper, Jul 12, 2022)
1234657  Added related links to the ML page (rszper, Jul 12, 2022)
0973e45  Updated docs based on comments (rszper, Jul 12, 2022)
5dd713e  Changed tensorflow to TensorFlow (rszper, Jul 12, 2022)
8a50ee8  Updated content based on Andy and Anand's comments (rszper, Jul 13, 2022)
e49ebec  Updated content based on Andy and Anand's comments (rszper, Jul 13, 2022)
5db0b67  Updated examples and made more changes based on comments. (rszper, Jul 13, 2022)
85d866b  Updated troubleshooting wording (rszper, Jul 13, 2022)
f766888  Updated model handler import intro (rszper, Jul 13, 2022)
40351c4  Trying to indicate that model_handler is a variable to be replaced (rszper, Jul 13, 2022)
e02160f  Fixed typos in model code (rszper, Jul 13, 2022)
2be34ea  Removed a line (rszper, Jul 13, 2022)
090c356  Fixing model_handler variables (rszper, Jul 13, 2022)
c2995c9  Removed unused examples from snippets (rszper, Jul 13, 2022)
dbe9b05  Update website/www/site/content/en/documentation/transforms/python/el… (rszper, Jul 14, 2022)
214ba7f  Update website/www/site/content/en/documentation/sdks/python-machine-… (rszper, Jul 14, 2022)
07e1f3e  Update website/www/site/content/en/documentation/sdks/python-machine-… (rszper, Jul 14, 2022)
7d8ce8e  Update website/www/site/content/en/documentation/sdks/python-machine-… (rszper, Jul 14, 2022)
c0c4548  Update website/www/site/content/en/documentation/sdks/python-machine-… (rszper, Jul 14, 2022)
580fa7f  Update website/www/site/content/en/documentation/sdks/python-machine-… (rszper, Jul 14, 2022)
651f52d  Update website/www/site/content/en/documentation/sdks/python-machine-… (rszper, Jul 14, 2022)
d3f80d5  Update website/www/site/content/en/documentation/sdks/python-machine-… (rszper, Jul 14, 2022)
380fcd3  Update website/www/site/content/en/documentation/sdks/python-machine-… (rszper, Jul 14, 2022)
033997b  Update website/www/site/content/en/documentation/sdks/python-machine-… (rszper, Jul 14, 2022)
335f1f1  Update website/www/site/content/en/documentation/sdks/python-machine-… (rszper, Jul 14, 2022)
489fce7  Updates based on comments (rszper, Jul 14, 2022)
4b962b4  Add import try and catch for unit tests that use torch (AnandInguva, Jul 14, 2022)
9e98188  fixup lint (AnandInguva, Jul 14, 2022)
46f5ebd  fixup: formatting (AnandInguva, Jul 14, 2022)
1f7ce97  fixup lint (AnandInguva, Jul 14, 2022)
07a99d7  Added error message to troubleshooting section (rszper, Jul 14, 2022)
49b0a7f  Add skip when GCP deps are not found (AnandInguva, Jul 14, 2022)
2d988c2  Added new pages to the TOC (rszper, Jul 14, 2022)
abde489  move LinearRegression out of the module (AnandInguva, Jul 14, 2022)
c340531  Add uses_pytorch marker to the torch tests (AnandInguva, Jul 14, 2022)
a3786c9  change importing order (AnandInguva, Jul 15, 2022)
cde3380  Modify torch tests (AnandInguva, Jul 15, 2022)
67b5d42  fixup torch test string (AnandInguva, Jul 15, 2022)
4160d1b  Merge pull request #18 from AnandInguva/inference-snippets (rszper, Jul 15, 2022)
4aa492c  Updates based on comments (rszper, Jul 15, 2022)
c1d5643  Fixed link location (rszper, Jul 15, 2022)
4e89126  Added a period (rszper, Jul 15, 2022)
@@ -0,0 +1,157 @@
# coding=utf-8
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# pytype: skip-file

def torch_unkeyed_model_handler(test=None):
  # [START torch_unkeyed_model_handler]
  import apache_beam as beam
  import numpy
  import torch
  from apache_beam.ml.inference.base import RunInference
  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor

  class LinearRegression(torch.nn.Module):
    def __init__(self, input_dim=1, output_dim=1):
      super().__init__()
      self.linear = torch.nn.Linear(input_dim, output_dim)

    def forward(self, x):
      out = self.linear(x)
      return out

  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
  model_class = LinearRegression
  model_params = {'input_dim': 1, 'output_dim': 1}
  model_handler = PytorchModelHandlerTensor(
      model_class=model_class,
      model_params=model_params,
      state_dict_path=model_state_dict_path)

  unkeyed_data = numpy.array([10, 40, 60, 90],
                             dtype=numpy.float32).reshape(-1, 1)

  with beam.Pipeline() as p:
    predictions = (
        p
        | 'InputData' >> beam.Create(unkeyed_data)
        | 'ConvertNumpyToTensor' >> beam.Map(torch.Tensor)
        | 'PytorchRunInference' >> RunInference(model_handler=model_handler)
        | beam.Map(print))
  # [END torch_unkeyed_model_handler]
  if test:
    test(predictions)


def torch_keyed_model_handler(test=None):
  # [START torch_keyed_model_handler]
  import apache_beam as beam
  import torch
[Review thread]
Contributor: @AnandInguva I see ModuleNotFoundError: No module named 'torch'. Do you know where we can install extra dependencies for the snippet examples?
Contributor: In which environment do you run the snippet examples?
Contributor: Tox.
[End review thread]
  from apache_beam.ml.inference.base import KeyedModelHandler
  from apache_beam.ml.inference.base import RunInference
  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor

  class LinearRegression(torch.nn.Module):
    def __init__(self, input_dim=1, output_dim=1):
      super().__init__()
      self.linear = torch.nn.Linear(input_dim, output_dim)

    def forward(self, x):
      out = self.linear(x)
      return out

  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
  model_class = LinearRegression
  model_params = {'input_dim': 1, 'output_dim': 1}
  keyed_model_handler = KeyedModelHandler(
      PytorchModelHandlerTensor(
          model_class=model_class,
          model_params=model_params,
          state_dict_path=model_state_dict_path))

  keyed_data = [("first_question", 105.00), ("second_question", 108.00),
                ("third_question", 1000.00), ("fourth_question", 1013.00)]

  with beam.Pipeline() as p:
    predictions = (
        p
        | 'KeyedInputData' >> beam.Create(keyed_data)
        | "ConvertIntToTensor" >>
        beam.Map(lambda x: (x[0], torch.Tensor([x[1]])))
        | 'PytorchRunInference' >>
        RunInference(model_handler=keyed_model_handler)
        | beam.Map(print))
  # [END torch_keyed_model_handler]
  if test:
    test(predictions)


def sklearn_unkeyed_model_handler(test=None):
  # [START sklearn_unkeyed_model_handler]
  import apache_beam as beam
  import numpy
  from apache_beam.ml.inference.base import RunInference
  from apache_beam.ml.inference.sklearn_inference import ModelFileType
  from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy

  sklearn_model_filename = 'gs://apache-beam-samples/run_inference/five_times_table_sklearn.pkl'  # pylint: disable=line-too-long
[Review thread]
Contributor: Can't find this filesystem in the unit tests. Can you double-check this, @AnandInguva?
Contributor: Do we need FileSystems here?
Contributor: I don't think so; the other pytorch examples don't need to import it. Do we need to install apache-beam[gcp]?
Contributor: The error is: "Unable to get filesystem from specified path, please use the correct path or ensure the required dependency is installed, e.g., pip install apache-beam[gcp]."
Contributor: We can add a check to skip if GCP is not detected?
AnandInguva (Jul 14, 2022): Since sklearn is installed as a Beam dependency in each tox environment, we also need to check whether GCP is installed in every tox test for the sklearn tests.
AnandInguva (Jul 14, 2022): How do we skip the test if apache_beam[gcp] is not installed? @tvalentyn @yeandy My test fetches the file from the GCS bucket; I just provide the GCS location to the RunInference transform.
Contributor: Typically our tests do something like:

    try:
      from apache_beam.io.gcp.gcsfilesystem import GCSFileSystem
    except ImportError:
      GCSFileSystem = None  # type: ignore
    ...
    @unittest.skipIf(GCSFileSystem is None, 'GCP dependencies are not installed')

Does this approach work here?
Contributor: Regarding "Since sklearn is installed as a Beam dependency in each tox environment, we also need to check whether GCP is installed in every tox test for the sklearn tests": I may be missing some context here, but can we install the GCP dependencies in the tox environment for the sklearn tests, similar to extras = test,gcp?
Contributor: I figured it out. Thanks.
[End review thread]
  sklearn_model_handler = SklearnModelHandlerNumpy(
      model_uri=sklearn_model_filename, model_file_type=ModelFileType.PICKLE)

  unkeyed_data = numpy.array([20, 40, 60, 90],
                             dtype=numpy.float32).reshape(-1, 1)
  with beam.Pipeline() as p:
    predictions = (
        p
        | "ReadInputs" >> beam.Create(unkeyed_data)
        | "RunInferenceSklearn" >>
        RunInference(model_handler=sklearn_model_handler)
        | beam.Map(print))
  # [END sklearn_unkeyed_model_handler]
  if test:
    test(predictions)


def sklearn_keyed_model_handler(test=None):
  # [START sklearn_keyed_model_handler]
  import apache_beam as beam
  from apache_beam.ml.inference.base import KeyedModelHandler
  from apache_beam.ml.inference.base import RunInference
  from apache_beam.ml.inference.sklearn_inference import ModelFileType
  from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy

  sklearn_model_filename = 'gs://apache-beam-samples/run_inference/five_times_table_sklearn.pkl'  # pylint: disable=line-too-long
  sklearn_model_handler = KeyedModelHandler(
      SklearnModelHandlerNumpy(
          model_uri=sklearn_model_filename,
          model_file_type=ModelFileType.PICKLE))

  keyed_data = [("first_question", 105.00), ("second_question", 108.00),
                ("third_question", 1000.00), ("fourth_question", 1013.00)]

  with beam.Pipeline() as p:
    predictions = (
        p
        | "ReadInputs" >> beam.Create(keyed_data)
        | "ConvertDataToList" >> beam.Map(lambda x: (x[0], [x[1]]))
        | "RunInferenceSklearn" >>
        RunInference(model_handler=sklearn_model_handler)
        | beam.Map(print))
  # [END sklearn_keyed_model_handler]
  if test:
    test(predictions)
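The review thread above discusses skipping the GCS-backed snippet tests when apache-beam[gcp] is not installed. A minimal, self-contained sketch of that skip pattern follows; the class and test names are illustrative, not taken from the PR.

```python
import unittest

# Try to import the GCP-dependent module; fall back to None when
# apache-beam[gcp] (or apache_beam itself) is not installed.
try:
  from apache_beam.io.gcp.gcsfilesystem import GCSFileSystem
except ImportError:
  GCSFileSystem = None  # type: ignore


# When the import failed, the whole test case is reported as skipped
# rather than failed, so tox environments without GCP extras stay green.
@unittest.skipIf(GCSFileSystem is None, 'GCP dependencies are not installed')
class GcsBackedSnippetTest(unittest.TestCase):
  def test_gcs_filesystem_available(self):
    # Illustrative body: a real test would load the model from a gs:// path.
    self.assertIsNotNone(GCSFileSystem)
```

Either outcome (run or skip) counts as a successful test run, which is what makes the guard safe to add to every tox environment.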
@@ -0,0 +1,93 @@
# coding=utf-8
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# pytype: skip-file

import unittest
from io import StringIO

import mock

from apache_beam.examples.snippets.util import assert_matches_stdout
from apache_beam.testing.test_pipeline import TestPipeline

from . import runinference


def check_torch_keyed_model_handler(actual):
  expected = '''[START torch_keyed_model_handler]
('first_question', PredictionResult(example=tensor([105.]), inference=tensor([523.6982], grad_fn=<UnbindBackward0>)))
('second_question', PredictionResult(example=tensor([108.]), inference=tensor([538.5867], grad_fn=<UnbindBackward0>)))
('third_question', PredictionResult(example=tensor([1000.]), inference=tensor([4965.4019], grad_fn=<UnbindBackward0>)))
('fourth_question', PredictionResult(example=tensor([1013.]), inference=tensor([5029.9180], grad_fn=<UnbindBackward0>)))
[END torch_keyed_model_handler]'''.splitlines()[1:-1]
  assert_matches_stdout(actual, expected)


def check_sklearn_keyed_model_handler(actual):
  expected = '''[START sklearn_keyed_model_handler]
('first_question', PredictionResult(example=[105.0], inference=array([525.])))
('second_question', PredictionResult(example=[108.0], inference=array([540.])))
('third_question', PredictionResult(example=[1000.0], inference=array([5000.])))
('fourth_question', PredictionResult(example=[1013.0], inference=array([5065.])))
[END sklearn_keyed_model_handler]'''.splitlines()[1:-1]
  assert_matches_stdout(actual, expected)


def check_torch_unkeyed_model_handler(actual):
  expected = '''[START torch_unkeyed_model_handler]
PredictionResult(example=tensor([10.]), inference=tensor([52.2325], grad_fn=<UnbindBackward0>))
PredictionResult(example=tensor([40.]), inference=tensor([201.1165], grad_fn=<UnbindBackward0>))
PredictionResult(example=tensor([60.]), inference=tensor([300.3724], grad_fn=<UnbindBackward0>))
PredictionResult(example=tensor([90.]), inference=tensor([449.2563], grad_fn=<UnbindBackward0>))
[END torch_unkeyed_model_handler]'''.splitlines()[1:-1]
  assert_matches_stdout(actual, expected)


def check_sklearn_unkeyed_model_handler(actual):
  expected = '''[START sklearn_unkeyed_model_handler]
PredictionResult(example=array([20.], dtype=float32), inference=array([100.], dtype=float32))
PredictionResult(example=array([40.], dtype=float32), inference=array([200.], dtype=float32))
PredictionResult(example=array([60.], dtype=float32), inference=array([300.], dtype=float32))
PredictionResult(example=array([90.], dtype=float32), inference=array([450.], dtype=float32))
[END sklearn_unkeyed_model_handler]'''.splitlines()[1:-1]
  assert_matches_stdout(actual, expected)


@mock.patch('apache_beam.Pipeline', TestPipeline)
@mock.patch(
    'apache_beam.examples.snippets.transforms.elementwise.runinference.print',
    str)
class RunInferenceTest(unittest.TestCase):
  def test_torch_unkeyed_model_handler(self):
    runinference.torch_unkeyed_model_handler(check_torch_unkeyed_model_handler)

  def test_torch_keyed_model_handler(self):
    runinference.torch_keyed_model_handler(check_torch_keyed_model_handler)

  def test_sklearn_unkeyed_model_handler(self):
    runinference.sklearn_unkeyed_model_handler(
        check_sklearn_unkeyed_model_handler)

  def test_sklearn_keyed_model_handler(self):
    runinference.sklearn_keyed_model_handler(check_sklearn_keyed_model_handler)


if __name__ == '__main__':
  unittest.main()