This repository was archived by the owner on Sep 3, 2022. It is now read-only.
Datalab "ml" magics for running a solution package. Update Inception Package. #121
Merged
Commits (5):
3f4e1df Datalab Inception (image classification) solution. (qimingj)
73cbad8 Fix dataflow URL. (qimingj)
87ea746 Datalab "ml" magics for running a solution package. (qimingj)
45b323a Follow up code review comments. (qimingj)
59fef82 Fix an PackageRunner issue that temp installation is done multiple ti… (qimingj)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,87 @@ | ||
| # Copyright 2017 Google Inc. All rights reserved. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except | ||
| # in compliance with the License. You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software distributed under the License | ||
| # is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express | ||
| # or implied. See the License for the specific language governing permissions and limitations under | ||
| # the License. | ||
|
|
||
| """Implements running Datalab ML Solution Packages.""" | ||
|
|
||
| import inspect | ||
| import google.cloud.ml as ml | ||
| import os | ||
| import shutil | ||
| import subprocess | ||
| import sys | ||
| import tempfile | ||
|
|
||
|
|
||
| PACKAGE_NAMESPACE = 'datalab_solutions' | ||
|
|
||
| class PackageRunner(object): | ||
| """A Helper class to run Datalab ML solution packages.""" | ||
|
|
||
| def __init__(self, package_uri): | ||
| """ | ||
| Args: | ||
| package_uri: The uri of the package. The file base name needs to be in the form of | ||
| "name-version", such as "inception-0.1". The first part split by "-" will be used | ||
| as the last part of the namespace. In the example above, | ||
| "datalab_solutions.inception" will be the namespace. | ||
| """ | ||
| self._package_uri = package_uri | ||
| self._name = os.path.basename(package_uri).split('-')[0] | ||
| self._install_dir = None | ||
|
|
||
| def _install_to_temp(self): | ||
| install_dir = tempfile.mkdtemp() | ||
| tar_path = self._package_uri | ||
| if tar_path.startswith('gs://'): | ||
| tar_path = os.path.join(install_dir, os.path.basename(tar_path)) | ||
| ml.util._file.copy_file(self._package_uri, tar_path) | ||
| subprocess.check_call(['pip', 'install', tar_path, '--target', install_dir, | ||
| '--upgrade', '--force-reinstall']) | ||
| sys.path.insert(0, install_dir) | ||
| self._install_dir = install_dir | ||
|
|
||
| def __enter__(self): | ||
| self._install_to_temp() | ||
| return self | ||
|
|
||
| def __exit__(self, exc_type, exc_value, traceback): | ||
| self._cleanup_installation() | ||
|
|
||
| def _cleanup_installation(self): | ||
| if self._install_dir is None: | ||
| return | ||
| if sys.path[0] == self._install_dir: | ||
| del sys.path[0] | ||
| shutil.rmtree(self._install_dir) | ||
|
|
||
| def get_func_args_and_docstring(self, func_name): | ||
| """Get function args and docstrings. | ||
| Args: | ||
| func_name: name of the function. | ||
| Returns: | ||
| A tuple of function argspec, function docstring. | ||
| """ | ||
| func = getattr(__import__(PACKAGE_NAMESPACE + '.' + self._name, fromlist=[func_name]), | ||
| func_name) | ||
| return inspect.getargspec(func), func.__doc__ | ||
|
|
||
| def run_func(self, func_name, args): | ||
| """Run a function. | ||
| Args: | ||
| func_name: name of the function. | ||
| args: args supplied to the functions. | ||
| Returns: | ||
| function return values. | ||
| """ | ||
| func = getattr(__import__(PACKAGE_NAMESPACE + '.' + self._name, fromlist=[func_name]), | ||
| func_name) | ||
| return func(**args) | ||
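The install, import, inspect, and clean-up cycle in the file above can be sketched without pip or Cloud Storage. This is a minimal illustration under stated assumptions, not the PR's implementation: `make_fake_install`, the bucket path, and the `local_train` function are hypothetical stand-ins, a hand-written package replaces `pip install --target`, and `inspect.getfullargspec` is used because `inspect.getargspec` was removed in Python 3.11.

```python
import importlib
import inspect
import os
import shutil
import sys
import tempfile

PACKAGE_NAMESPACE = 'datalab_solutions'


def package_name_from_uri(package_uri):
    """Return the part of the file base name before the first '-'."""
    return os.path.basename(package_uri).split('-')[0]


def make_fake_install(name):
    """Write a tiny stand-in package to a temp dir (hypothetical; replaces pip)."""
    install_dir = tempfile.mkdtemp()
    pkg_dir = os.path.join(install_dir, PACKAGE_NAMESPACE, name)
    os.makedirs(pkg_dir)
    # Both the namespace root and the solution package get an __init__.py.
    open(os.path.join(install_dir, PACKAGE_NAMESPACE, '__init__.py'), 'w').close()
    with open(os.path.join(pkg_dir, '__init__.py'), 'w') as f:
        f.write('def local_train(input_dir, output_dir, batch_size=32):\n'
                '    """Hypothetical local trainer."""\n'
                '    return (input_dir, output_dir, batch_size)\n')
    return install_dir


# "inception-0.1.tar.gz" yields the name "inception" (the bucket is made up).
name = package_name_from_uri('gs://some-bucket/inception-0.1.tar.gz')
install_dir = make_fake_install(name)
sys.path.insert(0, install_dir)
importlib.invalidate_caches()
try:
    module = importlib.import_module(PACKAGE_NAMESPACE + '.' + name)
    func = getattr(module, 'local_train')
    argspec = inspect.getfullargspec(func)
    result = func(**{'input_dir': 'in', 'output_dir': 'out'})
finally:
    # Mirror _cleanup_installation: drop the path entry, then delete the directory.
    if sys.path[0] == install_dir:
        del sys.path[0]
    shutil.rmtree(install_dir)
```

The `try`/`finally` here plays the role that `__enter__` and `__exit__` play in `PackageRunner`: the temporary directory is removed even if the import or the call fails.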
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -13,5 +13,6 @@ | |
|
|
||
| from __future__ import absolute_import | ||
|
|
||
| from . import _ml | ||
| from . import _mlalpha | ||
| from . import _tensorboard | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,160 @@ | ||
| # Copyright 2017 Google Inc. All rights reserved. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except | ||
| # in compliance with the License. You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software distributed under the License | ||
| # is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express | ||
| # or implied. See the License for the specific language governing permissions and limitations under | ||
| # the License. | ||
|
|
||
| try: | ||
| import IPython | ||
| import IPython.core.magic | ||
| except ImportError: | ||
| raise Exception('This module can only be loaded in ipython.') | ||
|
Review comment (Contributor): This -> IPython
||
|
|
||
| import collections | ||
| import os | ||
| import yaml | ||
|
|
||
| import datalab.context | ||
| import datalab.mlalpha | ||
| import datalab.utils.commands | ||
|
|
||
|
|
||
| @IPython.core.magic.register_line_cell_magic | ||
| def ml(line, cell=None): | ||
| """Implements the ml line cell magic. | ||
|
|
||
| Args: | ||
| line: the contents of the ml line. | ||
| cell: the contents of the ml cell. | ||
|
|
||
| Returns: | ||
| The results of executing the cell. | ||
| """ | ||
| parser = datalab.utils.commands.CommandParser(prog="ml", description=""" | ||
| Execute various ml-related operations. Use "%%ml <command> -h" for help on a specific command. | ||
| """) | ||
| preprocess_parser = parser.subcommand('preprocess', 'Run a preprocess job.') | ||
| preprocess_parser.add_argument('--usage', | ||
| help='Show usage from the specified preprocess package.', | ||
| action='store_true', default=False) | ||
| preprocess_parser.add_argument('--cloud', | ||
| help='Whether to run the preprocessing job in the cloud.', | ||
| action='store_true', default=False) | ||
| preprocess_parser.add_argument('--package', | ||
| help='The preprocess package to use. Can be a gs or local path.', | ||
| required=True) | ||
| preprocess_parser.set_defaults(func=_preprocess) | ||
|
|
||
| train_parser = parser.subcommand('train', 'Train an ML model.') | ||
| train_parser.add_argument('--usage', | ||
| help='Show usage from the specified trainer package', | ||
| action='store_true', default=False) | ||
| train_parser.add_argument('--cloud', | ||
| help='Whether to run the training job in the cloud.', | ||
| action='store_true', default=False) | ||
| train_parser.add_argument('--package', | ||
| help='The trainer package to use. Can be a gs or local path.', | ||
| required=True) | ||
| train_parser.set_defaults(func=_train) | ||
|
|
||
| predict_parser = parser.subcommand('predict', 'Predict with an ML model.') | ||
| predict_parser.add_argument('--usage', | ||
| help='Show usage from the specified prediction package', | ||
| action='store_true', default=False) | ||
| predict_parser.add_argument('--cloud', | ||
| help='Whether to run prediction in the cloud.', | ||
| action='store_true', default=False) | ||
| predict_parser.add_argument('--package', | ||
| help='The prediction package to use. Can be a gs or local path.', | ||
| required=True) | ||
| predict_parser.set_defaults(func=_predict) | ||
|
|
||
| batch_predict_parser = parser.subcommand('batch_predict', 'Batch predict with an ML model.') | ||
| batch_predict_parser.add_argument('--usage', | ||
| help='Show usage from the specified prediction package', | ||
| action='store_true', default=False) | ||
| batch_predict_parser.add_argument('--cloud', | ||
| help='Whether to run prediction in the cloud.', | ||
| action='store_true', default=False) | ||
| batch_predict_parser.add_argument('--package', | ||
| help='The prediction package to use. Can be a gs or local path.', | ||
| required=True) | ||
| batch_predict_parser.set_defaults(func=_batch_predict) | ||
|
|
||
| namespace = datalab.utils.commands.notebook_environment() | ||
| return datalab.utils.commands.handle_magic_line(line, cell, parser, namespace=namespace) | ||
|
|
||
|
|
||
| def _command_template(pr, func_name): | ||
| """Return (args_list, docstring). | ||
| args_list is in the form of: | ||
| arg1: | ||
| arg2: | ||
| arg3: (optional) | ||
| """ | ||
| argspec, docstring = pr.get_func_args_and_docstring(func_name) | ||
| num_defaults = len(argspec.defaults) if argspec.defaults is not None else 0 | ||
| # Need to fill in a keyword (here '(NOT_OP)') for non optional args. | ||
| # Later we will replace '(NOT_OP)' with empty string. | ||
| optionals = ['(NOT_OP)'] * (len(argspec.args) - num_defaults) + \ | ||
| ['(optional)'] * num_defaults | ||
| args = dict(zip(argspec.args, optionals)) | ||
| args_dump = yaml.safe_dump(args, default_flow_style=False).replace('(NOT_OP)', '') | ||
| return args_dump, docstring | ||
|
|
||
|
|
||
| def _run_package(args, cell, mode): | ||
| local_func_name = 'local_' + mode | ||
| cloud_func_name = 'cloud_' + mode | ||
| with datalab.mlalpha.PackageRunner(args['package']) as pr: | ||
| if args['usage'] is True: | ||
| #TODO Consider calling _command_template once to save one pip installation | ||
| command_local = """%%ml %s --package %s""" % (mode, args['package']) | ||
| args_local, docstring_local = _command_template(pr, local_func_name) | ||
| command_cloud = """%%ml %s --package %s --cloud""" % (mode, args['package']) | ||
| args_cloud, docstring_cloud = _command_template(pr, cloud_func_name) | ||
| output = """ | ||
| Local Run Command: | ||
|
|
||
| %s | ||
| %s | ||
| [Description]: | ||
| %s | ||
|
|
||
| Cloud Run Command: | ||
|
|
||
| %s | ||
| %s | ||
| [Description]: | ||
| %s | ||
| """ % (command_local, args_local, docstring_local, command_cloud, args_cloud, docstring_cloud) | ||
| return datalab.utils.commands.render_text(output, preformatted=True) | ||
|
|
||
| env = datalab.utils.commands.notebook_environment() | ||
| func_args = datalab.utils.commands.parse_config(cell, env) | ||
| if args['cloud'] is True: | ||
| return pr.run_func(cloud_func_name, func_args) | ||
| else: | ||
| return pr.run_func(local_func_name, func_args) | ||
|
|
||
|
|
||
| def _preprocess(args, cell): | ||
| return _run_package(args, cell, 'preprocess') | ||
|
|
||
|
|
||
| def _train(args, cell): | ||
| return _run_package(args, cell, 'train') | ||
|
|
||
|
|
||
| def _predict(args, cell): | ||
| return _run_package(args, cell, 'predict') | ||
|
|
||
|
|
||
| def _batch_predict(args, cell): | ||
| return _run_package(args, cell, 'batch_predict') | ||
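The argument listing that `_command_template` builds for `--usage` can be illustrated with the standard library alone. The sketch below is an approximation under assumptions: `command_template` and `cloud_train` are hypothetical names, plain string formatting stands in for `yaml.safe_dump`, and `inspect.getfullargspec` replaces `inspect.getargspec`. One visible difference is ordering: PyYAML sorts mapping keys by default, whereas this version keeps declaration order.

```python
import inspect


def command_template(func):
    """Return one 'name:' line per argument, marking defaulted ones '(optional)'."""
    argspec = inspect.getfullargspec(func)
    num_defaults = len(argspec.defaults) if argspec.defaults is not None else 0
    num_required = len(argspec.args) - num_defaults
    # Arguments with defaults always come last, so the markers line up by position.
    markers = [''] * num_required + ['(optional)'] * num_defaults
    lines = ['%s: %s' % (arg, marker) for arg, marker in zip(argspec.args, markers)]
    return '\n'.join(line.rstrip() for line in lines)


def cloud_train(input_dir, output_dir, region='us-central1'):
    """Hypothetical cloud trainer used only for this illustration."""


template = command_template(cloud_train)
print(template)
```

A user would paste the printed lines into the body of the `%%ml` cell and fill in a value after each colon; the magic then parses the cell and passes the values to the selected `local_` or `cloud_` function as keyword arguments.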
add a todo saying to do something better than installing/deleting the package every time a command is run
Oh, and this is done twice when --usage is used.
It's not horribly slow but it may be different if we add dependencies in setup.py.
It is important that each function call needs to be stateless. Otherwise it is really hard to manage the session and control when to clean up. Indeed the --usage can be further improved to call it once. Added todo for that.
I implemented enter and exit functions so we can use it in a with statement. One call for "--usage" now.
I love the idea, but I don't think you did it
Indeed! Fixed. PTAL.
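The outcome of this thread, one temporary installation per command even when `--usage` looks up both the local and the cloud function, can be sketched as follows. `TempInstall`, its `install_count` counter, and `describe` are hypothetical names for illustration only; the real `PackageRunner` runs pip inside `_install_to_temp` rather than just creating a directory.

```python
import shutil
import sys
import tempfile


class TempInstall(object):
    """Hypothetical stand-in for PackageRunner that counts installations."""

    install_count = 0

    def __enter__(self):
        # The expensive step happens once, when the with block is entered.
        self._install_dir = tempfile.mkdtemp()
        sys.path.insert(0, self._install_dir)
        TempInstall.install_count += 1
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Clean up on exit, whether or not the block raised.
        if sys.path[0] == self._install_dir:
            del sys.path[0]
        shutil.rmtree(self._install_dir)

    def describe(self, func_name):
        return 'usage for ' + func_name


with TempInstall() as runner:
    local_usage = runner.describe('local_train')
    cloud_usage = runner.describe('cloud_train')
```

Both lookups share the single installation made on entry, and each `with` block still starts from a clean state, which keeps every magic invocation stateless as the author asked.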