
Conversation

@parthea (Contributor) commented Dec 5, 2016

During development I often use the flake8 syntax checker. I manually update the setup.cfg file with the following configuration:

  • Only report a warning if line length exceeds 100 characters
  • Ignore the warning "indentation is not a multiple of four" (E111)

I'd like to submit this configuration into the pydatalab project.
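
Roughly, the section I have in mind in setup.cfg would look like this (a minimal sketch; E111 is the "indentation is not a multiple of four" check):

[flake8]
# Only flag E501 (line too long) past 100 characters.
max-line-length = 100
# E111: indentation is not a multiple of four
ignore = E111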

I use the following command to check the syntax of my branch against master prior to committing:
git diff master | flake8 --diff

Would it be helpful to add this command to Travis CI?

(If there is a concern that builds will start to fail in the short term, a potential solution could be to display the output of the command rather than fail the build, although an actual build failure may help trigger an update to the flake8 configuration or a cleanup of code after pushing.)
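
For illustration, a hypothetical .travis.yml fragment along these lines (the explicit fetch step is an assumption: Travis clones only the branch under test, so master has to exist locally before it can be diffed against):

language: python
install:
  - pip install flake8
script:
  # Report style errors only on lines touched by this branch.
  - git fetch origin master:master
  - git diff master | flake8 --diff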

@qimingj (Contributor) commented Dec 8, 2016

I tried flake8 and it is very useful!

One question: does it scan all files? It complains about .ts and .sh files when I run it against my pydatalab repo.

@parthea (Contributor, Author) commented Dec 9, 2016

> One question: does it scan all files? It complains about .ts and .sh files when I run it against my pydatalab repo.

Yes, you're right. The command in my first post scans all files. The following command should work better and only scan Python files:

git diff master | flake8 --diff --filename '*.py'

We can also exclude specific files like docs/conf.py by updating the setup.cfg configuration file.
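
For example, a sketch of the full section with an exclusion added (E114 is the companion of E111 for comment lines, per the updated title below):

[flake8]
max-line-length = 100
ignore = E111, E114
exclude = docs/conf.py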

parthea changed the title from "Flake8 configuration. Set max line length to 100. Ignore E111" to "Flake8 configuration. Set max line length to 100. Ignore E111, E114" on Dec 9, 2016
@parthea (Contributor, Author) commented Dec 9, 2016

I've included the output from flake8 for the entire project below. We can make flake8 skip a specific file by adding the line # flake8: noqa to that file. We can also ignore errors on a line-by-line basis by adding a # noqa comment at the end of the offending line of code.
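
To illustrate both styles in one sketch (the import is one of the F401 lines reported below):

# Option 1: this line anywhere in a file makes flake8 skip the whole file.
# flake8: noqa

# Option 2: a trailing "# noqa" silences just that line, e.g. an F401
# "imported but unused" re-export in an __init__.py:
from ._dataset import DataSet  # noqa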

Do you think I should submit separate PR(s) to update flake8 to ignore certain files like __init__.py and fix up formatting errors, or should I make those changes in this PR?

tony@tonypc:~/pydatalab-parthea$ flake8 datalab --filename '*.py'
datalab/__init__.py:12:1: W391 blank line at end of file
datalab/mlalpha/_dataset.py:58:16: E225 missing whitespace around operator
datalab/mlalpha/_dataset.py:95:13: E121 continuation line under-indented for hanging indent
datalab/mlalpha/_dataset.py:109:93: E226 missing whitespace around arithmetic operator
datalab/mlalpha/_dataset.py:180:40: E226 missing whitespace around arithmetic operator
datalab/mlalpha/_dataset.py:180:54: E226 missing whitespace around arithmetic operator
datalab/mlalpha/_dataset.py:182:40: E226 missing whitespace around arithmetic operator
datalab/mlalpha/_dataset.py:182:54: E226 missing whitespace around arithmetic operator
datalab/mlalpha/_dataset.py:185:7: E121 continuation line under-indented for hanging indent
datalab/mlalpha/_dataset.py:188:17: E261 at least two spaces before inline comment
datalab/mlalpha/_dataset.py:206:9: E121 continuation line under-indented for hanging indent
datalab/mlalpha/_dataset.py:209:11: E121 continuation line under-indented for hanging indent
datalab/mlalpha/_dataset.py:233:46: E231 missing whitespace after ','
datalab/mlalpha/_dataset.py:233:53: E231 missing whitespace after ','
datalab/mlalpha/_dataset.py:233:67: E231 missing whitespace after ','
datalab/mlalpha/_dataset.py:233:69: E231 missing whitespace after ','
datalab/mlalpha/_dataset.py:236:9: E121 continuation line under-indented for hanging indent
datalab/mlalpha/_dataset.py:243:11: E121 continuation line under-indented for hanging indent
datalab/mlalpha/_dataset.py:264:23: E231 missing whitespace after ':'
datalab/mlalpha/_dataset.py:264:30: E226 missing whitespace around arithmetic operator
datalab/mlalpha/_dataset.py:269:52: E231 missing whitespace after ','
datalab/mlalpha/_dataset.py:269:59: E231 missing whitespace after ','
datalab/mlalpha/_dataset.py:269:66: E231 missing whitespace after ','
datalab/mlalpha/_dataset.py:269:80: E231 missing whitespace after ','
datalab/mlalpha/_dataset.py:269:82: E231 missing whitespace after ','
datalab/mlalpha/_dataset.py:269:84: E231 missing whitespace after ','
datalab/mlalpha/_dataset.py:272:9: E121 continuation line under-indented for hanging indent
datalab/mlalpha/_dataset.py:280:11: E121 continuation line under-indented for hanging indent
datalab/mlalpha/_dataset.py:297:30: E703 statement ends with a semicolon
datalab/mlalpha/_dataset.py:351:58: W291 trailing whitespace
datalab/mlalpha/_dataset.py:352:11: E128 continuation line under-indented for visual indent
datalab/mlalpha/_dataset.py:412:101: E501 line too long (112 > 100 characters)
datalab/mlalpha/_cloud_runner.py:53:12: E231 missing whitespace after ','
datalab/mlalpha/_cloud_predictor.py:58:9: E128 continuation line under-indented for visual indent
datalab/mlalpha/_metadata.py:58:31: E261 at least two spaces before inline comment
datalab/mlalpha/_confusion_matrix.py:40:11: E128 continuation line under-indented for visual indent
datalab/mlalpha/_confusion_matrix.py:47:5: E122 continuation line missing indentation or outdented
datalab/mlalpha/_confusion_matrix.py:48:7: E121 continuation line under-indented for hanging indent
datalab/mlalpha/_confusion_matrix.py:49:9: E121 continuation line under-indented for hanging indent
datalab/mlalpha/_confusion_matrix.py:50:11: E121 continuation line under-indented for hanging indent
datalab/mlalpha/_confusion_matrix.py:58:9: E121 continuation line under-indented for hanging indent
datalab/mlalpha/_confusion_matrix.py:60:11: E121 continuation line under-indented for hanging indent
datalab/mlalpha/_confusion_matrix.py:63:11: E121 continuation line under-indented for hanging indent
datalab/mlalpha/_cloud_models.py:55:101: E501 line too long (102 > 100 characters)
datalab/mlalpha/_cloud_models.py:133:9: E128 continuation line under-indented for visual indent
datalab/mlalpha/_cloud_models.py:156:53: E712 comparison to True should be 'if cond is not True:' or 'if not cond:'
datalab/mlalpha/_cloud_models.py:160:25: E999 SyntaxError: invalid syntax
datalab/mlalpha/_cloud_models.py:186:7: E121 continuation line under-indented for hanging indent
datalab/mlalpha/_cloud_models.py:190:20: E128 continuation line under-indented for visual indent
datalab/mlalpha/_tensorboard.py:41:24: E261 at least two spaces before inline comment
datalab/mlalpha/_job.py:95:1: W391 blank line at end of file
datalab/mlalpha/__init__.py:17:1: F401 '._local_runner.LocalRunner' imported but unused
datalab/mlalpha/__init__.py:18:1: F401 '._cloud_runner.CloudRunner' imported but unused
datalab/mlalpha/__init__.py:19:1: F401 '._metadata.Metadata' imported but unused
datalab/mlalpha/__init__.py:20:1: F401 '._local_predictor.LocalPredictor' imported but unused
datalab/mlalpha/__init__.py:21:1: F401 '._cloud_predictor.CloudPredictor' imported but unused
datalab/mlalpha/__init__.py:22:1: F401 '._job.Jobs' imported but unused
datalab/mlalpha/__init__.py:23:1: F401 '._summary.Summary' imported but unused
datalab/mlalpha/__init__.py:24:1: F401 '._tensorboard.TensorBoardManager' imported but unused
datalab/mlalpha/__init__.py:25:1: F401 '._dataset.DataSet' imported but unused
datalab/mlalpha/__init__.py:26:1: F401 '._package.Packager' imported but unused
datalab/mlalpha/__init__.py:27:1: F401 '._cloud_models.CloudModelVersions' imported but unused
datalab/mlalpha/__init__.py:27:1: F401 '._cloud_models.CloudModels' imported but unused
datalab/mlalpha/__init__.py:28:1: F401 '._confusion_matrix.ConfusionMatrix' imported but unused
datalab/mlalpha/__init__.py:33:1: W391 blank line at end of file
datalab/mlalpha/_local_predictor.py:36:101: E501 line too long (105 > 100 characters)
datalab/mlalpha/_local_predictor.py:94:5: E129 visually indented line with same indent as next logical line
datalab/mlalpha/_local_predictor.py:94:13: W503 line break before binary operator
datalab/mlalpha/_local_predictor.py:97:17: E128 continuation line under-indented for visual indent
datalab/mlalpha/_local_runner.py:109:7: E121 continuation line under-indented for hanging indent
datalab/mlalpha/_local_runner.py:116:7: E121 continuation line under-indented for hanging indent
datalab/mlalpha/_local_runner.py:120:10: E231 missing whitespace after ','
datalab/mlalpha/_local_runner.py:151:10: E231 missing whitespace after ','
datalab/mlalpha/_local_runner.py:153:10: E231 missing whitespace after ','
datalab/mlalpha/_local_runner.py:196:42: E225 missing whitespace around operator
datalab/mlalpha/commands/_mlalpha.py:65:29: E127 continuation line over-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:66:29: E127 continuation line over-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:156:47: E231 missing whitespace after ','
datalab/mlalpha/commands/_mlalpha.py:215:5: E121 continuation line under-indented for hanging indent
datalab/mlalpha/commands/_mlalpha.py:218:7: E131 continuation line unaligned for hanging indent
datalab/mlalpha/commands/_mlalpha.py:230:7: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:244:9: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:251:7: E121 continuation line under-indented for hanging indent
datalab/mlalpha/commands/_mlalpha.py:258:43: E703 statement ends with a semicolon
datalab/mlalpha/commands/_mlalpha.py:284:101: E501 line too long (111 > 100 characters)
datalab/mlalpha/commands/_mlalpha.py:292:5: E129 visually indented line with same indent as next logical line
datalab/mlalpha/commands/_mlalpha.py:293:68: E999 SyntaxError: invalid syntax
datalab/mlalpha/commands/_mlalpha.py:355:101: E501 line too long (113 > 100 characters)
datalab/mlalpha/commands/_mlalpha.py:386:5: E121 continuation line under-indented for hanging indent
datalab/mlalpha/commands/_mlalpha.py:388:7: E121 continuation line under-indented for hanging indent
datalab/mlalpha/commands/_mlalpha.py:391:7: E121 continuation line under-indented for hanging indent
datalab/mlalpha/commands/_mlalpha.py:454:63: E712 comparison to True should be 'if cond is True:' or 'if cond:'
datalab/mlalpha/commands/_mlalpha.py:492:7: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:521:44: E712 comparison to True should be 'if cond is True:' or 'if cond:'
datalab/mlalpha/commands/_mlalpha.py:543:26: E231 missing whitespace after ','
datalab/mlalpha/commands/_mlalpha.py:544:33: E231 missing whitespace after ','
datalab/mlalpha/commands/_mlalpha.py:579:36: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:580:36: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:583:36: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:650:1: E122 continuation line missing indentation or outdented
datalab/mlalpha/commands/_mlalpha.py:671:1: E122 continuation line missing indentation or outdented
datalab/mlalpha/commands/_mlalpha.py:685:1: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:693:8: E261 at least two spaces before inline comment
datalab/mlalpha/commands/_mlalpha.py:698:1: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:704:1: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:713:1: E122 continuation line missing indentation or outdented
datalab/mlalpha/commands/_mlalpha.py:725:1: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:732:1: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:738:1: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:745:1: E122 continuation line missing indentation or outdented
datalab/mlalpha/commands/_mlalpha.py:753:1: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:776:6: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:777:6: E121 continuation line under-indented for hanging indent
datalab/mlalpha/commands/_mlalpha.py:784:7: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:811:1: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:822:1: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:836:1: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:852:1: W293 blank line contains whitespace
datalab/mlalpha/commands/_mlalpha.py:854:1: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:873:1: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:912:6: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:913:6: E121 continuation line under-indented for hanging indent
datalab/mlalpha/commands/_mlalpha.py:919:7: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_mlalpha.py:941:7: E128 continuation line under-indented for visual indent
datalab/mlalpha/commands/_tensorboard.py:55:14: E126 continuation line over-indented for hanging indent
datalab/mlalpha/commands/_tensorboard.py:71:1: W391 blank line at end of file
datalab/mlalpha/commands/__init__.py:16:1: F401 '._mlalpha' imported but unused
datalab/mlalpha/commands/__init__.py:17:1: F401 '._tensorboard' imported but unused
datalab/context/_project.py:84:30: E127 continuation line over-indented for visual indent
datalab/context/_project.py:85:30: E127 continuation line over-indented for visual indent
datalab/context/_project.py:86:30: E127 continuation line over-indented for visual indent
datalab/context/_utils.py:30:3: E121 continuation line under-indented for hanging indent
datalab/context/_utils.py:73:14: E225 missing whitespace around operator
datalab/context/_api.py:44:1: W391 blank line at end of file
datalab/context/_context.py:61:101: E501 line too long (128 > 100 characters)
datalab/context/__init__.py:15:1: F401 '._context.Context' imported but unused
datalab/context/__init__.py:16:1: F401 '._project.Project' imported but unused
datalab/context/__init__.py:16:1: F401 '._project.Projects' imported but unused
datalab/context/__init__.py:17:1: W391 blank line at end of file
datalab/context/commands/__init__.py:16:1: F401 '._projects' imported but unused
datalab/context/commands/_projects.py:50:24: E127 continuation line over-indented for visual indent
datalab/context/commands/_projects.py:51:101: E501 line too long (105 > 100 characters)
datalab/bigquery/_schema.py:110:7: E121 continuation line under-indented for hanging indent
datalab/bigquery/_schema.py:267:11: E125 continuation line with same indent as next logical line
datalab/bigquery/_utils.py:172:1: W391 blank line at end of file
datalab/bigquery/_federated_table.py:86:7: E121 continuation line under-indented for hanging indent
datalab/bigquery/_federated_table.py:98:1: W391 blank line at end of file
datalab/bigquery/_api.py:310:35: E128 continuation line under-indented for visual indent
datalab/bigquery/_api.py:437:7: E121 continuation line under-indented for hanging indent
datalab/bigquery/_api.py:502:7: E121 continuation line under-indented for hanging indent
datalab/bigquery/_api.py:504:9: E131 continuation line unaligned for hanging indent
datalab/bigquery/_table.py:382:9: E121 continuation line under-indented for hanging indent
datalab/bigquery/_table.py:922:1: W391 blank line at end of file
datalab/bigquery/_query_job.py:112:1: W391 blank line at end of file
datalab/bigquery/_csv_options.py:78:7: E121 continuation line under-indented for hanging indent
datalab/bigquery/_udf.py:83:17: E127 continuation line over-indented for visual indent
datalab/bigquery/_udf.py:88:1: W391 blank line at end of file
datalab/bigquery/__init__.py:16:1: F401 '._csv_options.CSVOptions' imported but unused
datalab/bigquery/__init__.py:17:1: F401 '._dataset.Dataset' imported but unused
datalab/bigquery/__init__.py:17:1: F401 '._dataset.Datasets' imported but unused
datalab/bigquery/__init__.py:18:1: F401 '._dialect.Dialect' imported but unused
datalab/bigquery/__init__.py:19:1: F401 '._federated_table.FederatedTable' imported but unused
datalab/bigquery/__init__.py:21:1: F401 '._query.Query' imported but unused
datalab/bigquery/__init__.py:22:1: F401 '._query_job.QueryJob' imported but unused
datalab/bigquery/__init__.py:23:1: F401 '._query_results_table.QueryResultsTable' imported but unused
datalab/bigquery/__init__.py:24:1: F401 '._query_stats.QueryStats' imported but unused
datalab/bigquery/__init__.py:25:1: F401 '._sampling.Sampling' imported but unused
datalab/bigquery/__init__.py:26:1: F401 '._schema.Schema' imported but unused
datalab/bigquery/__init__.py:27:1: F401 '._table.Table' imported but unused
datalab/bigquery/__init__.py:27:1: F401 '._table.TableMetadata' imported but unused
datalab/bigquery/__init__.py:28:1: F401 '._udf.UDF' imported but unused
datalab/bigquery/__init__.py:29:1: F401 '._utils.DatasetName' imported but unused
datalab/bigquery/__init__.py:29:1: F401 '._utils.TableName' imported but unused
datalab/bigquery/__init__.py:30:1: F401 '._view.View' imported but unused
datalab/bigquery/_query.py:427:101: E501 line too long (106 > 100 characters)
datalab/bigquery/_query.py:543:1: W391 blank line at end of file
datalab/bigquery/commands/__init__.py:16:1: F401 '._bigquery' imported but unused
datalab/bigquery/commands/__init__.py:17:1: W391 blank line at end of file
datalab/bigquery/commands/_bigquery.py:71:7: E128 continuation line under-indented for visual indent
datalab/bigquery/commands/_bigquery.py:72:7: E128 continuation line under-indented for visual indent
datalab/bigquery/commands/_bigquery.py:73:7: E128 continuation line under-indented for visual indent
datalab/bigquery/commands/_bigquery.py:107:7: E128 continuation line under-indented for visual indent
datalab/bigquery/commands/_bigquery.py:111:30: E128 continuation line under-indented for visual indent
datalab/bigquery/commands/_bigquery.py:121:7: E128 continuation line under-indented for visual indent
datalab/bigquery/commands/_bigquery.py:122:7: E128 continuation line under-indented for visual indent
datalab/bigquery/commands/_bigquery.py:126:30: E128 continuation line under-indented for visual indent
datalab/bigquery/commands/_bigquery.py:142:7: E128 continuation line under-indented for visual indent
datalab/bigquery/commands/_bigquery.py:143:7: E128 continuation line under-indented for visual indent
datalab/bigquery/commands/_bigquery.py:148:30: E128 continuation line under-indented for visual indent
datalab/bigquery/commands/_bigquery.py:237:32: E127 continuation line over-indented for visual indent
datalab/bigquery/commands/_bigquery.py:313:45: E128 continuation line under-indented for visual indent
datalab/bigquery/commands/_bigquery.py:314:45: E128 continuation line under-indented for visual indent
datalab/bigquery/commands/_bigquery.py:318:45: E128 continuation line under-indented for visual indent
datalab/bigquery/commands/_bigquery.py:319:45: E128 continuation line under-indented for visual indent
datalab/bigquery/commands/_bigquery.py:367:101: E501 line too long (125 > 100 characters)
datalab/bigquery/commands/_bigquery.py:421:47: E128 continuation line under-indented for visual indent
datalab/bigquery/commands/_bigquery.py:540:49: E128 continuation line under-indented for visual indent
datalab/bigquery/commands/_bigquery.py:752:41: E128 continuation line under-indented for visual indent
datalab/bigquery/commands/_bigquery.py:753:41: E128 continuation line under-indented for visual indent
datalab/bigquery/commands/_bigquery.py:754:41: E128 continuation line under-indented for visual indent
datalab/bigquery/commands/_bigquery.py:971:101: E501 line too long (102 > 100 characters)
datalab/bigquery/commands/_bigquery.py:985:101: E501 line too long (103 > 100 characters)
datalab/bigquery/commands/_bigquery.py:988:101: E501 line too long (101 > 100 characters)
datalab/kernel/__init__.py:22:3: F401 'IPython.core.magic as _magic' imported but unused
datalab/kernel/__init__.py:24:3: F401 'IPython.get_ipython' imported but unused
datalab/stackdriver/commands/__init__.py:16:1: F401 '._monitoring' imported but unused
datalab/stackdriver/monitoring/__init__.py:17:1: F401 'google.cloud.monitoring.Reducer' imported but unused
datalab/stackdriver/monitoring/__init__.py:17:1: F401 'google.cloud.monitoring.Aligner' imported but unused
datalab/stackdriver/monitoring/__init__.py:18:1: F401 '._group.Groups' imported but unused
datalab/stackdriver/monitoring/__init__.py:19:1: F401 '._metric.MetricDescriptors' imported but unused
datalab/stackdriver/monitoring/__init__.py:20:1: F401 '._query.Query' imported but unused
datalab/stackdriver/monitoring/__init__.py:21:1: F401 '._query_metadata.QueryMetadata' imported but unused
datalab/stackdriver/monitoring/__init__.py:22:1: F401 '._resource.ResourceDescriptors' imported but unused
datalab/utils/_gcp_job.py:45:1: W391 blank line at end of file
datalab/utils/_lru_cache.py:103:1: W391 blank line at end of file
datalab/utils/__init__.py:15:1: F401 '._async.async_function' imported but unused
datalab/utils/__init__.py:15:1: F401 '._async.async_method' imported but unused
datalab/utils/__init__.py:15:1: F401 '._async.async' imported but unused
datalab/utils/__init__.py:16:1: F401 '._gcp_job.GCPJob' imported but unused
datalab/utils/__init__.py:17:1: F401 '._http.RequestException' imported but unused
datalab/utils/__init__.py:17:1: F401 '._http.Http' imported but unused
datalab/utils/__init__.py:18:1: F401 '._iterator.Iterator' imported but unused
datalab/utils/__init__.py:19:1: F401 '._job.JobError' imported but unused
datalab/utils/__init__.py:19:1: F401 '._job.Job' imported but unused
datalab/utils/__init__.py:20:1: F401 '._json_encoder.JSONEncoder' imported but unused
datalab/utils/__init__.py:21:1: F401 '._lru_cache.LRUCache' imported but unused
datalab/utils/__init__.py:22:1: F401 '._lambda_job.LambdaJob' imported but unused
datalab/utils/__init__.py:23:1: F401 '._utils.compare_datetimes' imported but unused
datalab/utils/__init__.py:23:1: F401 '._utils.get_item' imported but unused
datalab/utils/__init__.py:23:1: F401 '._utils.print_exception_with_last_stack' imported but unused
datalab/utils/__init__.py:23:1: F401 '._utils.is_http_running_on' imported but unused
datalab/utils/__init__.py:23:1: F401 '._utils.gcs_copy_file' imported but unused
datalab/utils/__init__.py:23:1: F401 '._utils.pick_unused_port' imported but unused
datalab/utils/__init__.py:24:21: E126 continuation line over-indented for hanging indent
datalab/utils/_http.py:18:1: E402 module level import not at top of file
datalab/utils/_http.py:19:1: E402 module level import not at top of file
datalab/utils/_http.py:20:1: E402 module level import not at top of file
datalab/utils/_http.py:22:1: E402 module level import not at top of file
datalab/utils/_http.py:23:1: E402 module level import not at top of file
datalab/utils/_http.py:24:1: E402 module level import not at top of file
datalab/utils/_http.py:24:22: E401 multiple imports on one line
datalab/utils/_http.py:25:1: E402 module level import not at top of file
datalab/utils/_lambda_job.py:40:3: E265 block comment should start with '# '
datalab/utils/commands/_utils.py:100:5: E121 continuation line under-indented for hanging indent
datalab/utils/commands/_utils.py:348:9: E128 continuation line under-indented for visual indent
datalab/utils/commands/_utils.py:399:9: E128 continuation line under-indented for visual indent
datalab/utils/commands/_chart.py:41:9: E128 continuation line under-indented for visual indent
datalab/utils/commands/_html.py:109:7: E731 do not assign a lambda expression, use a def
datalab/utils/commands/_html.py:115:7: E731 do not assign a lambda expression, use a def
datalab/utils/commands/_html.py:117:7: E731 do not assign a lambda expression, use a def
datalab/utils/commands/__init__.py:17:1: F401 '._commands.CommandParser' imported but unused
datalab/utils/commands/__init__.py:18:1: F401 '._html.Html' imported but unused
datalab/utils/commands/__init__.py:18:1: F401 '._html.HtmlBuilder' imported but unused
datalab/utils/commands/__init__.py:19:1: F403 'from ._utils import *' used; unable to detect undefined names
datalab/utils/commands/__init__.py:19:1: F401 '._utils.*' imported but unused
datalab/utils/commands/__init__.py:22:1: F401 '._chart' imported but unused
datalab/utils/commands/__init__.py:23:1: F401 '._chart_data' imported but unused
datalab/utils/commands/__init__.py:24:1: F401 '._csv' imported but unused
datalab/utils/commands/__init__.py:25:1: F401 '._extension' imported but unused
datalab/utils/commands/__init__.py:26:1: F401 '._job' imported but unused
datalab/utils/commands/__init__.py:27:1: F401 '._modules' imported but unused
datalab/utils/commands/_extension.py:47:1: W391 blank line at end of file
datalab/storage/_item.py:264:7: F841 local variable '_' is assigned to but never used
datalab/storage/_item.py:293:1: W391 blank line at end of file
datalab/storage/_api.py:18:1: E402 module level import not at top of file
datalab/storage/_api.py:20:1: E402 module level import not at top of file
datalab/storage/_api.py:20:22: E401 multiple imports on one line
datalab/storage/_api.py:21:1: E402 module level import not at top of file
datalab/storage/_api.py:22:1: E402 module level import not at top of file
datalab/storage/_bucket.py:229:7: F841 local variable '_' is assigned to but never used
datalab/storage/__init__.py:16:1: F401 '._bucket.Bucket' imported but unused
datalab/storage/__init__.py:16:1: F401 '._bucket.Buckets' imported but unused
datalab/storage/__init__.py:17:1: F401 '._item.Item' imported but unused
datalab/storage/__init__.py:17:1: F401 '._item.Items' imported but unused
datalab/storage/commands/_storage.py:80:7: E128 continuation line under-indented for visual indent
datalab/storage/commands/_storage.py:170:101: E501 line too long (103 > 100 characters)
datalab/storage/commands/_storage.py:219:59: E128 continuation line under-indented for visual indent
datalab/storage/commands/__init__.py:16:1: F401 '._storage' imported but unused
datalab/storage/commands/__init__.py:17:1: W391 blank line at end of file
datalab/notebook/__init__.py:16:3: F401 'IPython as _' imported but unused
datalab/notebook/__init__.py:20:1: E302 expected 2 blank lines, found 1
datalab/notebook/__init__.py:22:1: W391 blank line at end of file
datalab/data/_utils.py:72:3: E731 do not assign a lambda expression, use a def
datalab/data/_utils.py:73:3: E731 do not assign a lambda expression, use a def
datalab/data/_utils.py:74:3: E731 do not assign a lambda expression, use a def
datalab/data/_utils.py:75:3: E731 do not assign a lambda expression, use a def
datalab/data/_utils.py:76:3: E731 do not assign a lambda expression, use a def
datalab/data/_utils.py:77:3: E731 do not assign a lambda expression, use a def
datalab/data/_utils.py:78:3: E731 do not assign a lambda expression, use a def
datalab/data/_utils.py:79:3: E731 do not assign a lambda expression, use a def
datalab/data/_utils.py:80:3: E731 do not assign a lambda expression, use a def
datalab/data/_utils.py:81:3: E731 do not assign a lambda expression, use a def
datalab/data/_utils.py:82:3: E731 do not assign a lambda expression, use a def
datalab/data/_utils.py:83:3: E731 do not assign a lambda expression, use a def
datalab/data/_utils.py:138:1: W391 blank line at end of file
datalab/data/_sql_statement.py:213:1: W391 blank line at end of file
datalab/data/_csv.py:57:3: E304 blank lines found after function decorator
datalab/data/_csv.py:62:3: E304 blank lines found after function decorator
datalab/data/_csv.py:118:70: E712 comparison to True should be 'if cond is True:' or 'if cond:'
datalab/data/_csv.py:152:42: E226 missing whitespace around arithmetic operator
datalab/data/_csv.py:152:46: E226 missing whitespace around arithmetic operator
datalab/data/_csv.py:162:41: E712 comparison to True should be 'if cond is True:' or 'if cond:'
datalab/data/_csv.py:163:62: E712 comparison to True should be 'if cond is True:' or 'if cond:'
datalab/data/_csv.py:164:35: F821 undefined name 'xrange'
datalab/data/_csv.py:165:42: E712 comparison to True should be 'if cond is True:' or 'if cond:'
datalab/data/_sql_module.py:126:1: W391 blank line at end of file
datalab/data/__init__.py:17:1: F401 '._csv.Csv' imported but unused
datalab/data/__init__.py:18:1: F401 '._sql_module.SqlModule' imported but unused
datalab/data/__init__.py:19:1: F401 '._sql_statement.SqlStatement' imported but unused
datalab/data/__init__.py:20:1: F401 '._utils.tokenize' imported but unused
datalab/data/__init__.py:21:1: W391 blank line at end of file
datalab/data/commands/_sql.py:81:30: E127 continuation line over-indented for visual indent
datalab/data/commands/_sql.py:254:11: E125 continuation line with same indent as next logical line
datalab/data/commands/_sql.py:318:101: E501 line too long (121 > 100 characters)
datalab/data/commands/_sql.py:397:11: E121 continuation line under-indented for hanging indent
datalab/data/commands/__init__.py:15:1: F401 '._sql' imported but unused

@parthea (Contributor, Author) commented Dec 11, 2016

@qimingj Can this be merged? I can create a separate PR for adding this to Travis CI and move the discussion there.

@qimingj (Contributor) left a comment:

Thanks!

@qimingj (Contributor) commented Dec 11, 2016

We should fix all the syntax issues and add this to Travis. It would be awesome if you could help with this!

qimingj merged commit 7320b39 into googledatalab:master on Dec 11, 2016
parthea deleted the add-flake8-config branch on December 11, 2016 23:09
qimingj added a commit that referenced this pull request Jan 19, 2017
* Add gcs_copy_file() that is missing but is referenced in a couple of places. (#110)

* Add gcs_copy_file() that is missing but is referenced in a couple of places.

* Add DataFlow to pydatalab dependency list.

* Fix travis test errors by reimplementing gcs copy.

* Remove unnecessary shutil import.

* Flake8 configuration. Set max line length to 100. Ignore E111, E114 (#102)

* Add datalab user agent to CloudML trainer and predictor requests. (#112)

* Update oauth2client to 2.2.0 to satisfy cloudml in Cloud Datalab (#111)

* Update README.md (#114)

Added docs link.

* Generate reST documentation for magic commands (#113)

Auto generate docs for any added magics by searching through the source files for lines with register_line_cell_magic, capturing the names for those magics, and calling them inside an ipython kernel with the -h argument, then storing that output into a generated datalab.magics.rst file.

* Fix an issue that %%chart failed with UDF query. (#116)

* Fix an issue that %%chart failed with UDF query.

The problem is that the query is submitted to BQ without replacing variable values from user namespace.

* Fix chart tests by adding ip.user_ns mock.

* Fix charting test.

* Add missing import "mock".

* Fix chart tests.

* Fix "%%bigquery schema" issue --  the command generates nothing in output. (#119)

* Add some missing dependencies, remove some unused ones (#122)

* Remove scikit-learn and scipy as dependencies
* add more required packages
* Add psutil as dependency
* Update packages versions

* Cleanup (#123)

* Remove unnecessary semicolons

* remove unused imports

* remove unnecessary variable definition

* Fix query_metadata tests (#128)

Fix query_metadata tests

* Make the library pip-installable (#125)

This PR adds tensorflow and cloudml to setup.py to make the lib pip-installable. I had to install them explicitly using pip from inside the setup.py script; even though that is not a clean way to do it, it gets around the two issues we have at the moment with these two packages:
- PyPI has TensorFlow version 0.12, while we need 0.11 for the current version of pydatalab. According to the Cloud ML docs, that version exists as a pip package for three supported platforms.
- The Cloud ML SDK exists as a pip package, but also not on PyPI, and while we could add it as a dependency link, there exists another package on PyPI called cloudml, and pip ends up installing that instead (see #124). I cannot find a way to force pip to install the package from the link I included.

* Set command description so it is displayed in --help. argparse's format_help() prints description but not help. (#131)
qimingj added a commit that referenced this pull request Feb 1, 2017
qimingj added a commit that referenced this pull request Feb 1, 2017
qimingj added a commit that referenced this pull request Feb 13, 2017
qimingj added a commit that referenced this pull request Feb 13, 2017
* Add gcs_copy_file() that is missing but is referenced in a couple of places. (#110)

* Add gcs_copy_file() that is missing but is referenced in a couple of places.

* Add DataFlow to pydatalab dependency list.

* Fix travis test errors by reimplementing gcs copy.

* Remove unnecessary shutil import.

* Flake8 configuration. Set max line length to 100. Ignore E111, E114 (#102)

* Add datalab user agent to CloudML trainer and predictor requests. (#112)

* Update oauth2client to 2.2.0 to satisfy cloudml in Cloud Datalab (#111)

* Update README.md (#114)

Added docs link.

* Generate reST documentation for magic commands (#113)

Auto generate docs for any added magics by searching through the source files for lines with register_line_cell_magic, capturing the names for those magics, and calling them inside an ipython kernel with the -h argument, then storing that output into a generated datalab.magics.rst file.

* Fix an issue that %%chart failed with UDF query. (#116)

* Fix an issue that %%chart failed with UDF query.

The problem is that the query is submitted to BQ without replacing variable values from user namespace.

* Fix chart tests by adding ip.user_ns mock.

* Fix charting test.

* Add missing import "mock".

* Fix chart tests.

* Fix "%%bigquery schema" issue --  the command generates nothing in output. (#119)

* Add some missing dependencies, remove some unused ones (#122)

* Remove scikit-learn and scipy as dependencies
* add more required packages
* Add psutil as dependency
* Update packages versions

* Cleanup (#123)

* Remove unnecessary semicolons

* remove unused imports

* remove unnecessary variable definition

* Fix query_metadata tests (#128)

Fix query_metadata tests

* Make the library pip-installable (#125)

This PR adds tensorflow and cloudml to setup.py to make the lib pip-installable. I had to install them explicitly using pip from inside the setup.py script; even though that is not a clean way to do it, it gets around the two issues we have at the moment with these two packages:
- PyPI has TensorFlow version 0.12, while we need 0.11 for the current version of pydatalab. According to the Cloud ML docs, that version exists as a pip package for three supported platforms.
- The Cloud ML SDK exists as a pip package, but also not on PyPI, and while we could add it as a dependency link, there exists another package on PyPI called cloudml, and pip ends up installing that instead (see #124). I cannot find a way to force pip to install the package from the link I included.

* Set command description so it is displayed in --help. argparse's format_help() prints description but not help. (#131)

* Fix an issue that setting project id from datalab does not set gcloud default project. (#136)

* Add future==0.16.0 as a dependency since it's required by CloudML SDK (#143)

As of the latest release of CloudML Python SDK, that package seems to require future==0.16.0, so until it's fixed, we'll take it as a dependency.

* Remove tensorflow and CloudML SDK from setup.py (#144)

* Install TensorFlow 0.12.1.

* Remove TensorFlow and CloudML SDK from setup.py.

* Add comments on why we ignore errors when importing mlalpha.

* Adding evaluationanalysis API to generate evaluation stats from eval … (#99)

* Adding evaluationanalysis API to generate evaluation stats from eval source CSV file and eval results CSV file.

The resulting stats file will be fed to a visualization component which will come in a separate change.

* Follow up CR comments.

* Feature slicing view visualization component. (#109)

* Datalab Inception (image classification) solution. (#117)

* Datalab Inception (image classification) solution.

* Fix dataflow URL.

* Datalab "ml" magics for running a solution package. Update Inception Package. (#121)

* Datalab Inception (image classification) solution.

* Fix dataflow URL.

* Datalab "ml" magics for running a solution package.
 - Dump function args and docstrings
 - Run functions
Update Inception Package.
 - Added docstring on face functions.
 - Added batch prediction.
 - Use datalab's lib for talking to cloud training and prediction service.
 - More minor fixes and changes.

* Follow up code review comments.

* Fix a PackageRunner issue where temp installation is done multiple times unnecessarily.

* Update feature-slice-view supporting file, which fixes some stability UI issues. (#126)

* Remove old feature-slicing pipeline implementation (is replaced by BigQuery)  Add Confusion matrix magic. (#129)

* Remove old feature-slicing pipeline implementation (is replaced by BigQuery).
Add Confusion matrix magic.

* Follow up on code review comments. Also fix an inception issue that eval loss is nan when eval size is smaller than batch size.

* Fix set union.

* Mergemaster/cloudml (#134)

* Add gcs_copy_file() that is missing but is referenced in a couple of places. (#110)

* Add gcs_copy_file() that is missing but is referenced in a couple of places.

* Add DataFlow to pydatalab dependency list.

* Fix travis test errors by reimplementing gcs copy.

* Remove unnecessary shutil import.

* Flake8 configuration. Set max line length to 100. Ignore E111, E114 (#102)

* Add datalab user agent to CloudML trainer and predictor requests. (#112)

* Update oauth2client to 2.2.0 to satisfy cloudml in Cloud Datalab (#111)

* Update README.md (#114)

Added docs link.

* Generate reST documentation for magic commands (#113)

Auto generate docs for any added magics by searching through the source files for lines with register_line_cell_magic, capturing the names for those magics, and calling them inside an ipython kernel with the -h argument, then storing that output into a generated datalab.magics.rst file.

* Fix an issue that %%chart failed with UDF query. (#116)

* Fix an issue that %%chart failed with UDF query.

The problem is that the query is submitted to BQ without replacing variable values from user namespace.

* Fix chart tests by adding ip.user_ns mock.

* Fix charting test.

* Add missing import "mock".

* Fix chart tests.

* Fix "%%bigquery schema" issue --  the command generates nothing in output. (#119)

* Add some missing dependencies, remove some unused ones (#122)

* Remove scikit-learn and scipy as dependencies
* add more required packages
* Add psutil as dependency
* Update packages versions

* Cleanup (#123)

* Remove unnecessary semicolons

* remove unused imports

* remove unnecessary variable definition

* Fix query_metadata tests (#128)

Fix query_metadata tests

* Make the library pip-installable (#125)

This PR adds tensorflow and cloudml to setup.py to make the lib pip-installable. I had to install them explicitly using pip from inside the setup.py script; even though that is not a clean way to do it, it gets around the two issues we have at the moment with these two packages:
- PyPI has TensorFlow version 0.12, while we need 0.11 for the current version of pydatalab. According to the Cloud ML docs, that version exists as a pip package for three supported platforms.
- The Cloud ML SDK exists as a pip package, but also not on PyPI, and while we could add it as a dependency link, there exists another package on PyPI called cloudml, and pip ends up installing that instead (see #124). I cannot find a way to force pip to install the package from the link I included.

* Set command description so it is displayed in --help. argparse's format_help() prints description but not help. (#131)

* Fix an issue that prediction right after preprocessing fails in inception package local run. (#135)

* add structure data preprocessing and training  (#132)

merging the preprocessing and training parts.

* first full-feature version of structured data is done (#139)

* added the preprocessing/training files.

Preprocessing is connected with datalab. Training is not fully connected
with datalab.

* added training interface.

* local/cloud training ready for review

* saving work

* saving work

* cloud online prediction is done.

* split config file into two (schema/transforms) and updated the
unittests.

* local preprocess/train working

* 1) merged --model_type and --problem_type
2) online/local prediction is done

* added batch prediction

* all prediction is done. Going to make a merge request next

* Update _package.py

removed some white space + add a print statement to  local_predict

* --preprocessing puts a copy of schema in the output dir.
--no need to pass schema to train in datalab.

* tests can be run from any folder above the test folder by

python -m unittest discover

Also, the training test will parse the output of training and check that
the loss is small.

* Inception Package Improvements (#138)

* Fix an issue that prediction right after preprocessing fails in inception package local run.

* Remove the "labels_file" parameter from inception preprocess/train/predict. Instead it will get labels from training data. Prediction graph will return labels.
Make online prediction works with GCS images.
"%%ml alpha deploy" now also check for "/model" subdir if needed.
Other minor improvements.

* Make local batch prediction really batched.
Batch prediction input may not have to include target column.
Sort labels, so it is consistent between preprocessing and training.
Follow up other core review comments.

* Follow up code review comments.

* Remove old DataSet implementation. Create new DataSets. (#151)

* Remove old DataSet implementation.

The new Dataset will be used as the data source for packages. All DataSets will be capable of sampling to a DataFrame, so feature exploration can be done with other libraries (see the sketch below).

* Raise error when sample is larger than number of rows.
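
A sketch of that sampling contract, using a hypothetical in-memory DataSet; the class and its internals are illustrative, not the new implementation.

```py
import random

import pandas as pd


class InMemoryDataSet(object):
  """Hypothetical DataSet showing the sample-to-DataFrame contract."""

  def __init__(self, rows, columns):
    self._rows = rows        # list of tuples
    self._columns = columns  # column names

  def sample(self, n):
    """Return n random rows as a DataFrame for feature exploration."""
    if n > len(self._rows):
      raise ValueError('Sample size %d is larger than %d rows' %
                       (n, len(self._rows)))
    return pd.DataFrame(random.sample(self._rows, n), columns=self._columns)
```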

* Inception package improvements (#155)

* Inception package improvements.

- It takes DataSets as input instead of CSV files. It also supports a BigQuery source now.
- Changes to make the latest DataFlow and TensorFlow happy.
- Changes in preprocessing to remove partial support for multiple labels.
- Other minor improvements.

* Add a comment.

* Update feature slice view UI. Added Slices Overview. (#161)

* Move TensorBoard and TensorFlow Events UI rendering to Python function to deprecate magic. (#163)

* Update feature slice view UI. Added Slices Overview.

* Move TensorBoard and TensorFlow Events UI rendering to Python function to deprecate magic.

Use matplotlib for TF events plotting so it can display well in static HTML pages (such as on GitHub); a sketch follows below.

Improve TensorFlow Events list/get APIs.

* Follow up on CR comments.
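
Reading scalar events and plotting them with matplotlib might look like this sketch; the EventAccumulator import path is version-dependent (newer installs have it under tensorboard.backend.event_processing), and the tag name is an example.

```py
import matplotlib.pyplot as plt
from tensorflow.python.summary.event_accumulator import EventAccumulator


def plot_scalar(event_dir, tag='loss'):
  """Plot one scalar series from a TF events directory as a static image."""
  acc = EventAccumulator(event_dir)
  acc.Reload()
  events = acc.Scalars(tag)
  plt.plot([e.step for e in events], [e.value for e in events])
  plt.xlabel('step')
  plt.ylabel(tag)
  plt.show()  # a plain PNG, so it renders in static HTML such as GitHub
```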

* new preprocessing and training for structured data (#160)

* new preprocessing is done

next: work on training, and then update the tests

* saving work

* sw

* seems to be working, going to do tests next

* got preprocessing test working

* training test pass!!!

* added exported graph back in

* dl preprocessing for local, cloud/csv, cloud/bigquery DONE :)

* gcloud cloud training works

* cloud dl training working

* Oops, these files should not be saved

* removed junk function

* sw

* review comments

* removed cloudml sdk usage + lint

* review comments

* Move job, models, and feature_slice_view plotting to API. (#167)

* Move job, models, and feature_slice_view plotting to API.

* Follow up on CR comments.

* A util function to repackage and copy the package to staging location. (#169)

* A util function to repackage and copy the package to a staging location, so in packages we can use the staging URL as the package URL in cloud training (sketched below).

* Follow up CR comments.

* Follow up CR comments.
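
The repackage-and-stage flow could be sketched as below; building an sdist and shelling out to gsutil are assumptions standing in for the library's actual packaging and GCS helpers.

```py
import os
import subprocess
import tempfile


def repackage_to_staging(package_dir, staging_url):
  """Build an sdist from a local package and copy it to a GCS staging path."""
  dist_dir = tempfile.mkdtemp()
  subprocess.check_call(
      ['python', 'setup.py', 'sdist', '--dist-dir', dist_dir],
      cwd=package_dir)
  tarball = os.listdir(dist_dir)[0]  # assumes a single built artifact
  staged = staging_url.rstrip('/') + '/' + tarball
  # gsutil stands in for the library's own GCS copy helper here.
  subprocess.check_call(['gsutil', 'cp', os.path.join(dist_dir, tarball), staged])
  return staged  # usable as the package URL for cloud training
```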

* Move confusion matrix from %%ml to library. (#159)

* Move confusion matrix from %%ml to library.

This is part of the effort to move %%ml magic functionality into the library to provide a consistent experience (Python only); a plotting sketch follows below.

* Add a comment.
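
A minimal matplotlib rendering of a confusion matrix from parallel target/prediction lists, as a sketch of the library-side plotting; the label set is an example.

```py
import numpy as np
import matplotlib.pyplot as plt


def plot_confusion_matrix(labels, targets, predictions):
  """Count (target, predicted) pairs and render them as a heatmap."""
  index = {label: i for i, label in enumerate(labels)}
  cm = np.zeros((len(labels), len(labels)), dtype=int)
  for t, p in zip(targets, predictions):
    cm[index[t], index[p]] += 1
  plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
  plt.xticks(range(len(labels)), labels, rotation=45)
  plt.yticks(range(len(labels)), labels)
  plt.xlabel('Predicted')
  plt.ylabel('Target')
  plt.colorbar()
  plt.show()


plot_confusion_matrix(['daisy', 'rose'],
                      targets=['daisy', 'rose', 'rose'],
                      predictions=['daisy', 'daisy', 'rose'])
```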

* Improve the inception package so there is no need to have a GCS copy of the package. Instead, cloud training and preprocessing will repackage it from the local installation and upload it to staging. (#175)

* Cloudmlsdp (#177)

* added the ',' graph hack

* sw

* batch prediction done

* sw

* review comments

* Add CloudTrainingConfig namedtuple to wrap cloud training configurations (#178)

* Add a CloudTrainingConfig namedtuple to wrap cloud training configurations (see the sketch below).

* Follow up code review comments.
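
The shape of such a namedtuple, with field names that are assumptions rather than the tuple's real fields:

```py
import collections

# Field names are illustrative; the actual CloudTrainingConfig may differ.
CloudTrainingConfig = collections.namedtuple(
    'CloudTrainingConfig',
    ['region', 'scale_tier', 'master_type', 'worker_type', 'worker_count'])

config = CloudTrainingConfig(region='us-central1', scale_tier='CUSTOM',
                             master_type='large_model', worker_type='standard',
                             worker_count=4)
print(config.region)  # fields are named, unlike a bare dict or argument list
```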

* prediction update (#183)

* added the ',' graph hack

* sw

* batch prediction done

* sw

* review comments

* Updated the prediction graph keys, and made the csvcoder not need any other file.

* sw

* sw

* added newline

* review comments

* review comments

* trying to fix the Contributor License Agreement error.

* Inception Package Improvements (#186)

* Implement inception cloud batch prediction. Support explicit eval data in preprocessing.

* Follow up on CR comments. Also address changes from latest DataFlow.
qimingj added a commit that referenced this pull request Feb 22, 2017
qimingj added a commit that referenced this pull request Feb 22, 2017
* Fix an issue where setting the project id from datalab does not set the gcloud default project. (#136)

* Add future==0.16.0 as a dependency since it's required by CloudML SDK (#143)

As of the latest release of CloudML Python SDK, that package seems to require future==0.16.0, so until it's fixed, we'll take it as a dependency.

* Remove tensorflow and CloudML SDK from setup.py (#144)

* Install TensorFlow 0.12.1.

* Remove TensorFlow and CloudML SDK from setup.py.

* Add comments why we ignore errors when importing mlalpha.

* Fix project_id from `gcloud config` in py3 (#194)

- `Popen.stdout` is `bytes` in py3 and needs `.decode()`

- Before:
```py
>>> %%sql -d standard
... select 3
Your active configuration is: [test]

HTTP request failed: Invalid project ID 'b'foo-bar''. Project IDs must contain 6-63 lowercase letters, digits, or dashes. IDs must start with a letter and may not end with a dash.
```
```sh
$ for p in python2 python3; do $p -c 'from datalab.context._utils import get_project_id; print(get_project_id())'; done
Your active configuration is: [test]

foo-bar
Your active configuration is: [test]

b'foo-bar'
```

- After:
```py
>>> %%sql -d standard
... select 3
Your active configuration is: [test]

QueryResultsTable job_1_bZNbAUtk8QzlK7bqWD5fz7S5o
```
```sh
$ for p in python2 python3; do $p -c 'from datalab.context._utils import get_project_id; print(get_project_id())'; done
Your active configuration is: [test]

foo-bar
Your active configuration is: [test]

foo-bar
```
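
The underlying fix, reduced to a sketch (the exact gcloud invocation here is illustrative):

```py
import subprocess

proc = subprocess.Popen(
    ['gcloud', 'config', 'list', '--format', 'value(core.project)'],
    stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
# stdout is bytes under py3; decode before treating it as a project id.
project_id = stdout.decode().strip()
print(project_id)  # 'foo-bar', not "b'foo-bar'"
```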

* Use HTTP Keep-Alive, else BigQuery queries are seconds slower than necessary (#195)

- Before (without Keep-Alive): ~3-7s for BigQuery `select 3` with an already-cached result
- After (with Keep-Alive): ~1.5-3s
- A query sends 6 HTTP requests, and runtime appears to be dominated by network RTT
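
The difference, illustrated with httplib2 (which the Google API clients of this era sit on): one reused Http object keeps the connection alive across requests instead of paying a fresh TCP/TLS handshake each time. The URL is just an example endpoint.

```py
import httplib2

http = httplib2.Http()  # reuse this one object across requests
for _ in range(3):
  response, content = http.request('https://www.googleapis.com/discovery/v1/apis')
  print(response.status)  # later requests skip the connection setup
```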

* Cast string to int (#217)

`table.insert_data(df)` inserts data correctly but raises TypeError: unorderable types: str() > int()
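
An illustrative workaround is to align the DataFrame dtypes with the table schema before calling insert_data(); the column names here are hypothetical.

```py
import pandas as pd

df = pd.DataFrame({'user_id': ['1', '2', '3'],
                   'score': ['0.5', '0.9', '0.1']})
# Cast string-typed numerics to the schema's types before inserting.
df['user_id'] = df['user_id'].astype(int)
df['score'] = df['score'].astype(float)
# table.insert_data(df)  # table being a datalab bigquery Table
```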
qimingj added a commit that referenced this pull request Feb 25, 2017
* bigquery.Api: Remove unused _DEFAULT_PAGE_SIZE (#221)

Test plan:
- Unit tests still pass
qimingj added a commit that referenced this pull request Feb 27, 2017
* Adding evaluationanalysis API to generate evaluation stats from eval … (#99)

* Adding evaluationanalysis API to generate evaluation stats from eval source CSV file and eval results CSV file.

The resulting stats file will be fed to a visualization component which will come in a separate change.

* Follow up CR comments.

* Feature slicing view visualization component. (#109)

* Datalab Inception (image classification) solution. (#117)

* Datalab Inception (image classification) solution.

* Fix dataflow URL.

* Datalab "ml" magics for running a solution package. Update Inception Package. (#121)

* Datalab Inception (image classification) solution.

* Fix dataflow URL.

* Datalab "ml" magics for running a solution package.
 - Dump function args and docstrings
 - Run functions
Update Inception Package.
 - Added docstring on face functions.
 - Added batch prediction.
 - Use datalab's lib for talking to cloud training and prediction service.
 - More minor fixes and changes.

* Follow up code review comments.

* Fix a PackageRunner issue where temp installation is done multiple times unnecessarily.

* Update feature-slice-view supporting file, which fixes some stability UI issues. (#126)

* Remove old feature-slicing pipeline implementation (replaced by BigQuery). Add confusion matrix magic. (#129)

* Remove old feature-slicing pipeline implementation (replaced by BigQuery).
Add confusion matrix magic.

* Follow up on code review comments. Also fix an inception issue where eval loss is NaN when the eval size is smaller than the batch size.

* Fix set union.

* Mergemaster/cloudml (#134)

* Cloudmlm (#152)

* Cloudmlmerge (#188)

* CsvDataSet no longer globs files in init. (#187)

* CsvDataSet no longer globs files in init.

* Removed file_io; that fix will be done later

* Removed junk lines

* Sample uses .file

* Fixed the csv dataset's files() method (lazy expansion is sketched below)

* Update _dataset.py
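
The lazy pattern might look like this sketch (the class name and local-only globbing are illustrative):

```py
import glob


class LazyCsvFiles(object):
  """Keep the pattern at construction time; expand it only on access."""

  def __init__(self, file_pattern):
    self._file_pattern = file_pattern  # no globbing in __init__

  @property
  def files(self):
    # Files created after construction are still picked up here.
    return glob.glob(self._file_pattern)
```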

* Move cloud trainer and predictor from their own classes to Job and Model respectively. (#192)

* Move cloud trainer and predictor from their own classes to Job and Model respectively.

Cloud trainer and predictor will be cleaned up in a separate change.

* Rename CloudModels to Models and CloudModelVersions to ModelVersions. Move their iterators from self to a get_iterator() method (see the sketch below).

* Switch to the cloudml v1 endpoint.

* Remove one comment.

* Follow up on CR comments. Fix a bug in the datalab iterator where the count keeps incrementing incorrectly.
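
A sketch of the iterator move; the class and data are stand-ins:

```py
class Models(object):
  """Iteration lives on get_iterator(), not on the object itself."""

  def __init__(self, names):
    self._names = list(names)

  def get_iterator(self):
    # A fresh iterator per call, so no iteration state (like a count)
    # lingers on self between uses.
    return iter(self._names)


models = Models(['model_a', 'model_b'])
for name in models.get_iterator():
  print(name)
```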

* removed the feature type file  (#199)

* sw

* Removed the feature types file from preprocessing

* Training no longer needs the input types file;
cloud batch prediction now works

* Updated the tests

* Added an amazing comment to local_train;
check that the target column is the first column (a sketch follows below)

* The transforms file is not optional on the DL side.

* comments

* comments
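
The target-column check could be as simple as this sketch; the schema shape is an assumption:

```py
def check_target_first(schema, target_name):
  """Assumes schema is a list of dicts like [{'name': ..., 'type': ...}]."""
  if not schema or schema[0]['name'] != target_name:
    raise ValueError('Target column %r must be the first column' % target_name)


check_target_first([{'name': 'target'}, {'name': 'num1'}], 'target')
```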

* Make inception work with TF 1.0. (#204)

* Work around a TF summary issue. Force online prediction to use TF 1.0. (#209)

* sd package: everything local is working. (#211)

* sw

* sw

* Remove tf dependency from structured data setup.py. (#212)

* Work around a TF summary issue. Force online prediction to use TF 1.0.

* Remove tf dependency from structured data setup.py.

* Cloudmld (#213)

* sw

* sw

* cloud uses 0.12.0rc? and local uses whatever is in datalab

* for local testing

* master_setup is a copy of ../../setup.py

* Add a resize option for inception package to avoid sending large data to online prediction (#215)

* Add a resize option for the inception package to avoid sending large data to online prediction (see the sketch below).
Update Lantern browser.

* Follow up on code review comments and fix a bug for inception.
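
The resize idea, sketched with PIL; the size cap and JPEG re-encoding are illustrative choices:

```py
import base64
import io

from PIL import Image


def encode_resized(image_path, max_size=(256, 256)):
  """Shrink an image before base64-encoding it for online prediction."""
  img = Image.open(image_path).convert('RGB')
  img.thumbnail(max_size)  # in-place resize, preserves aspect ratio
  buf = io.BytesIO()
  img.save(buf, format='JPEG')
  return base64.b64encode(buf.getvalue())
```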

* Cleanup mlalpha APIs that are not needed. (#218)

* Inception package updates. (#219)

- Instead of hard-coding the setup.py path, duplicate it along with all py files, just like the structured data package.
- Use pip-installable TensorFlow 1.0 for packages.
- Fix some TF warnings.

* Cloudml Branch Merge From Master (#222)

* Remove CloudML SDK as dependency for PyDatalab. (#227)

* Remove CloudML dependency from Inception. (#225)

* TensorFlow's save_model no longer creates export.meta, so disable the check in deploying models. (#228)

* TensorFlow's save_model no longer creates export.meta, so disable the check in deploying models.

* Also check for saved_model.pb for deployment (see the sketch below).
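
The relaxed check, as a local-filesystem sketch:

```py
import os


def model_dir_deployable(model_dir):
  """Accept either the old export.meta or the newer saved_model.pb."""
  return any(os.path.exists(os.path.join(model_dir, name))
             for name in ('export.meta', 'saved_model.pb'))
```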

* Cloudmlsm (#229)

* csv prediction graph done

* csv works, but not json!!!

* sw, train working

* cloud training working

* finished census sample, cleaned up interface

* review comments

* small fixes to sd (#231)

* small fixes

* more small fixes

* Rename from mlalpha to ml. (#232)

* fixed prediction (#235)

* small fixes (#236)

* 1) Prediction 'key_from_input' is now the true key name
2) DF prediction now makes the csv_schema.json file
3) Removed a function that was not used.

* Update csv_schema.json in _package too

* Cloudmlmerge (#238)
