This repository has been archived by the owner on Sep 3, 2022. It is now read-only.

Cloudmlmerge (#238)
* Add gcs_copy_file() that is missing but is referenced in a couple of places. (#110)

* Add gcs_copy_file() that is missing but is referenced in a couple of places.

* Add DataFlow to pydatalab dependency list.

* Fix travis test errors by reimplementing gcs copy.

* Remove unnecessary shutil import.

* Flake8 configuration. Set max line length to 100. Ignore E111, E114 (#102)

* Add datalab user agent to CloudML trainer and predictor requests. (#112)

* Update oauth2client to 2.2.0 to satisfy cloudml in Cloud Datalab (#111)

* Update README.md (#114)

Added docs link.

* Generate reST documentation for magic commands (#113)

Auto-generate docs for any added magics by searching the source files for lines containing register_line_cell_magic, capturing the names of those magics, calling each one inside an IPython kernel with the -h argument, and storing that output in a generated datalab.magics.rst file.
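The scan step described above can be sketched as follows (a minimal illustration, not the actual docs/gen-magic-rst.ipy script; the regex and sample source are assumptions):

```python
import re

# Hypothetical version of the magic-name scan: find functions decorated
# with @register_line_cell_magic and capture their names.
MAGIC_RE = re.compile(r'@register_line_cell_magic\s*\ndef\s+(\w+)')

def find_magic_names(source_text):
    """Return the names of magics registered via @register_line_cell_magic."""
    return MAGIC_RE.findall(source_text)

sample = (
    "@register_line_cell_magic\n"
    "def bigquery(line, cell=None):\n"
    "    pass\n"
)
print(find_magic_names(sample))  # ['bigquery']
```

Each captured name would then be invoked with `-h` in an IPython kernel and the help text written into the generated reST file.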

* Fix an issue that %%chart failed with UDF query. (#116)

* Fix an issue that %%chart failed with UDF query.

The problem is that the query is submitted to BQ without replacing variable values from the user namespace.

* Fix chart tests by adding ip.user_ns mock.

* Fix charting test.

* Add missing import "mock".

* Fix chart tests.

* Fix "%%bigquery schema" issue: the command generated no output. (#119)

* Add some missing dependencies, remove some unused ones (#122)

* Remove scikit-learn and scipy as dependencies
* add more required packages
* Add psutil as dependency
* Update packages versions

* Cleanup (#123)

* Remove unnecessary semicolons

* remove unused imports

* remove an unnecessarily defined variable

* Fix query_metadata tests (#128)

Fix query_metadata tests

* Make the library pip-installable (#125)

This PR adds tensorflow and cloudml to setup.py to make the library pip-installable. I had to install them explicitly using pip from inside the setup.py script; while that is not a clean way to do it, it works around the two issues we have at the moment with these two packages:
- PyPI has TensorFlow version 0.12, while we need 0.11 for the current version of pydatalab. According to the Cloud ML docs, that version exists as a pip package for three supported platforms.
- The Cloud ML SDK exists as a pip package, but also not on PyPI, and while we could add it as a dependency link, there is another package on PyPI called cloudml, and pip ends up installing that instead (see #124). I cannot find a way to force pip to install the package from the link I included.
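For reference, one way a setup.py script can shell out to pip for a pinned wheel looks roughly like this (a hedged sketch, not the actual pydatalab setup.py; the helper name and the commented URL are illustrative placeholders):

```python
import subprocess
import sys

# Hypothetical helper: install a requirement with the same interpreter
# that is running setup.py, by invoking pip as a module. Modern pip no
# longer supports being called via pip.main(), so subprocess is safer.
def pip_install(spec):
    """Run `pip install <spec>` and raise if pip exits non-zero."""
    return subprocess.check_call([sys.executable, '-m', 'pip', 'install', spec])

# e.g. a pinned platform wheel could be passed in directly:
# pip_install('https://example.invalid/tensorflow-0.11.0-py2-none-any.whl')
```

The trade-off noted in the PR stands: this bypasses pip's dependency resolution, which is why it was later removed in #144.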

* Set command description so it is displayed in --help. argparse's format_help() prints the description but not the help. (#131)

* Fix an issue that setting project id from datalab does not set gcloud default project. (#136)

* Add future==0.16.0 as a dependency since it's required by CloudML SDK (#143)

As of the latest release of CloudML Python SDK, that package seems to require future==0.16.0, so until it's fixed, we'll take it as a dependency.

* Remove tensorflow and CloudML SDK from setup.py (#144)

* Install TensorFlow 0.12.1.

* Remove TensorFlow and CloudML SDK from setup.py.

* Add comments why we ignore errors when importing mlalpha.

* Fix project_id from `gcloud config` in py3 (#194)

- `Popen.stdout` is `bytes` in py3 and needs `.decode()`

- Before:
```py
>>> %%sql -d standard
... select 3
Your active configuration is: [test]

HTTP request failed: Invalid project ID 'b'foo-bar''. Project IDs must contain 6-63 lowercase letters, digits, or dashes. IDs must start with a letter and may not end with a dash.
```
```sh
$ for p in python2 python3; do $p -c 'from datalab.context._utils import get_project_id; print(get_project_id())'; done
Your active configuration is: [test]

foo-bar
Your active configuration is: [test]

b'foo-bar'
```

- After:
```py
>>> %%sql -d standard
... select 3
Your active configuration is: [test]

QueryResultsTable job_1_bZNbAUtk8QzlK7bqWD5fz7S5o
```
```sh
$ for p in python2 python3; do $p -c 'from datalab.context._utils import get_project_id; print(get_project_id())'; done
Your active configuration is: [test]

foo-bar
Your active configuration is: [test]

foo-bar
```
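The shape of the fix can be demonstrated in isolation (a sketch; the `echo` command stands in for the `gcloud config` call that get_project_id() actually makes):

```python
import subprocess

# Popen.stdout yields bytes under Python 3, so the value must be decoded
# before it is used as a project ID string. Decoding is a no-op-safe step
# on Python 2, where the output is already a (byte) str.
proc = subprocess.Popen(['echo', 'foo-bar'], stdout=subprocess.PIPE)
out, _ = proc.communicate()
project_id = out.decode('utf-8').strip()  # bytes -> str
print(project_id)  # foo-bar, not b'foo-bar'
```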

* Use http Keep-Alive, else BigQuery queries are ~seconds slower than necessary (#195)

- Before (without Keep-Alive): ~3-7s for BigQuery `select 3` with an already cached result
- After (with Keep-Alive): ~1.5-3s
- A query sends six HTTP requests, and runtime appears to be dominated by network RTT

* cast string to int (#217)

`table.insert_data(df)` inserts data correctly but raises TypeError: unorderable types: str() > int()
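The underlying Python 3 behavior can be illustrated with hypothetical names (this is not the actual insert_data code, just the failure mode and the cast that fixes it):

```python
# In Python 3, comparing str to int raises TypeError instead of applying
# Python 2's arbitrary cross-type ordering, so a limit that arrives as a
# string must be cast to int before comparison.
max_rows = '1024'   # hypothetical: a limit read as a string
rows_sent = 500

try:
    too_many = rows_sent > max_rows       # int > str: TypeError on py3
except TypeError:
    too_many = rows_sent > int(max_rows)  # the fix: cast first
print(too_many)  # False
```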

* bigquery.Api: Remove unused _DEFAULT_PAGE_SIZE (#221)

Test plan:
- Unit tests still pass
qimingj authored Feb 25, 2017
1 parent 4e9bf6e commit 6fa717b
Showing 2 changed files with 1 addition and 1 deletion.
1 change: 0 additions & 1 deletion datalab/bigquery/_api.py

```diff
@@ -31,7 +31,6 @@ class Api(object):
   _TABLES_PATH = '/projects/%s/datasets/%s/tables/%s%s'
   _TABLEDATA_PATH = '/projects/%s/datasets/%s/tables/%s%s/data'
 
-  _DEFAULT_PAGE_SIZE = 1024
   _DEFAULT_TIMEOUT = 60000
 
   def __init__(self, context):
```
1 change: 1 addition & 0 deletions docs/gen-magic-rst.ipy

```diff
@@ -1,6 +1,7 @@
 import subprocess, pkgutil, importlib, sys
 from cStringIO import StringIO
 
+
 IGNORED_MAGICS = []
 
 # import submodules
```
