This repository has been archived by the owner on Sep 3, 2022. It is now read-only.
* Add gcs_copy_file() that is missing but is referenced in a couple of places. (#110)
  * Add gcs_copy_file() that is missing but is referenced in a couple of places.
  * Add DataFlow to pydatalab dependency list.
  * Fix travis test errors by reimplementing gcs copy.
  * Remove unnecessary shutil import.
* Flake8 configuration. Set max line length to 100. Ignore E111, E114. (#102)
* Add datalab user agent to CloudML trainer and predictor requests. (#112)
* Update oauth2client to 2.2.0 to satisfy cloudml in Cloud Datalab. (#111)
* Update README.md (#114)
  Added docs link.
* Generate reST documentation for magic commands (#113)
  Auto-generate docs for any added magics by searching the source files for lines with register_line_cell_magic, capturing the names of those magics, calling them inside an IPython kernel with the -h argument, and storing that output in a generated datalab.magics.rst file.
* Fix an issue that %%chart failed with UDF query. (#116)
  * Fix an issue that %%chart failed with UDF query. The problem is that the query is submitted to BQ without replacing variable values from the user namespace.
  * Fix chart tests by adding ip.user_ns mock.
  * Fix charting test.
  * Add missing import "mock".
  * Fix chart tests.
* Fix "%%bigquery schema" issue -- the command generates nothing in output. (#119)
* Add some missing dependencies, remove some unused ones. (#122)
  * Remove scikit-learn and scipy as dependencies.
  * Add more required packages.
  * Add psutil as dependency.
  * Update package versions.
* Cleanup (#123)
  * Remove unnecessary semicolons.
  * Remove unused imports.
  * Remove unnecessarily defined variable.
* Fix query_metadata tests. (#128)
* Make the library pip-installable. (#125)
  This PR adds tensorflow and cloudml to setup.py to make the lib pip-installable. I had to install them explicitly using pip from inside the setup.py script; even though that is not a clean way to do it, it gets around the two issues we have at the moment with these two packages:
  - PyPI has TensorFlow version 0.12, while we need 0.11 for the current version of pydatalab. According to the Cloud ML docs, that version exists as a pip package for three supported platforms.
  - The Cloud ML SDK exists as a pip package, but also not on PyPI, and while we could add it as a dependency link, there is another package on PyPI called cloudml, and pip ends up installing that instead (see #124). I cannot find a way to force pip to install the package from the link I included.
* Set command description so it is displayed in --help; argparse's format_help() prints description but not help. (#131)
* Fix an issue that setting the project id from datalab does not set the gcloud default project. (#136)
* Add future==0.16.0 as a dependency since it's required by the CloudML SDK. (#143)
  As of the latest release of the CloudML Python SDK, that package seems to require future==0.16.0, so until that's fixed, we'll take it as a dependency.
* Remove tensorflow and CloudML SDK from setup.py. (#144)
  * Install TensorFlow 0.12.1.
  * Remove TensorFlow and CloudML SDK from setup.py.
  * Add comments on why we ignore errors when importing mlalpha.
* Fix project_id from `gcloud config` in py3. (#194)
  - `Popen.stdout` is `bytes` in py3 and needs `.decode()`.
  - Before:
    ```py
    >>> %%sql -d standard
    ... select 3
    Your active configuration is: [test]
    HTTP request failed: Invalid project ID 'b'foo-bar''. Project IDs must contain 6-63 lowercase letters, digits, or dashes. IDs must start with a letter and may not end with a dash.
    ```
    ```sh
    $ for p in python2 python3; do $p -c 'from datalab.context._utils import get_project_id; print(get_project_id())'; done
    Your active configuration is: [test]
    foo-bar
    Your active configuration is: [test]
    b'foo-bar'
    ```
  - After:
    ```py
    >>> %%sql -d standard
    ... select 3
    Your active configuration is: [test]
    QueryResultsTable job_1_bZNbAUtk8QzlK7bqWD5fz7S5o
    ```
    ```sh
    $ for p in python2 python3; do $p -c 'from datalab.context._utils import get_project_id; print(get_project_id())'; done
    Your active configuration is: [test]
    foo-bar
    Your active configuration is: [test]
    foo-bar
    ```
* Use HTTP Keep-Alive; otherwise BigQuery queries are seconds slower than necessary. (#195)
  - Before (without Keep-Alive): ~3-7s for a BigQuery `select 3` with an already cached result.
  - After (with Keep-Alive): ~1.5-3s.
  - The query sends six HTTP requests, and runtime appears to be dominated by network RTT.
* Cast string to int. (#217)
  `table.insert_data(df)` inserts data correctly but raises `TypeError: unorderable types: str() > int()`.
* bigquery.Api: Remove unused _DEFAULT_PAGE_SIZE. (#221)
  Test plan:
  - Unit tests still pass