docs: add samples from tables/automl (#54)
* Tables Notebooks [(#2090)](#2090)

* initial commit
* update census
* update notebooks

* remove the reference to a bug [(#2100)](#2100)

as the bug has been fixed in the public client lib

* delete this file. [(#2102)](#2102)

* rename file name [(#2103)](#2103)

* trying to fix images [(#2101)](#2101)

* remove typo in installation [(#2110)](#2110)

* Rename census_income_prediction.ipynb to getting_started_notebook.ipynb [(#2115)](#2115)

renaming the notebooks as Getting Started (will be in sync with the doc). It will be great if the folder could be renamed too

* added back missing file package import [(#2150)](#2150)

* added back missing file import [(#2145)](#2145)

* remove incorrect reference to Iris dataset [(#2203)](#2203)

* conversion to jupyter/colab [(#2340)](#2340)

plus bug fixes

* updated for the Jupyter support [(#2337)](#2337)

* updated readme for support Jupyter [(#2336)](#2336)

to approve with the updated notebook supporting jupyter

* conversion to jupyter/colab [(#2339)](#2339)

plus bug fixes

* conversion of notebook for jupyter/Colab [(#2338)](#2338)

conversion of the notebook to support both Jupyter and Colab + bug fixes

* [BLOCKED] AutoML Tables: Docs samples updated to use new (pending) client [(#2276)](#2276)

* AutoML Tables: Docs samples updated to use new (pending) client

* Linter warnings

* add product recommendation for automl tables notebook [(#2257)](#2257)

* added colab filtering notebook

* update to tables client

* update readme

* tell user to restart kernel for automl

* AutoML Tables: Notebook samples updated to use new tables client [(#2424)](#2424)

* fix users bug and emphasize kernel restart [(#2407)](#2407)

* fix problems with automl docs [(#2501)](#2501)

Today, calling the function `batch_predict` as shown in the docs fails with an error saying `the paramaters should be a pandas.Dataframe`. This happens because the first parameter of `batch_predict` is a `pandas.DataFrame`, so an argument passed positionally is bound to it. To solve this problem, we need to use Python's named (keyword) parameters.
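The pitfall described above can be sketched without the client library. The `batch_predict` function below is a hypothetical stand-in, not the real `TablesClient.batch_predict`; the only thing it shares with the description is that its first parameter expects a DataFrame. The bucket paths, the model name, and the error text are illustrative assumptions.

```python
# Hypothetical stand-in: only the parameter order mirrors the description
# above (a pandas DataFrame comes first). It is not the real client method.
def batch_predict(
    pandas_dataframe=None,
    gcs_input_uris=None,
    gcs_output_uri_prefix=None,
    model_display_name=None,
):
    # Reject anything in the first slot that does not look like a DataFrame.
    if pandas_dataframe is not None and not hasattr(pandas_dataframe, "columns"):
        raise TypeError("first positional argument must be a pandas.DataFrame")
    return {
        "gcs_input_uris": gcs_input_uris,
        "gcs_output_uri_prefix": gcs_output_uri_prefix,
        "model_display_name": model_display_name,
    }


def call_positionally():
    """Pass the URIs positionally; the first one lands in the DataFrame slot."""
    try:
        batch_predict("gs://example-bucket/input.csv", "gs://example-bucket/out/")
    except TypeError as exc:
        return str(exc)
    return None


def call_with_keywords():
    """Name every argument so nothing is bound to the DataFrame slot."""
    return batch_predict(
        gcs_input_uris="gs://example-bucket/input.csv",
        gcs_output_uri_prefix="gs://example-bucket/out/",
        model_display_name="example_model",
    )
```

Naming each argument keeps the call correct even if the parameter order of the real method differs from this sketch.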

* Fix typo in GCS URI parameter [(#2459)](#2459)

* fix: fix tables notebook links and bugs [(#2601)](#2601)

* feat(tables): update samples to show explainability [(#2523)](#2523)

* show xai

* local feature importance

* use updated client

* use fixed library

* use new model

* Auto-update dependencies. [(#2005)](#2005)

* Auto-update dependencies.

* Revert update of appengine/flexible/datastore.

* revert update of appengine/flexible/scipy

* revert update of bigquery/bqml

* revert update of bigquery/cloud-client

* revert update of bigquery/datalab-migration

* revert update of bigtable/quickstart

* revert update of compute/api

* revert update of container_registry/container_analysis

* revert update of dataflow/run_template

* revert update of datastore/cloud-ndb

* revert update of dialogflow/cloud-client

* revert update of dlp

* revert update of functions/imagemagick

* revert update of functions/ocr/app

* revert update of healthcare/api-client/fhir

* revert update of iam/api-client

* revert update of iot/api-client/gcs_file_to_device

* revert update of iot/api-client/mqtt_example

* revert update of language/automl

* revert update of run/image-processing

* revert update of vision/automl

* revert update testing/requirements.txt

* revert update of vision/cloud-client/detect

* revert update of vision/cloud-client/product_search

* revert update of jobs/v2/api_client

* revert update of jobs/v3/api_client

* revert update of opencensus

* revert update of translate/cloud-client

* revert update to speech/cloud-client

Co-authored-by: Kurtis Van Gent <31518063+kurtisvg@users.noreply.github.com>
Co-authored-by: Doug Mahugh <dmahugh@gmail.com>

* Update dependency google-cloud-automl to v0.10.0 [(#3033)](#3033)

Co-authored-by: Bu Sun Kim <8822365+busunkim96@users.noreply.github.com>
Co-authored-by: Leah E. Cole <6719667+leahecole@users.noreply.github.com>

* Simplify noxfile setup. [(#2806)](#2806)

* chore(deps): update dependency requests to v2.23.0

* Simplify noxfile and add version control.

* Configure appengine/standard to only test Python 2.7.

* Update Kokoro configs to match noxfile.

* Add requirements-test to each folder.

* Remove Py2 versions from everything except appengine/standard.

* Remove conftest.py.

* Remove appengine/standard/conftest.py

* Remove 'no-sucess-flaky-report' from pytest.ini.

* Add GAE SDK back to appengine/standard tests.

* Fix typo.

* Roll pytest to python 2 version.

* Add a bunch of testing requirements.

* Remove typo.

* Add appengine lib directory back in.

* Add some additional requirements.

* Fix issue with flake8 args.

* Even more requirements.

* Readd appengine conftest.py.

* Add a few more requirements.

* Even more Appengine requirements.

* Add webtest for appengine/standard/mailgun.

* Add some additional requirements.

* Add workaround for issue with mailjet-rest.

* Add responses for appengine/standard/mailjet.

Co-authored-by: Renovate Bot <bot@renovateapp.com>

* chore: some lint fixes [(#3750)](#3750)

* automl: tables code sample clean-up [(#3571)](#3571)

* delete unused tables_dataset samples

* delete args code associated with unused automl_tables samples

* delete tests associated with unused automl_tables samples

* restore get_dataset method/yargs without region tagging

* Restore update_dataset methods without region tagging

Co-authored-by: Takashi Matsuo <tmatsuo@google.com>
Co-authored-by: Leah E. Cole <6719667+leahecole@users.noreply.github.com>

* add example of creating AutoML Tables client with non-default endpoint ('new' sdk) [(#3929)](#3929)

* add example of creating client with non-default endpoint

* more test file cleanup

* move connectivity print stmt out of test fn

Co-authored-by: Leah E. Cole <6719667+leahecole@users.noreply.github.com>
Co-authored-by: Torry Yang <sirtorry@users.noreply.github.com>

* Replace GCLOUD_PROJECT with GOOGLE_CLOUD_PROJECT. [(#4022)](#4022)

* chore(deps): update dependency google-cloud-automl to v1 [(#4127)](#4127)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [google-cloud-automl](https://togithub.com/googleapis/python-automl) | major | `==0.10.0` -> `==1.0.1` |

---

### Release Notes

<details>
<summary>googleapis/python-automl</summary>

### [`v1.0.1`](https://togithub.com/googleapis/python-automl/blob/master/CHANGELOG.md#101-httpswwwgithubcomgoogleapispython-automlcomparev100v101-2020-06-18)

[Compare Source](https://togithub.com/googleapis/python-automl/compare/v0.10.0...v1.0.1)

</details>

---

### Renovate configuration

:date: **Schedule**: At any time (no schedule defined).

:vertical_traffic_light: **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

:recycle: **Rebasing**: Never, or you tick the rebase/retry checkbox.

:no_bell: **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#GoogleCloudPlatform/python-docs-samples).

* [tables/automl] fix: update the csv file and the dataset name [(#4188)](#4188)

fixes #4177
fixes #4178

* samples: Automl table batch test [(#4267)](#4267)

* added rtest req.txt

* samples: added automl batch predict test

* added missing package

* Update tables/automl/batch_predict_test.py

Co-authored-by: Bu Sun Kim <8822365+busunkim96@users.noreply.github.com>


* samples: fixed wrong format on GCS input Uri [(#4270)](#4270)

## Description

The current predict sample indicates that it accepts multiple GCS URI inputs, but it should accept a single one.

## Checklist
- [X] Please **merge** this PR for me once it is approved.

* chore(deps): update dependency pytest to v5.4.3 [(#4279)](#4279)

* chore(deps): update dependency pytest to v5.4.3

* specify pytest for python 2 in appengine

Co-authored-by: Leah Cole <coleleah@google.com>

* Update automl_tables_predict.py with batch_predict_bq sample [(#4142)](#4142)

Added a new method `batch_predict_bq` demonstrating running batch_prediction using BigQuery.
Added notes in comments about asynchronicity for `batch_predict` method.

The region `automl_tables_batch_predict_bq` will be used on cloud.google.com (currently both sections for GCS and BigQuery use the same sample code which is incorrect).

Fixes #4141

Note: It's a good idea to open an issue first for discussion.

- [x] Please **merge** this PR for me once it is approved.

* Update dependency pytest to v6 [(#4390)](#4390)

* chore: exclude notebooks

* chore: update templates

* chore: add codeowners and fix tests

* chore: ignore warnings from sphinx

* chore: fix tables client

* test: fix unit tests

Co-authored-by: Torry Yang <sirtorry@users.noreply.github.com>
Co-authored-by: florencep <florenceperot@google.com>
Co-authored-by: Mike Burton <mb-github@niskala.org>
Co-authored-by: Lars Wander <lwander@users.noreply.github.com>
Co-authored-by: Michael Hu <Michael.an.hu@gmail.com>
Co-authored-by: Michael Hu <michaelanhu@gmail.com>
Co-authored-by: Alefh Sousa <alefh.sousa@gmail.com>
Co-authored-by: DPEBot <dpebot@google.com>
Co-authored-by: Kurtis Van Gent <31518063+kurtisvg@users.noreply.github.com>
Co-authored-by: Doug Mahugh <dmahugh@gmail.com>
Co-authored-by: WhiteSource Renovate <bot@renovateapp.com>
Co-authored-by: Leah E. Cole <6719667+leahecole@users.noreply.github.com>
Co-authored-by: Takashi Matsuo <tmatsuo@google.com>
Co-authored-by: Anthony <wens.ajw@gmail.com>
Co-authored-by: Amy <amy@infosleuth.net>
Co-authored-by: Mike <45373284+munkhuushmgl@users.noreply.github.com>
Co-authored-by: Leah Cole <coleleah@google.com>
Co-authored-by: Sergei Dorogin <github@dorogin.com>
19 people authored and dandhlee committed Nov 17, 2022
1 parent 6576244 commit 7437390
Showing 12 changed files with 1,648 additions and 0 deletions.
306 changes: 306 additions & 0 deletions automl/tables/automl_tables_dataset.py
@@ -0,0 +1,306 @@
#!/usr/bin/env python

# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""This application demonstrates how to perform basic operations on datasets
with the Google AutoML Tables API.

For more information, see the documentation at
https://cloud.google.com/automl-tables/docs.
"""

import argparse
import os


def create_dataset(project_id, compute_region, dataset_display_name):
    """Create a dataset."""
    # [START automl_tables_create_dataset]
    # TODO(developer): Uncomment and set the following variables
    # project_id = 'PROJECT_ID_HERE'
    # compute_region = 'COMPUTE_REGION_HERE'
    # dataset_display_name = 'DATASET_DISPLAY_NAME_HERE'

    from google.cloud import automl_v1beta1 as automl

    client = automl.TablesClient(project=project_id, region=compute_region)

    # Create a dataset with the given display name
    dataset = client.create_dataset(dataset_display_name)

    # Display the dataset information.
    print("Dataset name: {}".format(dataset.name))
    print("Dataset id: {}".format(dataset.name.split("/")[-1]))
    print("Dataset display name: {}".format(dataset.display_name))
    print("Dataset metadata:")
    print("\t{}".format(dataset.tables_dataset_metadata))
    print("Dataset example count: {}".format(dataset.example_count))
    print("Dataset create time:")
    print("\tseconds: {}".format(dataset.create_time.seconds))
    print("\tnanos: {}".format(dataset.create_time.nanos))

    # [END automl_tables_create_dataset]

    return dataset


def list_datasets(project_id, compute_region, filter_=None):
    """List all datasets."""
    result = []
    # [START automl_tables_list_datasets]
    # TODO(developer): Uncomment and set the following variables
    # project_id = 'PROJECT_ID_HERE'
    # compute_region = 'COMPUTE_REGION_HERE'
    # filter_ = 'filter expression here'

    from google.cloud import automl_v1beta1 as automl

    client = automl.TablesClient(project=project_id, region=compute_region)

    # List all the datasets available in the region by applying filter.
    response = client.list_datasets(filter_=filter_)

    print("List of datasets:")
    for dataset in response:
        # Display the dataset information.
        print("Dataset name: {}".format(dataset.name))
        print("Dataset id: {}".format(dataset.name.split("/")[-1]))
        print("Dataset display name: {}".format(dataset.display_name))
        metadata = dataset.tables_dataset_metadata
        print(
            "Dataset primary table spec id: {}".format(
                metadata.primary_table_spec_id
            )
        )
        print(
            "Dataset target column spec id: {}".format(
                metadata.target_column_spec_id
            )
        )
        print(
            "Dataset weight column spec id: {}".format(
                metadata.weight_column_spec_id
            )
        )
        print(
            "Dataset ml use column spec id: {}".format(
                metadata.ml_use_column_spec_id
            )
        )
        print("Dataset example count: {}".format(dataset.example_count))
        print("Dataset create time:")
        print("\tseconds: {}".format(dataset.create_time.seconds))
        print("\tnanos: {}".format(dataset.create_time.nanos))
        print("\n")

        # [END automl_tables_list_datasets]
        # Collect each dataset so callers (e.g. tests) can inspect the list.
        result.append(dataset)

    return result


def get_dataset(project_id, compute_region, dataset_display_name):
    """Get the dataset."""
    # TODO(developer): Uncomment and set the following variables
    # project_id = 'PROJECT_ID_HERE'
    # compute_region = 'COMPUTE_REGION_HERE'
    # dataset_display_name = 'DATASET_DISPLAY_NAME_HERE'

    from google.cloud import automl_v1beta1 as automl

    client = automl.TablesClient(project=project_id, region=compute_region)

    # Get complete detail of the dataset.
    dataset = client.get_dataset(dataset_display_name=dataset_display_name)

    # Display the dataset information.
    print("Dataset name: {}".format(dataset.name))
    print("Dataset id: {}".format(dataset.name.split("/")[-1]))
    print("Dataset display name: {}".format(dataset.display_name))
    print("Dataset metadata:")
    print("\t{}".format(dataset.tables_dataset_metadata))
    print("Dataset example count: {}".format(dataset.example_count))
    print("Dataset create time:")
    print("\tseconds: {}".format(dataset.create_time.seconds))
    print("\tnanos: {}".format(dataset.create_time.nanos))

    return dataset


def import_data(project_id, compute_region, dataset_display_name, path):
    """Import structured data."""
    # [START automl_tables_import_data]
    # TODO(developer): Uncomment and set the following variables
    # project_id = 'PROJECT_ID_HERE'
    # compute_region = 'COMPUTE_REGION_HERE'
    # dataset_display_name = 'DATASET_DISPLAY_NAME'
    # path = 'gs://path/to/file.csv' or 'bq://project_id.dataset.table_id'

    from google.cloud import automl_v1beta1 as automl

    client = automl.TablesClient(project=project_id, region=compute_region)

    response = None
    if path.startswith("bq"):
        response = client.import_data(
            dataset_display_name=dataset_display_name, bigquery_input_uri=path
        )
    else:
        # Get the multiple Google Cloud Storage URIs.
        input_uris = path.split(",")
        response = client.import_data(
            dataset_display_name=dataset_display_name,
            gcs_input_uris=input_uris,
        )

    print("Processing import...")
    # synchronous check of operation status.
    print("Data imported. {}".format(response.result()))

    # [END automl_tables_import_data]


def update_dataset(
    project_id,
    compute_region,
    dataset_display_name,
    target_column_spec_name=None,
    weight_column_spec_name=None,
    test_train_column_spec_name=None,
):
    """Update dataset."""
    # TODO(developer): Uncomment and set the following variables
    # project_id = 'PROJECT_ID_HERE'
    # compute_region = 'COMPUTE_REGION_HERE'
    # dataset_display_name = 'DATASET_DISPLAY_NAME_HERE'
    # target_column_spec_name = 'TARGET_COLUMN_SPEC_NAME_HERE' or None
    # weight_column_spec_name = 'WEIGHT_COLUMN_SPEC_NAME_HERE' or None
    # test_train_column_spec_name = 'TEST_TRAIN_COLUMN_SPEC_NAME_HERE' or None

    from google.cloud import automl_v1beta1 as automl

    client = automl.TablesClient(project=project_id, region=compute_region)

    if target_column_spec_name is not None:
        response = client.set_target_column(
            dataset_display_name=dataset_display_name,
            column_spec_display_name=target_column_spec_name,
        )
        print("Target column updated. {}".format(response))
    if weight_column_spec_name is not None:
        response = client.set_weight_column(
            dataset_display_name=dataset_display_name,
            column_spec_display_name=weight_column_spec_name,
        )
        print("Weight column updated. {}".format(response))
    if test_train_column_spec_name is not None:
        response = client.set_test_train_column(
            dataset_display_name=dataset_display_name,
            column_spec_display_name=test_train_column_spec_name,
        )
        print("Test/train column updated. {}".format(response))


def delete_dataset(project_id, compute_region, dataset_display_name):
    """Delete a dataset."""
    # [START automl_tables_delete_dataset]
    # TODO(developer): Uncomment and set the following variables
    # project_id = 'PROJECT_ID_HERE'
    # compute_region = 'COMPUTE_REGION_HERE'
    # dataset_display_name = 'DATASET_DISPLAY_NAME_HERE'

    from google.cloud import automl_v1beta1 as automl

    client = automl.TablesClient(project=project_id, region=compute_region)

    # Delete a dataset.
    response = client.delete_dataset(dataset_display_name=dataset_display_name)

    # synchronous check of operation status.
    print("Dataset deleted. {}".format(response.result()))
    # [END automl_tables_delete_dataset]


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    subparsers = parser.add_subparsers(dest="command")

    create_dataset_parser = subparsers.add_parser(
        "create_dataset", help=create_dataset.__doc__
    )
    create_dataset_parser.add_argument("--dataset_name")

    list_datasets_parser = subparsers.add_parser(
        "list_datasets", help=list_datasets.__doc__
    )
    list_datasets_parser.add_argument("--filter_")

    get_dataset_parser = subparsers.add_parser(
        "get_dataset", help=get_dataset.__doc__
    )
    get_dataset_parser.add_argument("--dataset_display_name")

    import_data_parser = subparsers.add_parser(
        "import_data", help=import_data.__doc__
    )
    import_data_parser.add_argument("--dataset_display_name")
    import_data_parser.add_argument("--path")

    update_dataset_parser = subparsers.add_parser(
        "update_dataset", help=update_dataset.__doc__
    )
    update_dataset_parser.add_argument("--dataset_display_name")
    update_dataset_parser.add_argument("--target_column_spec_name")
    update_dataset_parser.add_argument("--weight_column_spec_name")
    update_dataset_parser.add_argument("--ml_use_column_spec_name")

    delete_dataset_parser = subparsers.add_parser(
        "delete_dataset", help=delete_dataset.__doc__
    )
    delete_dataset_parser.add_argument("--dataset_display_name")

    project_id = os.environ["PROJECT_ID"]
    compute_region = os.environ["REGION_NAME"]

    args = parser.parse_args()
    if args.command == "create_dataset":
        create_dataset(project_id, compute_region, args.dataset_name)
    if args.command == "list_datasets":
        list_datasets(project_id, compute_region, args.filter_)
    if args.command == "get_dataset":
        get_dataset(project_id, compute_region, args.dataset_display_name)
    if args.command == "import_data":
        import_data(
            project_id, compute_region, args.dataset_display_name, args.path
        )
    if args.command == "update_dataset":
        update_dataset(
            project_id,
            compute_region,
            args.dataset_display_name,
            args.target_column_spec_name,
            args.weight_column_spec_name,
            args.ml_use_column_spec_name,
        )
    if args.command == "delete_dataset":
        delete_dataset(project_id, compute_region, args.dataset_display_name)
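
The `import_data` sample in `automl_tables_dataset.py` picks between a BigQuery URI and a comma-separated list of Cloud Storage URIs based on the path prefix. The routing can be sketched in isolation as below. `classify_import_path` is a hypothetical helper name that appears in neither the sample nor the client library, and unlike the sample it checks for the full `bq://` scheme and strips whitespace around each URI; both are assumptions of this sketch.

```python
def classify_import_path(path):
    """Return ("bigquery", uri) or ("gcs", [uris]) for an import path.

    Hypothetical helper for illustration only; the sample itself tests
    path.startswith("bq") and splits on "," without stripping.
    """
    if path.startswith("bq://"):
        # A BigQuery source is a single table URI.
        return "bigquery", path
    # Anything else is treated as one or more comma-separated GCS URIs.
    uris = [uri.strip() for uri in path.split(",") if uri.strip()]
    return "gcs", uris
```

The first element of the returned tuple would decide whether the path is passed as `bigquery_input_uri` or `gcs_input_uris` in the `client.import_data` call.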