11# Run template
22
3- [ ` main.py ` ] ( main.py ) - Script to run an [ Apache Beam ] template on [ Google Cloud Dataflow ] .
3+ [ ![ Open in Cloud Shell ] ( http://gstatic.com/cloudssh/images/open-btn.svg )] ( https://console.cloud.google.com/cloudshell/editor )
44
5- The following examples show how to run the [ ` Word_Count ` template] , but you can run any other template.
5+ This sample demonstrates how to run an
6+ [ Apache Beam] ( https://beam.apache.org/ )
7+ template on [ Google Cloud Dataflow] ( https://cloud.google.com/dataflow/docs/ ) .
8+ For more information, see the
9+ [ Running templates] ( https://cloud.google.com/dataflow/docs/guides/templates/running-templates )
10+ docs page.
611
7- For the ` Word_Count ` template, we require to pass an ` output ` Cloud Storage path prefix, and optionally we can pass an ` inputFile ` Cloud Storage file pattern for the inputs.
12+ The following examples show how to run the
13+ [ ` Word_Count ` template] ( https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/master/src/main/java/com/google/cloud/teleport/templates/WordCount.java ) ,
14+ but you can run any other template.
15+
16+ The `Word_Count` template requires an `output` Cloud Storage path prefix,
17+ and optionally accepts an `inputFile` Cloud Storage file pattern for the inputs.
818If ` inputFile ` is not passed, it will take ` gs://apache-beam-samples/shakespeare/kinglear.txt ` as default.
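
The parameter rules above can be sketched in Python as follows. This is a minimal illustration, not code from `main.py`: the helper name `build_wordcount_parameters` is hypothetical, and the `gs://` check is an added assumption about what counts as a valid Cloud Storage path.

```python
def build_wordcount_parameters(output, input_file=None):
    """Return Word_Count template parameters as a dict (hypothetical helper)."""
    # Assumption: a Cloud Storage path prefix always starts with gs://.
    if not output.startswith("gs://"):
        raise ValueError("output must be a Cloud Storage path prefix (gs://...)")

    parameters = {"output": output}

    # Only send inputFile when the caller supplies one; when it is omitted,
    # the default input described above is used instead.
    if input_file is not None:
        parameters["inputFile"] = input_file
    return parameters
```

Leaving `inputFile` out of the dictionary, rather than copying the default path into it, keeps the default defined in one place.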
919
1020## Before you begin
1121
12- 1 . Install the [ Cloud SDK] .
13-
14- 1 . [ Create a new project] .
15-
16- 1 . [ Enable billing] .
17-
18- 1 . [ Enable the APIs] ( https://console.cloud.google.com/flows/enableapi?apiid=dataflow,compute_component,logging,storage_component,storage_api,bigquery,pubsub,datastore.googleapis.com,cloudfunctions.googleapis.com,cloudresourcemanager.googleapis.com ) : Dataflow, Compute Engine, Stackdriver Logging, Cloud Storage, Cloud Storage JSON, BigQuery, Pub/Sub, Datastore, Cloud Functions, and Cloud Resource Manager.
19-
20- 1 . Setup the Cloud SDK to your GCP project.
21-
22- ``` bash
23- gcloud init
24- ```
22+ Follow the
23+ [ Getting started with Google Cloud Dataflow] ( ../README.md )
24+ page, and make sure you have a Google Cloud project with billing enabled
25+ and a * service account JSON key* set up in your ` GOOGLE_APPLICATION_CREDENTIALS ` environment variable.
26+ Additionally, for this sample you need the following:
2527
26281 . Create a Cloud Storage bucket.
2729
28- ``` bash
29- gsutil mb gs://your-gcs-bucket
30+ ``` sh
31+ export BUCKET=your-gcs-bucket
32+ gsutil mb gs://$BUCKET
3033 ```
3134
32- ## Setup
33-
34- The following instructions will help you prepare your development environment.
35-
36- 1 . [ Install Python and virtualenv] .
37-
38351 . Clone the ` python-docs-samples ` repository.
3936
40- ``` bash
41- git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
42- ```
37+ ``` sh
38+ git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
39+ ```
4340
44411 . Navigate to the sample code directory.
4542
46- ` ` ` bash
43+ ``` sh
4744 cd python-docs-samples/dataflow/run_template
4845 ```
4946
50471 . Create a virtual environment and activate it.
5148
52- ` ` ` bash
49+ ``` sh
5350 virtualenv env
5451 source env/bin/activate
5552 ```
@@ -58,18 +55,18 @@ The following instructions will help you prepare your development environment.
5855
59561 . Install the sample requirements.
6057
61- ` ` ` bash
58+ ``` sh
6259 pip install -U -r requirements.txt
6360 ```
6461
6562## Running locally
6663
67- To run a Dataflow template from the command line.
64+ * [ ` main.py ` ] ( main.py )
65+ * [ REST API dataflow/projects.templates.launch] ( https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.templates/launch )
6866
69- > NOTE: To run locally, you'll need to [create a service account key] as a JSON file.
70- > Then export an environment variable called `GOOGLE_APPLICATION_CREDENTIALS` pointing it to your service account file.
67+ To run a Dataflow template from the command line:
7168
72- ```bash
69+ ``` sh
7370python main.py \
7471 --project <your-gcp-project> \
7572 --job wordcount-$(date +'%Y%m%d-%H%M%S') \
@@ -80,10 +77,10 @@ python main.py \
8077
8178## Running in Python
8279
83- To run a Dataflow template from Python.
80+ * [ ` main.py ` ] ( main.py )
81+ * [ REST API dataflow/projects.templates.launch] ( https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.templates/launch )
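
The `projects.templates.launch` method linked above takes a job name and the template parameters in its request body. A rough sketch of assembling that body, assuming the `jobName` and `parameters` fields from the REST reference. The helper name `build_launch_body` is hypothetical, and the timestamp uses UTC, whereas the shell `date` command in the command-line example uses local time.

```python
from datetime import datetime, timezone


def build_launch_body(job_prefix, parameters, now=None):
    """Build a request body for templates.launch (hypothetical helper)."""
    # Assumption: UTC timestamps are acceptable for making job names unique.
    now = now or datetime.now(timezone.utc)

    # Same naming scheme as the command-line example: <prefix>-YYYYmmdd-HHMMSS.
    job_name = f"{job_prefix}-{now.strftime('%Y%m%d-%H%M%S')}"

    # Copy the parameters so later changes by the caller do not alter the body.
    return {"jobName": job_name, "parameters": dict(parameters)}
```

The optional `now` argument exists only so the job name can be reproduced in tests.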
8482
85- > NOTE: To run locally, you'll need to [create a service account key] as a JSON file.
86- > Then export an environment variable called ` GOOGLE_APPLICATION_CREDENTIALS` pointing it to your service account file.
83+ To run a Dataflow template from Python:
8784
8885``` py
8986import main as run_template
@@ -101,9 +98,12 @@ run_template.run(
10198
10299## Running in Cloud Functions
103100
101+ * [ ` main.py ` ] ( main.py )
102+ * [ REST API dataflow/projects.templates.launch] ( https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.templates/launch )
103+
104104To deploy this as a Cloud Function and run a Dataflow template through an HTTP request:
105105
106- ` ` ` bash
106+ ``` sh
107107PROJECT=$( gcloud config get-value project) \
108108REGION=$( gcloud config get-value functions/region)
109109
@@ -121,17 +121,3 @@ curl -X POST "https://$REGION-$PROJECT.cloudfunctions.net/run_template" \
121121 -d inputFile=gs://apache-beam-samples/shakespeare/kinglear.txt \
122122 -d output=gs://<your-gcs-bucket>/wordcount/outputs
123123```
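
The same HTTP request can be prepared from Python with the standard library alone. The sketch below only builds the URL and the form-encoded body, following the `https://$REGION-$PROJECT.cloudfunctions.net/run_template` pattern from the `curl` command. The helper name `build_function_request` is hypothetical, and the fields to send are left to the caller.

```python
from urllib.parse import urlencode


def build_function_request(region, project, params, function_name="run_template"):
    """Return (url, body) for calling the deployed function (hypothetical helper)."""
    # URL pattern taken from the curl example above.
    url = f"https://{region}-{project}.cloudfunctions.net/{function_name}"

    # Form-encode the fields, the Python counterpart of curl's -d key=value.
    # Unlike curl -d, urlencode percent-encodes the values.
    body = urlencode(params).encode("utf-8")
    return url, body
```

To send it, pass both values to `urllib.request.Request(url, data=body, method="POST")` and open the request with `urllib.request.urlopen`.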
124-
125- [Apache Beam]: https://beam.apache.org/
126- [Google Cloud Dataflow]: https://cloud.google.com/dataflow/docs/
127- [` Word_Count` template]: https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/master/src/main/java/com/google/cloud/teleport/templates/WordCount.java
128-
129- [Cloud SDK]: https://cloud.google.com/sdk/docs/
130- [Create a new project]: https://console.cloud.google.com/projectcreate
131- [Enable billing]: https://cloud.google.com/billing/docs/how-to/modify-project
132- [Create a service account key]: https://console.cloud.google.com/apis/credentials/serviceaccountkey
133- [Creating and managing service accounts]: https://cloud.google.com/iam/docs/creating-managing-service-accounts
134- [GCP Console IAM page]: https://console.cloud.google.com/iam-admin/iam
135- [Granting roles to service accounts]: https://cloud.google.com/iam/docs/granting-roles-to-service-accounts
136-
137- [Install Python and virtualenv]: https://cloud.google.com/python/setup