Updated google cloud function documentation #2034

Merged 1 commit on Nov 7, 2024.

# Deploy a pipeline with Google Cloud Functions

This guide shows you how to deploy a pipeline using the gcloud shell and dlt CLI commands. To deploy a pipeline using this method, you must have a working knowledge of GCP and its associated services, such as Cloud Functions, IAM and permissions, and GCP service accounts.

To deploy a pipeline with GCP Cloud Functions, navigate to the directory, either on your local machine or in a cloned repository (e.g., from GitHub or Bitbucket), from which the function code will be deployed.

## 1. Set up the pipeline

1. Sign in to your GCP account and enable the Cloud Functions API.
1. In this guide, we'll be setting up the dlt
[Notion verified source](../../dlt-ecosystem/verified-sources/notion). However, you can use any verified source or create a custom one to suit your needs.
1. In the terminal:
- Run the following command to initialize the verified source with Notion and create a pipeline example with BigQuery as the target.

```sh
dlt init notion bigquery
```

- After the command executes, new files and folders with the necessary configurations are created in the main directory where the command was executed.

- Detailed information about initializing a verified source and a pipeline example can be found in the dlthub [documentation](../../dlt-ecosystem/verified-sources/notion).
1. Create a new Python file called "main.py" in the main directory. The file can be configured as follows:
```py
from notion_pipeline import load_databases

def pipeline_notion(request):
    # HTTP entry point; gcloud uses the function name as the entry point by default
    load_databases()
    return "Pipeline run successfully!"
```
By default, Google Cloud Functions looks for the "main.py" file in the root of the source directory.

1. If you need any additional dependencies, add them to the "requirements.txt" file created by the `dlt init` command.
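The `main.py` entry point lets any exception from `load_databases()` propagate, which Cloud Functions reports as a failed request. If you would rather return a short, readable error message to the caller, a minimal sketch is to wrap the load call in a small helper. The helper name `run_and_report` is hypothetical (it is not part of `dlt`), and the sketch assumes the Flask-style `(body, status)` tuple return that Python HTTP functions accept.

```python
import logging
from typing import Callable, Tuple, Union

# An HTTP function may return a plain string (HTTP 200) or a (body, status) tuple.
Response = Union[str, Tuple[str, int]]

SUCCESS_MESSAGE = "Pipeline run successfully!"


def run_and_report(load: Callable[[], object]) -> Response:
    """Run a pipeline load function and map the outcome to an HTTP response.

    Hypothetical helper for illustration; not part of dlt.
    """
    try:
        load()
    except Exception as exc:
        # Log the traceback so it should still appear in the function's logs.
        logging.exception("Pipeline run failed")
        return f"Pipeline failed: {exc}", 500
    return SUCCESS_MESSAGE


# In main.py, the entry point would then become:
#
# from notion_pipeline import load_databases
#
# def pipeline_notion(request):
#     return run_and_report(load_databases)
```

Logging the traceback keeps the failure visible to whoever inspects the logs, while the caller only sees a one-line message. Letting the exception propagate, as the original `main.py` does, is an equally valid choice.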

## 2. Deploy the GCP Cloud Function

In the terminal, navigate to the directory where the "main.py" file is located and run the following command:

```sh
gcloud functions deploy pipeline_notion --runtime python310 \
--trigger-http --allow-unauthenticated --source . --timeout 300
```

- This command deploys a function named `pipeline_notion` with Python 3.10 as the runtime, an HTTP trigger, and unauthenticated access allowed. The source "." refers to all files in the current directory. The timeout is set to 5 minutes (300 seconds). To learn more about deploying the cloud function, read the [documentation here](https://cloud.google.com/functions/docs/deploy).
- If you are loading a large number of files into the destination, you can increase the timeout, up to 60 minutes for HTTP functions and 9 minutes for event-driven functions. To learn more about the function timeout, see the [documentation here](https://cloud.google.com/functions/docs/configuring/timeout).
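If you script your deployments, a minimal sketch like the following rebuilds the `gcloud` command shown above and rejects timeouts beyond the 60-minute limit for HTTP functions before anything is sent to GCP. The helper `http_deploy_command` is hypothetical; it only assembles the flags already used in this guide.

```python
import shlex
from typing import List

# Upper bound for HTTP-triggered functions: 60 minutes, in seconds.
MAX_HTTP_TIMEOUT_S = 60 * 60


def http_deploy_command(
    name: str = "pipeline_notion",
    runtime: str = "python310",
    timeout_s: int = 300,
) -> List[str]:
    """Return the gcloud argument list for deploying an HTTP-triggered function."""
    if not 1 <= timeout_s <= MAX_HTTP_TIMEOUT_S:
        raise ValueError(
            f"timeout_s must be between 1 and {MAX_HTTP_TIMEOUT_S} seconds, got {timeout_s}"
        )
    return [
        "gcloud", "functions", "deploy", name,
        "--runtime", runtime,
        "--trigger-http",
        "--allow-unauthenticated",
        "--source", ".",
        "--timeout", str(timeout_s),
    ]


# Print the command for review instead of running it.
print(shlex.join(http_deploy_command()))
```

To actually deploy, you could pass the list to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell-quoting issues.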

> Your project has a default service account associated with the project ID. Please assign the `Cloud Functions Developer` role to the associated service account.
Environment variables can be declared in the Cloud Function in two ways:

#### 3a. Directly in the function:

- Go to Cloud Functions in the Google Cloud console and select the deployed function. Click "EDIT".
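- Navigate to the "RUNTIME" tab and click "ADD VARIABLE" under "Runtime environment variables". Build environment variables are only available while the function is being built, not when the pipeline runs.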
- Enter a name for the variable that corresponds to the argument required by the pipeline. Make sure
to capitalize the variable name if it is specified in "secrets.toml". For example, if the variable
name is `api_key`, set the variable name to `API_KEY`.
- Enter the value for the Notion API key.
- Click "NEXT" and deploy the function.
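`dlt` maps keys from `secrets.toml` to environment variable names by upper-casing them and joining TOML sections with a double underscore. The sketch below shows that mapping; `env_var_name` is a hypothetical helper for illustration, not a `dlt` function. Assuming the Notion key sits under `[sources.notion]` in `secrets.toml`, as `dlt init notion` typically generates it, `SOURCES__NOTION__API_KEY` is the fully qualified name and `API_KEY` is the unsectioned form used in this guide.

```python
def env_var_name(key_path: str) -> str:
    """Map a dotted secrets.toml key path to a dlt-style environment variable name.

    Each section is upper-cased and sections are joined with a double underscore,
    e.g. "sources.notion.api_key" -> "SOURCES__NOTION__API_KEY".
    """
    parts = [part.strip() for part in key_path.split(".") if part.strip()]
    if not parts:
        raise ValueError("key_path must not be empty")
    return "__".join(part.upper() for part in parts)


print(env_var_name("api_key"))                 # API_KEY
print(env_var_name("sources.notion.api_key"))  # SOURCES__NOTION__API_KEY
```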

#### 3b. Use GCP Secret Manager:

- Go to Cloud Functions in the Google Cloud console and select the function you deployed. Click "EDIT".
- In the "Runtime, Build, Connections and Security Settings" section, select "Security and Images
Repo".
- Click "Add a secret reference" and select the secret you created, for example, "notion_secret".
- Set the "Reference method" to "Mounted as environment variable".
- In the "Environment Variable" field, enter the environment variable's name that corresponds
to the argument required by the pipeline. Remember to capitalize the variable name if it is
required by the pipeline and specified in secrets.toml. For example, if the variable name is
`api_key`, you would declare the environment variable as `API_KEY`.
- Finally, click "DEPLOY" to deploy the function. The HTTP trigger will now run the pipeline each time
  the trigger URL is requested.
- Assign the `Secret Manager Secret Accessor` role to the service account used to deploy the cloud
function. Typically, this is the default service account associated with the Google Project in
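Whichever way the variable is supplied, a missing secret can otherwise surface as a less obvious error deep inside the pipeline. A minimal fail-fast check, assuming the variable is named `API_KEY` as in the examples above, could be called at the top of `pipeline_notion`; `require_env` is a hypothetical helper.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set. Check the variable or "
            "secret reference configured for this Cloud Function."
        )
    return value


# Example: call this before load_databases() in main.py.
# api_key = require_env("API_KEY")
```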
## 4. Monitor (and manually trigger) the cloud function

To manually trigger the function, open its trigger URL in your browser's address bar. The message
"Pipeline run successfully!" confirms that the pipeline ran and the data was loaded into the destination.
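Instead of using the browser, you can call the trigger URL from a script, for example from a scheduler or CI job. A minimal sketch using only the Python standard library follows. The URL is a placeholder: 1st-gen HTTP functions typically use the form `https://REGION-PROJECT_ID.cloudfunctions.net/pipeline_notion`, but copy the exact trigger URL shown for your function. `trigger_pipeline` and `is_success` are hypothetical helpers.

```python
import urllib.request

SUCCESS_MESSAGE = "Pipeline run successfully!"


def is_success(body: str) -> bool:
    """Check a response body for the message returned by main.py."""
    return SUCCESS_MESSAGE in body


def trigger_pipeline(url: str, timeout_s: float = 330.0) -> bool:
    """Send a GET request to the function's trigger URL and report success.

    timeout_s is set slightly above the 300 s function timeout used in this guide.
    HTTP error responses (4xx/5xx) raise urllib.error.HTTPError.
    """
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        body = response.read().decode("utf-8")
    return is_success(body)


# Placeholder URL; replace it with your function's actual trigger URL.
# ok = trigger_pipeline("https://REGION-PROJECT_ID.cloudfunctions.net/pipeline_notion")
```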

That's it! Have fun using dlt in Google Cloud Functions!
