terraform-google-postgres-loader-pubsub-ce

A Terraform module which deploys the Snowplow Postgres Loader application on Google Cloud, running on top of Compute Engine. If you want to use a custom image for this deployment, you will need to ensure it is based on Ubuntu 20.04.

WARNING: If you are upgrading from module version 0.1.x, you will need to issue a manual table update - details can be found here. You will need to adjust the ALTER TABLE command to use the schema that your events table is deployed within.

Telemetry

By default, this module collects and forwards telemetry information to Snowplow to understand how our applications are being used. No identifying information about your sub-account or account fingerprints is ever forwarded to us - it is very simple information about what modules and applications are deployed and active.

If you wish to subscribe to our mailing list for updates to these modules or security advisories, please set the user_provided_id variable to a valid email address at which we can reach you.

How do I disable it?

To disable telemetry, simply set the variable telemetry_enabled = false.
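
A minimal sketch of the two telemetry-related inputs on this module (the email address is a placeholder):

module "postgres_loader_enriched" {
  source = "snowplow-devops/postgres-loader-pubsub-ce/google"

  # ... other required inputs ...

  # Opt out of telemetry entirely
  telemetry_enabled = false

  # Or, if you leave telemetry on, optionally identify yourself so we can
  # reach you with module updates and security advisories
  user_provided_id = "you@example.com"
}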

What are you collecting?

For details on what information is collected, please see this module: https://github.com/snowplow-devops/terraform-snowplow-telemetry

Usage

The Postgres Loader can load both your enriched and bad data into a Postgres database - by default we use CloudSQL, as it affords a simple and cost-effective way to get started.

To start loading "enriched" data into Postgres:

module "enriched_topic" {
  source  = "snowplow-devops/pubsub-topic/google"
  version = "0.3.0"

  name = "enriched-topic"
}

module "pipeline_db" {
  source  = "snowplow-devops/cloud-sql/google"
  version = "0.3.0"

  name = "pipeline-db"

  region      = var.region
  db_name     = local.pipeline_db_name
  db_username = local.pipeline_db_username
  db_password = local.pipeline_db_password

  # Note: this exposes your data to the internet - take care to ensure your allowlist is strict enough
  authorized_networks = local.pipeline_authorized_networks

  # Note: required for a higher concurrent connection count, which is necessary for loading both good and bad data at the same time
  tier = "db-g1-small"
}

module "postgres_loader_enriched" {
  source = "snowplow-devops/postgres-loader-pubsub-ce/google"

  accept_limited_use_license = true

  name = "pg-loader-enriched-server"

  network    = var.network
  subnetwork = var.subnetwork
  region     = var.region
  project_id = var.project_id

  ssh_key_pairs    = []
  ssh_ip_allowlist = ["0.0.0.0/0"]

  in_topic_name = module.enriched_topic.name
  purpose       = "ENRICHED_EVENTS"
  schema_name   = "atomic"

  # Note: Using the connection_name will enforce the use of a Cloud SQL Proxy rather than a direct connection
  #       To instead use a direct connection you will need to define the `db_host` parameter instead.
  db_instance_name = module.pipeline_db.connection_name
  db_port          = module.pipeline_db.port
  db_name          = local.pipeline_db_name
  db_username      = local.pipeline_db_username
  db_password      = local.pipeline_db_password

  # Linking in the custom Iglu Server here
  custom_iglu_resolvers = [
    {
      name            = "Iglu Server"
      priority        = 0
      uri             = "http://your-iglu-server-endpoint/api"
      api_key         = var.iglu_super_api_key
      vendor_prefixes = []
    }
  ]
}
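
As noted in the example above, setting db_instance_name makes the loader connect through a Cloud SQL Proxy. A minimal sketch of the direct-connection alternative, assuming the CloudSQL module you use exposes the instance IP as an output named first_ip_address (check your module's outputs - the name here is illustrative):

module "postgres_loader_enriched_direct" {
  source = "snowplow-devops/postgres-loader-pubsub-ce/google"

  # ... same inputs as above, but with db_instance_name omitted ...

  # Direct connection: supply the database host instead of the CloudSQL instance name
  db_host     = module.pipeline_db.first_ip_address
  db_port     = module.pipeline_db.port
  db_name     = local.pipeline_db_name
  db_username = local.pipeline_db_username
  db_password = local.pipeline_db_password
}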

To load the "bad" data instead:

module "bad_1_topic" {
  source  = "snowplow-devops/pubsub-topic/google"
  version = "0.3.0"

  name = "bad-1-topic"
}

module "postgres_loader_bad" {
  source = "snowplow-devops/postgres-loader-pubsub-ce/google"

  accept_limited_use_license = true

  name = "pg-loader-bad-server"

  network    = var.network
  subnetwork = var.subnetwork
  region     = var.region
  project_id = var.project_id

  ssh_key_pairs    = []
  ssh_ip_allowlist = ["0.0.0.0/0"]

  in_topic_name = module.bad_1_topic.name

  # Note: The purpose defines what the input data set should look like
  purpose = "JSON"

  # Note: This schema is created automatically by the VM on launch
  schema_name = "atomic_bad"

  # Note: Using the connection_name will enforce the use of a Cloud SQL Proxy rather than a direct connection
  #       To instead use a direct connection you will need to define the `db_host` parameter instead.
  db_instance_name = module.pipeline_db.connection_name
  db_port          = module.pipeline_db.port
  db_name          = local.pipeline_db_name
  db_username      = local.pipeline_db_username
  db_password      = local.pipeline_db_password

  # Linking in the custom Iglu Server here
  custom_iglu_resolvers = [
    {
      name            = "Iglu Server"
      priority        = 0
      uri             = "http://your-iglu-server-endpoint/api"
      api_key         = var.iglu_super_api_key
      vendor_prefixes = []
    }
  ]
}
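
By default the loader instances receive public IP addresses. If you set associate_public_ip_address = false (see Inputs below), the instances must sit behind a Cloud NAT to reach the internet. A minimal sketch, with illustrative resource names, of pairing a private loader with a router and NAT:

module "postgres_loader_bad" {
  source = "snowplow-devops/postgres-loader-pubsub-ce/google"

  # ... same inputs as above ...

  # Keep the loader instances private; egress then requires a Cloud NAT
  associate_public_ip_address = false
}

resource "google_compute_router" "nat_router" {
  name    = "pg-loader-nat-router"
  network = var.network
  region  = var.region
}

resource "google_compute_router_nat" "nat" {
  name                               = "pg-loader-nat"
  router                             = google_compute_router.nat_router.name
  region                             = var.region
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}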

Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.0.0 |
| google | >= 3.44.0 |

Providers

| Name | Version |
|------|---------|
| google | >= 3.44.0 |

Modules

| Name | Source | Version |
|------|--------|---------|
| service | snowplow-devops/service-ce/google | 0.1.0 |
| telemetry | snowplow-devops/telemetry/snowplow | 0.5.0 |

Resources

| Name | Type |
|------|------|
| google_compute_firewall.egress | resource |
| google_compute_firewall.ingress_ssh | resource |
| google_project_iam_member.sa_cloud_sql_client | resource |
| google_project_iam_member.sa_logging_log_writer | resource |
| google_project_iam_member.sa_pubsub_publisher | resource |
| google_project_iam_member.sa_pubsub_subscriber | resource |
| google_project_iam_member.sa_pubsub_viewer | resource |
| google_pubsub_subscription.in | resource |
| google_service_account.sa | resource |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| db_name | The name of the database to connect to | string | n/a | yes |
| db_password | The password to use to connect to the database | string | n/a | yes |
| db_port | The port the database is running on | number | n/a | yes |
| db_username | The username to use to connect to the database | string | n/a | yes |
| in_topic_name | The name of the input PubSub topic that the loader will pull data from | string | n/a | yes |
| name | A name which will be pre-pended to the resources created | string | n/a | yes |
| network | The name of the network to deploy within | string | n/a | yes |
| project_id | The project ID in which the stack is being deployed | string | n/a | yes |
| purpose | The type of data the loader will be pulling, which can be one of ENRICHED_EVENTS or JSON (Note: JSON can be used for loading bad rows) | string | n/a | yes |
| region | The name of the region to deploy within | string | n/a | yes |
| schema_name | The database schema to load data into (e.g. atomic \| atomic_bad) | string | n/a | yes |
| accept_limited_use_license | Acceptance of the SLULA terms (https://docs.snowplow.io/limited-use-license-1.0/) | bool | false | no |
| app_version | App version to use. This variable facilitates dev flow; the modules may not work with anything other than the default value. | string | "0.3.1" | no |
| associate_public_ip_address | Whether to assign a public IP address to this instance; if false this instance must be behind a Cloud NAT to connect to the internet | bool | true | no |
| custom_iglu_resolvers | The custom Iglu Resolvers that will be used by the loader to resolve and validate events | list(object({ name = string, priority = number, uri = string, api_key = string, vendor_prefixes = list(string) })) | [] | no |
| db_host | The hostname of the database to connect to (Note: if db_instance_name is non-empty this setting is ignored) | string | "" | no |
| db_instance_name | The instance name of the CloudSQL instance to connect to (Note: if set, db_host will be ignored and a proxy established instead) | string | "" | no |
| db_max_connections | The maximum number of connections to the backing database | number | 10 | no |
| default_iglu_resolvers | The default Iglu Resolvers that will be used by the loader to resolve and validate events | list(object({ name = string, priority = number, uri = string, api_key = string, vendor_prefixes = list(string) })) | [{"api_key": "", "name": "Iglu Central", "priority": 10, "uri": "http://iglucentral.com", "vendor_prefixes": []}, {"api_key": "", "name": "Iglu Central - Mirror 01", "priority": 20, "uri": "http://mirror01.iglucentral.com", "vendor_prefixes": []}] | no |
| gcp_logs_enabled | Whether application logs should be reported to GCP Logging | bool | true | no |
| in_max_concurrent_checkpoints | The maximum number of concurrent effects for the topic checkpointing system - essentially how many concurrent acks we will make to PubSub | number | 100 | no |
| java_opts | Custom JAVA options | string | "-XX:InitialRAMPercentage=75 -XX:MaxRAMPercentage=75" | no |
| labels | The labels to append to this resource | map(string) | {} | no |
| machine_type | The machine type to use | string | "e2-small" | no |
| network_project_id | The project ID of the shared VPC in which the stack is being deployed | string | "" | no |
| ssh_block_project_keys | Whether to block project-wide SSH keys | bool | true | no |
| ssh_ip_allowlist | The list of CIDR ranges to allow SSH traffic from | list(any) | ["0.0.0.0/0"] | no |
| ssh_key_pairs | The list of SSH key-pairs to add to the servers | list(object({ user_name = string, public_key = string })) | [] | no |
| subnetwork | The name of the sub-network to deploy within; if populated will override the 'network' setting | string | "" | no |
| target_size | The number of servers to deploy | number | 1 | no |
| telemetry_enabled | Whether or not to send telemetry information back to Snowplow Analytics Ltd | bool | true | no |
| ubuntu_20_04_source_image | The source image to use, which must be based on Ubuntu 20.04; by default the latest community version is used | string | "" | no |
| user_provided_id | An optional unique identifier to identify the telemetry events emitted by this stack | string | "" | no |
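
Several of the optional inputs above control SSH access to the loader servers. A minimal sketch of passing an explicit key pair and tightening the allowlist (the user name, public key, and CIDR range are placeholders):

module "postgres_loader_enriched" {
  source = "snowplow-devops/postgres-loader-pubsub-ce/google"

  # ... other inputs ...

  ssh_key_pairs = [
    {
      user_name  = "ops"
      public_key = "ssh-ed25519 AAAA... ops@example.com"
    }
  ]

  # Restrict SSH to a trusted range rather than the default 0.0.0.0/0
  ssh_ip_allowlist = ["10.0.0.0/24"]
}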

Outputs

| Name | Description |
|------|-------------|
| instance_group_url | The full URL of the instance group created by the manager |
| manager_id | Identifier for the instance group manager |
| manager_self_link | The URL for the instance group manager |
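
If you want to surface these from your own root module, for example to locate the managed instance group, you can re-export them:

output "postgres_loader_instance_group_url" {
  description = "Full URL of the instance group running the Postgres Loader"
  value       = module.postgres_loader_enriched.instance_group_url
}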

Copyright and license

Copyright 2021-present Snowplow Analytics Ltd.

Licensed under the Snowplow Limited Use License Agreement. (If you are uncertain how it applies to your use case, check our answers to frequently asked questions.)