A Terraform module which deploys the Snowplow Enrich service on EC2. If you want to use a custom AMI for this deployment, you will need to ensure it is based on Amazon Linux 2.
By default this module collects and forwards telemetry information to Snowplow so that we can understand how our applications are being used. No identifying information about your sub-account or account fingerprints is ever forwarded to us; the telemetry only covers which modules and applications are deployed and active.
If you wish to subscribe to our mailing list for updates to these modules or security advisories, please set the user_provided_id variable to include a valid email address at which we can reach you.
To disable telemetry, simply set the variable telemetry_enabled = false.
For details on what information is collected please see this module: https://github.com/snowplow-devops/terraform-snowplow-telemetry
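
The two telemetry settings described above can be sketched as follows. This is a minimal illustration rather than a recommended configuration: it assumes the stream modules and the var.vpc_id / var.subnet_ids variables from the usage example in this README already exist, and the email address is a placeholder.

```hcl
module "enrich_kinesis" {
  source = "snowplow-devops/enrich-kinesis-ec2/aws"

  name                 = "enrich-server"
  vpc_id               = var.vpc_id
  subnet_ids           = var.subnet_ids
  in_stream_name       = module.raw_stream.name
  enriched_stream_name = module.enriched_stream.name
  bad_stream_name      = module.bad_1_stream.name
  ssh_key_name         = "your-key-name"

  # Opt out of telemetry entirely (the default is true)
  telemetry_enabled = false

  # Alternatively, leave telemetry enabled and supply a contact address
  # (placeholder value; replace with a real address):
  # user_provided_id = "ops@example.com"
}
```
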
Stream Enrich takes data from a raw input stream, pushes validated data to the enriched stream and pushes failed data to the bad stream. As part of this validation process we leverage Iglu, Snowplow's schema repository and the home for event and entity definitions. If you are using custom events that you have defined yourself, you will need to link your own Iglu Registries in to this module so that those events can be discovered correctly.
By default this module enables 5 enrichments, which you can find in the templates/enrichments directory of this module.
module "raw_stream" {
  source  = "snowplow-devops/kinesis-stream/aws"
  version = "0.2.0"

  name = "raw-stream"
}

module "enriched_stream" {
  source  = "snowplow-devops/kinesis-stream/aws"
  version = "0.2.0"

  name = "enriched-stream"
}

module "bad_1_stream" {
  source  = "snowplow-devops/kinesis-stream/aws"
  version = "0.2.0"

  name = "bad-1-stream"
}

module "enrich_kinesis" {
  source = "snowplow-devops/enrich-kinesis-ec2/aws"

  name                 = "enrich-server"
  vpc_id               = var.vpc_id
  subnet_ids           = var.subnet_ids
  in_stream_name       = module.raw_stream.name
  enriched_stream_name = module.enriched_stream.name
  bad_stream_name      = module.bad_1_stream.name

  ssh_key_name     = "your-key-name"
  ssh_ip_allowlist = ["0.0.0.0/0"]

  # Linking in the custom Iglu Server here
  custom_iglu_resolvers = [
    {
      name            = "Iglu Server"
      priority        = 0
      uri             = "http://your-iglu-server-endpoint/api"
      api_key         = var.iglu_super_api_key
      vendor_prefixes = []
    }
  ]
}
To define your own enrichment configuration, you will need to provide a JSON-encoded string of the enrichment in the matching input variable (for example enrichment_anon_ip).
locals {
  enrichment_anon_ip = jsonencode(<<EOF
{
  "schema": "iglu:com.snowplowanalytics.snowplow/anon_ip/jsonschema/1-0-1",
  "data": {
    "name": "anon_ip",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "anonOctets": 1,
      "anonSegments": 1
    }
  }
}
EOF
  )
}

module "enrich_kinesis" {
  source = "snowplow-devops/enrich-kinesis-ec2/aws"

  name                 = "enrich-server"
  vpc_id               = var.vpc_id
  subnet_ids           = var.subnet_ids
  in_stream_name       = module.raw_stream.name
  enriched_stream_name = module.enriched_stream.name
  bad_stream_name      = module.bad_1_stream.name

  ssh_key_name     = "your-key-name"
  ssh_ip_allowlist = ["0.0.0.0/0"]

  # Linking in the custom Iglu Server here
  custom_iglu_resolvers = [
    {
      name            = "Iglu Server"
      priority        = 0
      uri             = "http://your-iglu-server-endpoint/api"
      api_key         = var.iglu_super_api_key
      vendor_prefixes = []
    }
  ]

  # Enable this enrichment
  enrichment_anon_ip = local.enrichment_anon_ip
}
Disabling a default enrichment follows the same approach as adding a custom one: pass a configuration for that enrichment with "enabled" set to false. For example, to disable YAUAA you would do the following.
locals {
  enrichment_yauaa = jsonencode(<<EOF
{
  "schema": "iglu:com.snowplowanalytics.snowplow.enrichments/yauaa_enrichment_config/jsonschema/1-0-0",
  "data": {
    "enabled": false,
    "vendor": "com.snowplowanalytics.snowplow.enrichments",
    "name": "yauaa_enrichment_config"
  }
}
EOF
  )
}

module "enrich_kinesis" {
  source = "snowplow-devops/enrich-kinesis-ec2/aws"

  name                 = "enrich-server"
  vpc_id               = var.vpc_id
  subnet_ids           = var.subnet_ids
  in_stream_name       = module.raw_stream.name
  enriched_stream_name = module.enriched_stream.name
  bad_stream_name      = module.bad_1_stream.name

  ssh_key_name     = "your-key-name"
  ssh_ip_allowlist = ["0.0.0.0/0"]

  # Linking in the custom Iglu Server here
  custom_iglu_resolvers = [
    {
      name            = "Iglu Server"
      priority        = 0
      uri             = "http://your-iglu-server-endpoint/api"
      api_key         = var.iglu_super_api_key
      vendor_prefixes = []
    }
  ]

  # Disable this enrichment
  enrichment_yauaa_enrichment_config = local.enrichment_yauaa
}
Name | Version |
---|---|
terraform | >= 1.0.0 |
aws | >= 3.45.0 |
Name | Version |
---|---|
aws | >= 3.45.0 |
Name | Source | Version |
---|---|---|
config_autoscaling | snowplow-devops/dynamodb-autoscaling/aws | 0.2.0 |
instance_type_metrics | snowplow-devops/ec2-instance-type-metrics/aws | 0.1.2 |
kcl_autoscaling | snowplow-devops/dynamodb-autoscaling/aws | 0.2.0 |
tags | snowplow-devops/tags/aws | 0.2.0 |
telemetry | snowplow-devops/telemetry/snowplow | 0.3.0 |
Name | Description | Type | Default | Required |
---|---|---|---|---|
bad_stream_name | The name of the bad Kinesis stream that the Enricher will insert bad data into | string | n/a | yes |
enriched_stream_name | The name of the enriched Kinesis stream that the Enricher will insert validated data into | string | n/a | yes |
in_stream_name | The name of the input Kinesis stream that the Enricher will pull data from | string | n/a | yes |
name | A name which will be prepended to the resources created | string | n/a | yes |
ssh_key_name | The name of the SSH key-pair to attach to all EC2 nodes deployed | string | n/a | yes |
subnet_ids | The list of subnets to deploy Enrich across | list(string) | n/a | yes |
vpc_id | The VPC to deploy Enrich within (must have DNS hostnames enabled) | string | n/a | yes |
amazon_linux_2_ami_id | The AMI ID to use, which must be based on Amazon Linux 2; by default the latest community version is used | string | "" | no |
assets_update_period | Period after which enrich assets should be checked for updates (e.g. MaxMind DB) | string | "7 days" | no |
associate_public_ip_address | Whether to assign a public IP address to this instance | bool | true | no |
byte_limit | The number of bytes to buffer before pushing events to Kinesis | number | 1000000 | no |
cloudwatch_logs_enabled | Whether application logs should be reported to CloudWatch | bool | true | no |
cloudwatch_logs_retention_days | The length of time in days to retain logs for | number | 7 | no |
custom_iglu_resolvers | The custom Iglu Resolvers that will be used by Enrichment to resolve and validate events | list(object({ | [] | no |
custom_s3_hosted_assets_bucket_name | Name of the bucket in which hosted databases for the IP Lookups and/or IAB Enrichments are stored | string | "" | no |
custom_tcp_egress_port_list | For opening up TCP ports to access other destinations not served over HTTP(s) (e.g. for SQL / API enrichments) | list(string) | [] | no |
default_iglu_resolvers | The default Iglu Resolvers that will be used by Enrichment to resolve and validate events | list(object({ | [ | no |
enrichment_anon_ip | n/a | string | "" | no |
enrichment_api_request_enrichment_config | n/a | string | "" | no |
enrichment_campaign_attribution | n/a | string | "" | no |
enrichment_cookie_extractor_config | n/a | string | "" | no |
enrichment_currency_conversion_config | n/a | string | "" | no |
enrichment_event_fingerprint_config | n/a | string | "" | no |
enrichment_http_header_extractor_config | n/a | string | "" | no |
enrichment_iab_spiders_and_bots_enrichment | Note: Requires paid database to function | string | "" | no |
enrichment_ip_lookups | Note: Requires free or paid subscription to database to function | string | "" | no |
enrichment_javascript_script_config | n/a | string | "" | no |
enrichment_pii_enrichment_config | n/a | string | "" | no |
enrichment_referer_parser | n/a | string | "" | no |
enrichment_sql_query_enrichment_config | n/a | string | "" | no |
enrichment_ua_parser_config | n/a | string | "" | no |
enrichment_weather_enrichment_config | n/a | string | "" | no |
enrichment_yauaa_enrichment_config | n/a | string | "" | no |
iam_permissions_boundary | The permissions boundary ARN to set on IAM roles created | string | "" | no |
initial_position | Where to start processing the input Kinesis Stream from (TRIM_HORIZON or LATEST) | string | "TRIM_HORIZON" | no |
instance_type | The instance type to use | string | "t3a.small" | no |
java_opts | Custom JAVA Options | string | "-Dorg.slf4j.simpleLogger.defaultLogLevel=info -XX:MinRAMPercentage=50 -XX:MaxRAMPercentage=75" | no |
kcl_read_max_capacity | The maximum READ capacity for the KCL DynamoDB table | number | 10 | no |
kcl_read_min_capacity | The minimum READ capacity for the KCL DynamoDB table | number | 1 | no |
kcl_write_max_capacity | The maximum WRITE capacity for the KCL DynamoDB table | number | 10 | no |
kcl_write_min_capacity | The minimum WRITE capacity for the KCL DynamoDB table | number | 1 | no |
max_size | The maximum number of servers in this server-group | number | 2 | no |
min_size | The minimum number of servers in this server-group | number | 1 | no |
record_limit | The number of events to buffer before pushing them to Kinesis | number | 500 | no |
ssh_ip_allowlist | The list of CIDR ranges to allow SSH traffic from | list(any) | [ | no |
tags | The tags to append to this resource | map(string) | {} | no |
telemetry_enabled | Whether or not to send telemetry information back to Snowplow Analytics Ltd | bool | true | no |
time_limit_ms | The amount of time in milliseconds to buffer events before pushing them to Kinesis | number | 500 | no |
user_provided_id | An optional unique identifier to identify the telemetry events emitted by this stack | string | "" | no |
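
As a rough sketch of how several of the optional inputs above can be overridden together, the block below adjusts instance sizing, the server-group bounds, the buffering limits and the stream start position. All values are illustrative assumptions rather than tuning recommendations, and the block assumes the stream modules and variables from the usage example in this README.

```hcl
module "enrich_kinesis" {
  source = "snowplow-devops/enrich-kinesis-ec2/aws"

  name                 = "enrich-server"
  vpc_id               = var.vpc_id
  subnet_ids           = var.subnet_ids
  in_stream_name       = module.raw_stream.name
  enriched_stream_name = module.enriched_stream.name
  bad_stream_name      = module.bad_1_stream.name
  ssh_key_name         = "your-key-name"

  # Server-group sizing (defaults: t3a.small, min 1, max 2)
  instance_type = "t3a.medium"
  min_size      = 2
  max_size      = 4

  # Buffering before events are pushed to Kinesis
  # (defaults: 500 records, 1000000 bytes, 500 ms)
  record_limit  = 250
  byte_limit    = 500000
  time_limit_ms = 250

  # Only process records that arrive after the application starts
  initial_position = "LATEST"

  # Keep application logs for 30 days instead of the default 7
  cloudwatch_logs_retention_days = 30

  tags = {
    environment = "staging" # illustrative tag
  }
}
```
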
Name | Description |
---|---|
asg_id | ID of the ASG |
asg_name | Name of the ASG |
sg_id | ID of the security group attached to the Enrich servers |
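
The outputs above can be consumed from the calling configuration like any other module outputs. The sketch below re-exports the ASG name and uses sg_id to let the Enrich servers reach a database, as might be needed for a SQL enrichment. The variable database_sg_id and the PostgreSQL port 5432 are hypothetical and stand in for your own infrastructure; Enrich itself would additionally need the port opened for egress via custom_tcp_egress_port_list.

```hcl
# Hypothetical: the security group protecting a database used by an enrichment
variable "database_sg_id" {
  type        = string
  description = "ID of the security group attached to the database"
}

# Allow inbound traffic to the database from the Enrich servers only
resource "aws_security_group_rule" "db_ingress_from_enrich" {
  type                     = "ingress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = var.database_sg_id
  source_security_group_id = module.enrich_kinesis.sg_id
}

# Re-export the ASG name, e.g. for dashboards or alarms defined elsewhere
output "enrich_asg_name" {
  value = module.enrich_kinesis.asg_name
}
```
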
The Terraform AWS Enrich Kinesis on EC2 project is Copyright 2021-2022 Snowplow Analytics Ltd.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.