From 1694532446a678d0c29cb070dc6a0428bd6935b3 Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Fri, 25 Jun 2021 11:18:30 -0500 Subject: [PATCH 01/34] RFC 340: Kinesis Firehose Delivery Stream L2 --- text/0340-firehose-l2.md | 727 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 727 insertions(+) create mode 100644 text/0340-firehose-l2.md diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md new file mode 100644 index 000000000..277030080 --- /dev/null +++ b/text/0340-firehose-l2.md @@ -0,0 +1,727 @@ +--- +rfc pr: [#xxx](https://github.com/aws/aws-cdk-rfcs/pull/xxx) <-- fill this after you've already created the PR +tracking issue: https://github.com/aws/aws-cdk-rfcs/issues/340 +--- + +# Kinesis Data Firehose Delivery Stream L2 + +The `aws-kinesisfirehose` construct library allows you to create Amazon Kinesis +Data Firehose delivery streams and destinations with just a few lines of +code. As with most construct libraries, you can also easily define permissions +and metrics using a simple API. + +The amount of effort needed to create these resources is about the same as doing +it using the AWS console and you can choose from any of the supported +destinations. + +## Working Backwards + +### CHANGELOG + +`feat(kinesisfirehose): DeliveryStream L2; S3, Elasticache, Redshift destinations` + +### README + +--- + +# Amazon Kinesis Data Firehose Construct Library + +This module is part of the [AWS Cloud Development Kit](https://github.com/aws/aws-cdk) +project. It allows you to create Kinesis Data Firehose delivery streams. Firehose is a +service for fully-managed delivery of real-time streaming data to storage services such as +Amazon S3, Amazon Redshift, Amazon Elasticsearch, Splunk, or any custom HTTP endpoint or +third-party services such as Datadog, Dynatrace, LogicMonitor, MongoDB, New Relic, and +Sumo Logic. + +The simplest specification of a delivery stream is to create an implementation of +`IDestination` and provide it to the constructor of the delivery stream. Supported +destinations are covered [below](#destinations). +```ts +const destination: firehose.IDestination = (void '...', { + bind(_scope: Construct, _options: firehose.DestinationBindOptions): firehose.DestinationConfig { + return { + properties: {}, + }; + } +}); +new firehose.DeliveryStream(this, 'Delivery Stream', { + destination: destination, +}); +``` + +## Destinations + +The following destinations are supported. See [@aws-cdk/aws-kinesisfirehose-destinations](../aws-kinesisfirehose-destinations) +for the implementations of these destinations. + +### S3 + +```ts +import * as s3 from '@aws-cdk/aws-s3'; +import * as firehose from '@aws-cdk/aws-kinesisfirehose'; + +const bucket = new s3.Bucket(this, 'Bucket'); + +const s3Destination = new firehosedestinations.S3Destination({ + bucket: bucket, +}); +new firehose.DeliveryStream(this, 'Delivery Stream', { + destination: s3Destination, +}); +``` + +S3 supports custom dynamic prefixes: +```ts fixture=with-bucket +const s3Destination = new firehosedestinations.S3Destination({ + bucket: bucket, + prefix: 'myFirehose/DeliveredYear=!{timestamp:yyyy}/anyMonth/rand=!{firehose:random-string}', + errorOutputPrefix: 'myFirehoseFailures/!{firehose:error-output-type}/!{timestamp:yyyy}/anyMonth/!{timestamp:dd}', +}); +``` +See: [Custom S3 Prefixes](https://docs.aws.amazon.com/firehose/latest/dev/s3-prefixes.html) in the *Firehose Developer Guide*. 
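Note that the destination classes used in these examples live in a separate module; a minimal import sketch (the `firehosedestinations` alias simply matches the name used in the snippets above):

```ts
import * as firehosedestinations from '@aws-cdk/aws-kinesisfirehose-destinations';
```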
+ +S3 supports converting record formats when delivering: +See: [Converting Input Record Format](https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html) in the *Firehose Developer Guide*. + +### Elasticsearch + +```ts +import * as es from '@aws-cdk/aws-elasticsearch'; +import * as firehose from '@aws-cdk/aws-kinesisfirehose'; + +const domain = new es.Domain(this, 'Domain', { + version: es.ElasticsearchVersion.V7_1, +}); + +new firehose.DeliveryStream(this, 'Delivery Stream', { + destination: new ElasticsearchDestination({ + domain: domain, + indexName: 'myindex', + }), +}); +``` + +### Redshift + +Firehose can deliver data to a table within a Redshift cluster, using an intermediate S3 +bucket and executing a Redshift `COPY` command. Redshift clusters must be placed in public +subnets within an VPC, must be marked as publicly accessible, and cannot provide a master +user password (it must be generated by the CDK). A Redshift user will be created within +the cluster for the exclusive use of Firehose, and a Redshift table with the provided +schema will be created within the provided database. + +```ts +import * as ec2 from '@aws-cdk/aws-ec2'; +import * as firehose from '@aws-cdk/aws-kinesisfirehose'; +import * as redshift from '@aws-cdk/aws-redshift'; +import { Duration, Size } from '@aws-cdk/core'; + +const vpc = new ec2.Vpc(this, 'Vpc'); +const database = 'my_db'; +const cluster = new redshift.Cluster(this, 'Cluster', { + vpc: vpc, + vpcSubnets: { + subnetType: ec2.SubnetType.PUBLIC, + }, + masterUser: { + masterUsername: 'master', + }, + defaultDatabaseName: database, + publiclyAccessible: true, +}); + +const redshiftDestination = new RedshiftDestination({ + cluster: cluster, + user: { + username: 'firehose', + }, + database: database, + tableName: 'firehose_test_table', + tableColumns: [ + { name: 'TICKER_SYMBOL', dataType: 'varchar(4)' }, + { name: 'SECTOR', dataType: 'varchar(16)' }, + { name: 'CHANGE', dataType: 'float' }, + { name: 'PRICE', dataType: 'float' }, + ], + copyOptions: 'json \'auto\'', +}); +new firehose.DeliveryStream(this, 'Delivery Stream', { + destination: redshiftDestination, +}); +``` + +### 3rd Party + +Third-party service providers such as Splunk, Datadog, Dynatrace, LogicMonitor, MongoDB, +New Relic, and Sumo Logic have integrated with AWS to allow users to configure their +service as a Firehose destination out of the box. + +These integrations have not been completed (see #1234), please use [custom HTTP endpoints](#custom-http-endpoint) +to integrate with 3rd party services. + +### Custom HTTP Endpoint + +Firehose supports writing data to any custom HTTP endpoint that conforms to the +[HTTP request/response schema]. Use the `HttpDestination` class to specify how Firehose +can reach your custom endpoint and any configuration that may be required. + +This integration has not been completed (see #1234). + +[HTTP request/response schema]: https://docs.aws.amazon.com/firehose/latest/dev/httpdeliveryrequestresponse.html + +## Sources + +Firehose supports two main methods of sourcing input data: Kinesis Data Streams and via a +"direct put". + +See: [Sending Data to a Delivery Stream](https://docs.aws.amazon.com/firehose/latest/dev/basic-write.html) +in the *Firehose Developer Guide*. + +### Kinesis Data Stream + +Firehose can read directly from a Kinesis data stream as a consumer of the data stream. 
+Configure this behaviour by providing a data stream in the `sourceStream` property when +constructing a delivery stream: + +```ts fixture=with-destination +import * as kinesis from '@aws-cdk/aws-kinesis'; +const sourceStream = new kinesis.Stream(stack, 'Source Stream'); +new firehose.DeliveryStream(stack, 'Delivery Stream', { + sourceStream: sourceStream, + destination: destination, +}); +``` + +### Direct Put + +If a source data stream is not provided, then Firehose assumes data will be provided via +"direct put", ie., by using a `PutRecord` or `PutRecordBatch` API call. There are a number +of ways of doing so, such as: +- Kinesis Agent: a standalone Java application that monitors and delivers files while + handling file rotation, checkpointing, and retries. See: [Writing to Firehose Using Kinesis Agent](https://docs.aws.amazon.com/firehose/latest/dev/writing-with-agents.html) + in the *Firehose Developer Guide*. +- Firehose SDK: a general purpose solution that allows you to deliver data to Firehose + from anywhere using Java, .NET, Node.js, Python, or Ruby. See: [Writing to Firehose Using the AWS SDK](https://docs.aws.amazon.com/firehose/latest/dev/writing-with-sdk.html) + in the *Firehose Developer Guide*. +- CloudWatch Logs: subscribe to a log group and receive filtered log events directly into + Firehose. See: [logs-destinations](../aws-logs-destinations). +- Eventbridge: add Firehose as an event rule target to send events to a delivery stream + based on the rule filtering. See: [events-targets](../aws-events-targets). +- SNS: add Firehose as a subscription to send all notifications from the topic to a + delivery stream. See: [sns-subscriptions](../aws-sns-subscriptions). +- IoT: add an action to an IoT rule to send various IoT information to a delivery stream + +## Server-side Encryption + +Enabling server-side encryption (SSE) requires Firehose to encrypt all data sent to +delivery stream when it is stored at rest. This means that data is encrypted before being +written to the storage layer and decrypted after it is received from the storage +layer. Firehose manages keys and cryptographic operations so that sources and destinations +do not need to, as the data is encrypted and decrypted at the boundaries of the Firehose +service. By default, delivery streams do not have SSE enabled. + +The Key Management Service (KMS) Customer Managed Key (CMK) used for SSE can either be AWS-owned or +customer-managed. AWS-owned CMKs are keys that an AWS service (in this case Firehose) owns +and manages for use in multiple AWS accounts. As a customer, you cannot view, use, track, or manage +these keys, and you are not charged for their use. On the other hand, +customer-managed CMKs are keys that are created and owned within a your account and +managed entirely by the you. As a customer, you are responsible for managing access, rotation, +aliases, and deletion for these keys, and you are changed for their use. See: [Customer master keys](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#master_keys) +in the *KMS Developer Guide*. 
+ +```ts fixture=with-destination +// SSE with an AWS-owned CMK +new firehose.DeliveryStream(stack, 'Delivery Stream AWS Owned', { + encryption: firehose.StreamEncryption.AWS_OWNED, + destination: destination, +}); + +// SSE with an customer-managed CMK that is created automatically by the CDK +new firehose.DeliveryStream(stack, 'Delivery Stream Implicit Customer Managed', { + encryption: firehose.StreamEncryption.CUSTOMER_MANAGED, + destination: destination, +}); + +// SSE with an customer-managed CMK that is explicitly specified +import * as kms from '@aws-cdk/aws-kms'; +const key = new kms.Key(stack, 'Key'); +new firehose.DeliveryStream(stack, 'Delivery Stream Explicit Customer Managed', { + encryptionKey: key, + destination: destination, +}); +``` + +If a Kinesis data stream is configured as the source of a Firehose delivery stream, +Firehose no longer stores data at rest and all encryption is handled by Kinesis Data +Streams. Firehose receives unencrypted data from Kinesis Data Streams, buffers the data in +memory, and sends the data to destinations without ever writing the unencrypted data at +rest. Practically, this means that SSE should be specified on the Kinesis data stream when +it is used as the source of a Firehose delivery stream (and specifying SSE on the delivery +stream will cause an error). + +See: [Data Protection](https://docs.aws.amazon.com/firehose/latest/dev/encryption.html) in +the *Firehose Developer Guide*. + + +## Monitoring + +Firehose is integrated with CloudWatch, so you can monitor the performance of your +delivery streams via logs and metrics. + +### Logs + +Firehose will send logs to CloudWatch when data trasformation or data delivery fails. You +can provide a specific log group to specify where the CDK will create the log streams +where log events from Firehose will be sent. Logging for delivery streams is enabled by +default. + +```ts fixture=with-destination +// Logging enabled with a log group that is automatically created by the CDK +new firehose.DeliveryStream(stack, 'Delivery Stream Implicit Log Group', { + logging: true, + destination: destination, +}); + +// Logging enabled with a log group that is explicitly specified +import * as logs from '@aws-cdk/aws-logs'; +const logGroup = new logs.LogGroup(stack, 'Log Group'); +new firehose.DeliveryStream(stack, 'Delivery Stream Explicit Log Group', { + logGroup: logGroup, + destination: destination, +}); +``` + +See: [Monitoring using CloudWatch Logs](https://docs.aws.amazon.com/firehose/latest/dev/monitoring-with-cloudwatch-logs.html) +in the *Firehose Developer Guide*. + +### Metrics + +Firehose sends metrics to CloudWatch so that you can collect and analyze the performance +of the delivery stream, including data delivery, data ingestion, data transformation, +format conversion, API usage, encryption, and resource usage. You can then use CloudWatch +alarms to alert you, for example, when data freshness (the age of the oldest record in +Firehose) exceeds the buffering limit (indicating that data is not being delivered to your +destination), or when the rate of incoming records exceeds the limit of records per second +(indicating data is flowing into your delivery stream faster than Firehose is configured +to process it). + +CDK provides methods for accessing Firehose metrics with default configuration, such as +`metricIncomingBytes`, and `metricIncomingRecords` (see [`IDeliveryStream`](../lib/delivery-stream.ts) +for a full list). 
CDK also provides a generic `metric` method that can be used to produce +metric configurations for any metric provided by Firehose; the configurations are +pre-populated with the correct dimensions for the delivery stream. + +```ts fixture=with-delivery-stream +// TODO: confirm this is a valid alarm +import * as cloudwatch from '@aws-cdk/aws-cloudwatch'; +// Alarm that triggers when the per-second average of incoming bytes exceeds 90% of the current service limit +const incomingBytesPercentOfLimit = new cloudwatch.MathExpression({ + expression: 'incomingBytes / 300 / bytePerSecLimit', + usingMetrics: { + incomingBytes: deliveryStream.metricIncomingBytes({ statistic: 'sum' }), + bytePerSecLimit: deliveryStream.metric('BytesPerSecondLimit'), + }, +}); +new Alarm(stack, 'Alarm', { + metric: incomingBytesPercentOfLimit, + threshold: 0.9, + evaluationPeriods: 3, +}); +``` + +See: [Monitoring Using CloudWatch Metrics](https://docs.aws.amazon.com/firehose/latest/dev/monitoring-with-cloudwatch-metrics.html) +in the *Firehose Developer Guide*. + +## Compression + +Firehose can automatically compress your data when it is delivered to S3 as either a final +or an intermediary destination. Supported compression formats are: gzip, Snappy, +Hadoop-compatible Snappy, and ZIP, except for Redshift destinations, where Snappy +(regardless of Hadoop-compatibility) and ZIP are not supported. By default, data is +delivered to S3 without compression. + +```ts fixture=with-bucket +// Compress data delivered to S3 using Snappy +const s3Destination = new firehosedestinations.S3Destination({ + compression: firehose.Compression.SNAPPY, + bucket: bucket, +}); +new firehose.DeliveryStream(stack, 'Delivery Stream', { + destination: destination, +}); +``` + +## Buffering + +Firehose buffers incoming data before delivering to the specified destination. Firehose +will wait until the amount of incoming data has exceeded some threshold (the "buffer +size") or until the time since the last data delivery occurred exceeds some threshold (the +"buffer interval"), whichever happens first. You can configure these threshold based on +the capabilities of the destination and your use-case. By default, the buffer size is 3 +MiB and the buffer interval is 1 minute. + +```ts fixture=with-bucket +// Increase the buffer interval and size to 5 minutes and 3 MiB, respectively +import * as cdk from '@aws-cdk/core'; +const s3Destination = new firehosedestinations.S3Destination({ + bufferingInterval: cdk.Duration.minutes(5), + bufferingSize: cdk.Size.mebibytes(8), + bucket: bucket, +}); +new firehose.DeliveryStream(stack, 'Delivery Stream', { + destination: destination, +}); +``` + +## Backup + +Firehose can be configured to backup data to S3 that it attempted to deliver to the +configured destination. Backed up data can be all the data that Firehose attempted to +deliver or just data that Firehose failed to deliver (not available for Redshift and S3 +destinations). CDK can create a new S3 bucket where it will back up data or you can +provide a bucket where data will be backed up. You can also provide prefix under which +Firehose will place backed-up data within the bucket. By default, source data is not +backed up to S3. 
+ +```ts fixture=with-domain +// Enable backup of all source records (to an S3 bucket created by CDK) +new firehose.DeliveryStream(this, 'Delivery Stream All', { + destination: new ElasticsearchDestination({ + domain: domain, + indexName: 'myindex', + backup: BackupMode.ALL, + }), +}); + +// Enable backup of only the source records that Firehose failed to deliver (to an S3 bucket created by CDK) +new firehose.DeliveryStream(this, 'Delivery Stream Failed', { + destination: new ElasticsearchDestination({ + domain: domain, + indexName: 'myindex', + backup: BackupMode.FAILED, + }), +}); + +// Explicitly provide an S3 bucket to which all source records will be backed up +import * as s3 from '@aws-cdk/aws-s3'; +const backupBucket = new s3.Bucket(stack, 'Bucket'); +new firehose.DeliveryStream(this, 'Delivery Stream All Explicit Bucket', { + destination: new ElasticsearchDestination({ + domain: domain, + indexName: 'myindex', + backupBucket: backupBucket, + }), +}); + +// Explicitly provide an S3 prefix under which all source records will be backed up +new firehose.DeliveryStream(this, 'Delivery Stream All Explicit Prefix', { + destination: new ElasticsearchDestination({ + domain: domain, + indexName: 'myindex', + backup: BackupMode.ALL, + backupPrefix: 'mybackup', + }), +}); +``` + +## Data Processing/Transformation + +Firehose supports transforming data before delivering it to destinations. To transform the +data, Firehose will call a Lambda function that you provide and deliver the data returned +in lieu of the source record. The function must return a result that contains records in a +specific format, including the following fields: +- `recordId` -- the ID of the input record that corresponds the results. +- `result` -- the status of the transformation of the record: "Ok" (success), "Dropped" + (not processed intentionally), or "ProcessingFailed" (not processed due to an error). +- `data` -- the transformed data, Base64-encoded. +The data is buffered up to 1 minute and up to 3 MiB by default before being sent to the +function, but can be configured using `bufferInterval` and `bufferSize` in the processor +configuration (see: [Buffering](#buffering)). If the function invocation fails due to a +network timeout or because of hitting a invocation limit, the invocation is retried 3 +times by default, but can be configured using `retries` in the processor configuration. By +default, no data processing occurs. + +```ts fixture=with-bucket +// Provide a Lambda function that will transform records before delivery, with custom +// buffering and retry configuration +import * as cdk from '@aws-cdk/core'; +import * as lambda from '@aws-cdk/aws-lambda'; +const lambdaFunction = new lambda.Function(stack, 'Processor', { + runtime: lambda.Runtime.NODEJS_12_X, + handler: 'index.handler', + code: lambda.Code.fromAsset(path.join(__dirname, 'process-records')), +}); +const s3Destination = new firehosedestinations.S3Destination({ + processors: [{ + lambdaFunction: lambdaFunction, + bufferingInterval: cdk.Duration.minutes(5), + bufferingSize: cdk.Size.mebibytes(5), + retries: 5, + }], + bucket: bucket, +}); +new firehose.DeliveryStream(stack, 'Delivery Stream', { + destination: destination, +}); +``` + +See: [Data Transformation](https://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html) +in the *Firehose Developer Guide*. + +This feature has not been completed (see #1234). + +--- + +## FAQ + +### What are we launching today? 
+ +We are launching a new module (`@aws-cdk/aws-kinesisfirehose`) that contains a +single L2 construct (`DeliveryStream`). This launch fully and fluently supports +Kinesis Data Firehose (a fully-managed service for delivering real-time +streaming data to storage locations) within the CDK. Out of the box, we are +launching with 3 AWS service destinations (S3, Elasticsearch, and Redshift), as +well as a generic HTTP destination so that customers can connect to the suported +3rd-party cloud service providers or any custom HTTP endpoint that they +develop. These destinations are located in a secondary module +(`@aws-cdk/aws-kinesisfirehose-destinations`). + +### Why should I use this feature? + +Specify and spin up a delivery stream that streams high amounts of data straight +to your storage service. Possible use-cases include automated CloudWatch log +delivery to S3 for analysis in S3; streaming analytic data to Redshift for +analysis in Quicksight. Using Firehose with CDK smooths many configuration edges +and provides seamless integrations with your existing infrastructure as code. + +## Internal FAQ + +### Why are we doing this? + +The [tracking Github issue for the +module](https://github.com/aws/aws-cdk/issues/7536) has the most +1s for a new +module (43) besides Elasticache (68) so we have a clear signal that customers +want CDK support for this service. + +Firehose requires a fairly verbose configuration to set up depending on the +destination desired. For example, the Redshift destination synthesizes to about +900 lines of JSON from about 20 lines of Typescript code. The destination +requires only 5 variables to be configured in order to create a resource with +20+ nested properties and 10+ associated/generated resources. While we retain +flexibility, we often replace several CFN properties with a single boolean +switch that creates and connects the required resources. + +Using Firehose without the CDK requires network configuration, complex +permission statements, and manual intervention. We have added 10+ compile-time +validations and auto-generated permissions to ensure destinations are correctly +integrated, avoiding days of debugging errors. We have leveraged custom +resources in order to perform a one-click deployment that creates an immediately +functional application with no manual effort. + + +### Why should we _not_ do this? + +The Firehose API seems somewhat immature and implementing an L2 on top of the +current L1 may be setting us up for changes in the future. See: “alternative +solutions”, below, for concrete details. + +It’s a large effort (3 devs * 1 week) to invest in a module when we have other +pressing projects. However, the bulk of the effort has been spent already since +we have fairly robust prototypes already implemented. + +### What changes are required to enable this change? 
+ +#### Design + +- `IDeliveryStream` -- interface for created and imported delivery streams + ```ts + interface IDeliveryStream extends + // Since DeliveryStream will extend Resource + cdk.IResource, + // To allow service role to access other resources like Redshift + iam.IGrantable, + // To open network conns between Firehose and resources in VPCs like Redshift + ec2.IConnectable, + // DeliveryStream allows tagging + cdk.ITaggable { + readonly deliveryStreamArn: string; + readonly deliveryStreamName: string; + grant(grantee: iam.IGrantable, ...actions: string[]): iam.Grant; + // Grant identity permission to write data to the stream (PutRecord/Batch) + grantWrite(grantee: iam.IGrantable): iam.Grant; + metric(metricName: string, props?: cloudwatch.MetricOptions): cloudwatch.Metric; + // Some canned metrics as well like `metricBackupToS3DataFreshness` + } + ``` +- `DeliveryStreamProps` -- configuration for creating a `DeliveryStream` + ``` + interface DeliveryStreamProps { + // The destination that this delivery stream will deliver data to. + readonly destination: IDestination; + // Auto-generated by CFN + readonly deliveryStreamName?: string; + // Can source data from Kinesis, if not provided will use API to produce data + readonly sourceStream?: kinesis.IStream; + // Service role + readonly role?: iam.IRole; + // Specifies SSE (AWS-owned, customer-managed, none) + readonly encryption?: StreamEncryption; + // Customer-managed CMK for SSE + readonly encryptionKey?: kms.IKey; + } + ``` +- `IDestination` -- interface that destinations will implement to create + resources as needed and produce configuration that is injected into the + DeliveryStream definition + ``` + // Output of IDestination bind method + interface DestinationConfig { + // Schema-less properties that will be injected directly into `CfnDeliveryStream`. + // Should include top-level key like `{ RedshiftDestinationConfiguration: { ... 
} }` + readonly properties: object; + } + // Info provided to bind method to help destination attach + interface DestinationBindOptions { + readonly deliveryStream: IDeliveryStream; + } + interface IDestination { + bind(scope: Construct, options: DestinationBindOptions): DestinationConfig; + } + ``` +- `DestinationBase` -- abstract base destination class with some helper + props/methods + ``` + // Compression method for data delivered to S3 + enum Compression { GZIP, HADOOP_SNAPPY, SNAPPY, UNCOMPRESSED, ZIP } + // Not yet fully-fleshed out + interface DataProcessor { + // Function that will be called to do data processing + readonly lambdaFunction: lambda.IFunction; + // Length of time delivery stream will buffer data before sending to processor + readonly bufferInterval?: Duration; + // Size of buffer + readonly bufferSize?: Size; + // Number of retries for networking failures or invocation limits + readonly retries?: number; + } + interface DestinationProps { + // Whether failure logging should be enabled + readonly logging?: boolean; + // Specific log group to use for failure logging + readonly logGroup?: logs.ILogGroup; + // Data transformation to convert data before delivering + // Should probably just be a singleton + readonly processors?: DataProcessor[]; + // Whether to backup all source records, just failed records, or none + readonly backup?: BackupMode; + // Specific bucket to use for backup + readonly backupBucket?: s3.IBucket; + // S3 prefix under which to place backups + readonly backupPrefix?: string; + // Length of time delivery stream will buffer data before backing up + readonly backupBufferInterval?: Duration; + // Size of buffer + readonly backupBufferSize?: Size; + } + abstract class DestinationBase implements IDestination { + constructor(protected readonly props: DestinationProps = {}) {} + abstract bind(scope: Construct, options: DestinationBindOptions): DestinationConfig; + // Helper methods that subclasses can use to create common config + protected createLoggingOptions(...): CfnDeliveryStream.CloudWatchLoggingOptionsProperty | undefined; + protected createProcessingConfig(...): CfnDeliveryStream.ProcessingConfigurationProperty | undefined; + protected createBackupConfig(...): CfnDeliveryStream.S3DestinationConfigurationProperty | undefined; + protected createBufferingHints(...): CfnDeliveryStream.BufferingHintsProperty | undefined; + } + ``` + +#### Other modules + +- Redshift + - expose publiclyAccessible Cluster attribute (used to ensure cluster is + publicly accessible for Firehose access) + - expose subnetGroups Cluster attribute and selectedSubnets ClusterSubnetGroup + (used to ensure cluster is located in public subnet for Firehose access) + - expose attachRole Cluster method that allows lazily attaching a new role to + the cluster after construction (used to give cluster permissions to access + S3 for the COPY operation after the cluster has been created) +- Region Info + - add new fact that tracks the IP addresses used by Firehose in each region + (used to allow incoming connections from Firehose to resources like a + Redshift cluster) + +### Is this a breaking change? + +No. + +### What alternative solutions did you consider? + +- Placing destinations into the core module instead of creating a separate + module, since a delivery stream can’t exist without a destination. + We followed the common pattern of placing service integrations (where one + service provides an interface that multiple other services implement) into a + separate module. 
In contrast to many of the other modules that follow this + pattern, a delivery stream cannot be created without some destination, as the + destination is a key element of the service. It could be argued that these + destinations should be treated as first-class and co-located with the delivery + stream itself. However, this is similar to SNS, where a topic doesn’t have + much meaning without a subscription and yet service integrations for + subscriptions are still located in a separate module. +- Hoist common configuration/resources such as logging, data transformation, and + backup to the delivery stream level + Currently, we follow the service API closely in the hierarchical sense: many + properties that are common to multiple destinations are specified in the + destination instead of on the delivery stream, since this is how the API + organizes it. Because the delivery stream only allows a single destination, + modelling these common properties on the delivery stream itself would reduce + the amount of configuration each destination implementation would need to + manage. Practically, this would look like moving every property in + `DestinationProps` into `DeliveryStreamProps`, as well as exposing hooks in + `DestinationBindOptions` to allow destinations to call configuration-creating + functions during binding. Some downsides of making this change: moving away + from the service API may confuse customers who have previously used it to + create a delivery stream; if delivery streams support multiple destinations in + the future then configuration will not be flexible per-destination. +- Provide a more generic interface for data transformers instead of requiring a + Lambda function. + The data transformation API seems to indicate future support for processors + that are not Lambda functions ([ProcessingConfiguration.Processor](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-kinesisfirehose-deliverystream-processor.html) + is quite generic, with the only processor type currently supported as + “Lambda”). However, our DataProcessor requires a lambda.IFunction, tying our + API to Lambda as the only supported processor and requiring a breaking change + to support a different processor type. We could work around this by creating a + class instead that has static methods for each possible processor type (ie., + `static fromFunction(lambdaFunction: lambda.IFunction, options: ProcessorOptions): Processor` + ). This may be too complex for a change that we are not confident will occur. +- Allow multiple destinations to be provided to the delivery stream. + While the console UI only allows a single destination to be configured per + delivery stream, the horizontal model of the service API and the fact that a + call to DescribeDeliveryStream returns an array of destinations seems to + indicate that the service team may support multiple destinations in the + future. To that end, we could modify `DeliveryStreamProps` to accept an array of + destinations (instead of a single destination, as is the case currently) and + simply throw an error if multiple destinations are provided until the service + team launches that feature. However, this would be significantly + future-proofing the API at the expense of confusing users that would + reasonably assume that multiple destinations are currently supported. + +### What is the high level implementation plan? + +> Describe your plan on how to deliver this feature from prototyping to GA. 
+> Especially think about how to "bake" it in the open and get constant feedback +> from users before you stabilize the APIs. +> +> If you have a project board with your implementation plan, this is a good +> place to link to it. + +### Are there any open issues that need to be addressed later? + +We should probably have a separate design for `CustomHttpDestination`, as that +will need to be used for both 3rd-party service partners and truly custom HTTP +endpoints provided by the customer. + +## Appendix + +Feel free to add any number of appendices as you see fit. Appendices are +expected to allow readers to dive deeper to certain sections if they like. For +example, you can include an appendix which describes the detailed design of an +algorithm and reference it from the FAQ. From 6b8ea560af1e0bb959a9f1d8d76efb209801fb26 Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Fri, 25 Jun 2021 11:24:06 -0500 Subject: [PATCH 02/34] add PR number --- text/0340-firehose-l2.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 277030080..5e609c17b 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -1,6 +1,6 @@ --- -rfc pr: [#xxx](https://github.com/aws/aws-cdk-rfcs/pull/xxx) <-- fill this after you've already created the PR -tracking issue: https://github.com/aws/aws-cdk-rfcs/issues/340 +rfc pr: [#342](https://github.com/aws/aws-cdk-rfcs/pull/342) +tracking issue: [#340](https://github.com/aws/aws-cdk-rfcs/issues/340) --- # Kinesis Data Firehose Delivery Stream L2 From 09e2104252ce21a9a3b49a283c7826306466efb4 Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Fri, 25 Jun 2021 12:04:29 -0500 Subject: [PATCH 03/34] move sources above destinations, rework first example, linting --- text/0340-firehose-l2.md | 127 +++++++++++++++++++++------------------ 1 file changed, 70 insertions(+), 57 deletions(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 5e609c17b..1e423b564 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -33,22 +33,68 @@ Amazon S3, Amazon Redshift, Amazon Elasticsearch, Splunk, or any custom HTTP end third-party services such as Datadog, Dynatrace, LogicMonitor, MongoDB, New Relic, and Sumo Logic. -The simplest specification of a delivery stream is to create an implementation of -`IDestination` and provide it to the constructor of the delivery stream. Supported -destinations are covered [below](#destinations). +## Creating a Delivery Stream + +In order to create a Delivery Stream, you must specify a destination. An S3 bucket can be +used as a destination. More supported destinations are covered [below](#destinations). + ```ts -const destination: firehose.IDestination = (void '...', { - bind(_scope: Construct, _options: firehose.DestinationBindOptions): firehose.DestinationConfig { - return { - properties: {}, - }; - } +import * as destinations from '@aws-cdk/aws-kinesisfirehose-destinations'; + +new DeliveryStream(this, 'Delivery Stream', { + destination: new destinations.S3(), }); -new firehose.DeliveryStream(this, 'Delivery Stream', { +``` + +The above example creates the following resources: +- An S3 bucket +- A Kinesis Data Firehose Delivery Stream with Direct PUT as the source and CloudWatch + error logging turned on. +- An IAM role which gives Kinesis Data Firehose permission to write to your S3 bucket. + +## Sources + +Firehose supports two main methods of sourcing input data: Kinesis Data Streams and via a +"direct put". 
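For "direct put" producers, the delivery stream must also grant the producer permission to write records. A minimal sketch, assuming the `grantWrite` method defined on `IDeliveryStream` in the design section later in this document (the producer role here is purely illustrative):

```ts fixture=with-delivery-stream
import * as iam from '@aws-cdk/aws-iam';

// Hypothetical producer identity; any IGrantable (role, user, function) works.
const producerRole = new iam.Role(this, 'Producer Role', {
  assumedBy: new iam.ServicePrincipal('ec2.amazonaws.com'),
});

// Allows the role to call PutRecord/PutRecordBatch on this delivery stream.
deliveryStream.grantWrite(producerRole);
```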
+ +See: [Sending Data to a Delivery Stream](https://docs.aws.amazon.com/firehose/latest/dev/basic-write.html) +in the *Firehose Developer Guide*. + +### Kinesis Data Stream + +Firehose can read directly from a Kinesis data stream as a consumer of the data stream. +Configure this behaviour by providing a data stream in the `sourceStream` property when +constructing a delivery stream: + +```ts fixture=with-destination +import * as kinesis from '@aws-cdk/aws-kinesis'; +const sourceStream = new kinesis.Stream(stack, 'Source Stream'); +new firehose.DeliveryStream(stack, 'Delivery Stream', { + sourceStream: sourceStream, destination: destination, }); ``` +### Direct Put + +If a source data stream is not provided, then Firehose assumes data will be provided via +"direct put", ie., by using a `PutRecord` or `PutRecordBatch` API call. There are a number +of ways of doing so, such as: + +- Kinesis Agent: a standalone Java application that monitors and delivers files while + handling file rotation, checkpointing, and retries. See: [Writing to Firehose Using Kinesis Agent](https://docs.aws.amazon.com/firehose/latest/dev/writing-with-agents.html) + in the *Firehose Developer Guide*. +- Firehose SDK: a general purpose solution that allows you to deliver data to Firehose + from anywhere using Java, .NET, Node.js, Python, or Ruby. See: [Writing to Firehose Using the AWS SDK](https://docs.aws.amazon.com/firehose/latest/dev/writing-with-sdk.html) + in the *Firehose Developer Guide*. +- CloudWatch Logs: subscribe to a log group and receive filtered log events directly into + Firehose. See: [logs-destinations](../aws-logs-destinations). +- Eventbridge: add Firehose as an event rule target to send events to a delivery stream + based on the rule filtering. See: [events-targets](../aws-events-targets). +- SNS: add Firehose as a subscription to send all notifications from the topic to a + delivery stream. See: [sns-subscriptions](../aws-sns-subscriptions). +- IoT: add an action to an IoT rule to send various IoT information to a delivery stream + ## Destinations The following destinations are supported. See [@aws-cdk/aws-kinesisfirehose-destinations](../aws-kinesisfirehose-destinations) @@ -71,6 +117,7 @@ new firehose.DeliveryStream(this, 'Delivery Stream', { ``` S3 supports custom dynamic prefixes: + ```ts fixture=with-bucket const s3Destination = new firehosedestinations.S3Destination({ bucket: bucket, @@ -78,6 +125,7 @@ const s3Destination = new firehosedestinations.S3Destination({ errorOutputPrefix: 'myFirehoseFailures/!{firehose:error-output-type}/!{timestamp:yyyy}/anyMonth/!{timestamp:dd}', }); ``` + See: [Custom S3 Prefixes](https://docs.aws.amazon.com/firehose/latest/dev/s3-prefixes.html) in the *Firehose Developer Guide*. S3 supports converting record formats when delivering: @@ -169,48 +217,6 @@ This integration has not been completed (see #1234). [HTTP request/response schema]: https://docs.aws.amazon.com/firehose/latest/dev/httpdeliveryrequestresponse.html -## Sources - -Firehose supports two main methods of sourcing input data: Kinesis Data Streams and via a -"direct put". - -See: [Sending Data to a Delivery Stream](https://docs.aws.amazon.com/firehose/latest/dev/basic-write.html) -in the *Firehose Developer Guide*. - -### Kinesis Data Stream - -Firehose can read directly from a Kinesis data stream as a consumer of the data stream. 
-Configure this behaviour by providing a data stream in the `sourceStream` property when -constructing a delivery stream: - -```ts fixture=with-destination -import * as kinesis from '@aws-cdk/aws-kinesis'; -const sourceStream = new kinesis.Stream(stack, 'Source Stream'); -new firehose.DeliveryStream(stack, 'Delivery Stream', { - sourceStream: sourceStream, - destination: destination, -}); -``` - -### Direct Put - -If a source data stream is not provided, then Firehose assumes data will be provided via -"direct put", ie., by using a `PutRecord` or `PutRecordBatch` API call. There are a number -of ways of doing so, such as: -- Kinesis Agent: a standalone Java application that monitors and delivers files while - handling file rotation, checkpointing, and retries. See: [Writing to Firehose Using Kinesis Agent](https://docs.aws.amazon.com/firehose/latest/dev/writing-with-agents.html) - in the *Firehose Developer Guide*. -- Firehose SDK: a general purpose solution that allows you to deliver data to Firehose - from anywhere using Java, .NET, Node.js, Python, or Ruby. See: [Writing to Firehose Using the AWS SDK](https://docs.aws.amazon.com/firehose/latest/dev/writing-with-sdk.html) - in the *Firehose Developer Guide*. -- CloudWatch Logs: subscribe to a log group and receive filtered log events directly into - Firehose. See: [logs-destinations](../aws-logs-destinations). -- Eventbridge: add Firehose as an event rule target to send events to a delivery stream - based on the rule filtering. See: [events-targets](../aws-events-targets). -- SNS: add Firehose as a subscription to send all notifications from the topic to a - delivery stream. See: [sns-subscriptions](../aws-sns-subscriptions). -- IoT: add an action to an IoT rule to send various IoT information to a delivery stream - ## Server-side Encryption Enabling server-side encryption (SSE) requires Firehose to encrypt all data sent to @@ -262,7 +268,6 @@ stream will cause an error). See: [Data Protection](https://docs.aws.amazon.com/firehose/latest/dev/encryption.html) in the *Firehose Developer Guide*. - ## Monitoring Firehose is integrated with CloudWatch, so you can monitor the performance of your @@ -430,10 +435,12 @@ Firehose supports transforming data before delivering it to destinations. To tra data, Firehose will call a Lambda function that you provide and deliver the data returned in lieu of the source record. The function must return a result that contains records in a specific format, including the following fields: + - `recordId` -- the ID of the input record that corresponds the results. - `result` -- the status of the transformation of the record: "Ok" (success), "Dropped" (not processed intentionally), or "ProcessingFailed" (not processed due to an error). - `data` -- the transformed data, Base64-encoded. + The data is buffered up to 1 minute and up to 3 MiB by default before being sent to the function, but can be configured using `bufferInterval` and `bufferSize` in the processor configuration (see: [Buffering](#buffering)). If the function invocation fails due to a @@ -518,7 +525,6 @@ integrated, avoiding days of debugging errors. We have leveraged custom resources in order to perform a one-click deployment that creates an immediately functional application with no manual effort. - ### Why should we _not_ do this? The Firehose API seems somewhat immature and implementing an L2 on top of the @@ -534,6 +540,7 @@ we have fairly robust prototypes already implemented. 
#### Design - `IDeliveryStream` -- interface for created and imported delivery streams + ```ts interface IDeliveryStream extends // Since DeliveryStream will extend Resource @@ -553,8 +560,10 @@ we have fairly robust prototypes already implemented. // Some canned metrics as well like `metricBackupToS3DataFreshness` } ``` + - `DeliveryStreamProps` -- configuration for creating a `DeliveryStream` - ``` + + ```ts interface DeliveryStreamProps { // The destination that this delivery stream will deliver data to. readonly destination: IDestination; @@ -570,10 +579,12 @@ we have fairly robust prototypes already implemented. readonly encryptionKey?: kms.IKey; } ``` + - `IDestination` -- interface that destinations will implement to create resources as needed and produce configuration that is injected into the DeliveryStream definition - ``` + + ```ts // Output of IDestination bind method interface DestinationConfig { // Schema-less properties that will be injected directly into `CfnDeliveryStream`. @@ -588,9 +599,11 @@ we have fairly robust prototypes already implemented. bind(scope: Construct, options: DestinationBindOptions): DestinationConfig; } ``` + - `DestinationBase` -- abstract base destination class with some helper props/methods - ``` + + ```ts // Compression method for data delivered to S3 enum Compression { GZIP, HADOOP_SNAPPY, SNAPPY, UNCOMPRESSED, ZIP } // Not yet fully-fleshed out From 6ed049e3a236717d9f735eeae08cd22df4df72e0 Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Fri, 25 Jun 2021 12:06:35 -0500 Subject: [PATCH 04/34] more linting --- text/0340-firehose-l2.md | 1 + 1 file changed, 1 insertion(+) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 1e423b564..ebe257f39 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -47,6 +47,7 @@ new DeliveryStream(this, 'Delivery Stream', { ``` The above example creates the following resources: + - An S3 bucket - A Kinesis Data Firehose Delivery Stream with Direct PUT as the source and CloudWatch error logging turned on. From 80f65ae5f0b4fa37eb027860d0fc06e96431bac6 Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Fri, 25 Jun 2021 12:45:00 -0500 Subject: [PATCH 05/34] move record format convertion to data processing section, add detail --- text/0340-firehose-l2.md | 66 +++++++++++++++++++++++++++++++++++----- 1 file changed, 58 insertions(+), 8 deletions(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index ebe257f39..5827db273 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -129,9 +129,6 @@ const s3Destination = new firehosedestinations.S3Destination({ See: [Custom S3 Prefixes](https://docs.aws.amazon.com/firehose/latest/dev/s3-prefixes.html) in the *Firehose Developer Guide*. -S3 supports converting record formats when delivering: -See: [Converting Input Record Format](https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html) in the *Firehose Developer Guide*. - ### Elasticsearch ```ts @@ -432,10 +429,19 @@ new firehose.DeliveryStream(this, 'Delivery Stream All Explicit Prefix', { ## Data Processing/Transformation -Firehose supports transforming data before delivering it to destinations. To transform the -data, Firehose will call a Lambda function that you provide and deliver the data returned -in lieu of the source record. The function must return a result that contains records in a -specific format, including the following fields: +Firehose supports transforming data before delivering it to destinations. 
There are two +types of data processing for Delivery Streams: record transformation with AWS Lambda, and +record format conversion using a schema stored in an AWS Glue table. If both types of data +processing are configured, then the Lambda transofmration is perfromed first. By default, +no data processing occurs. + +This feature has not been completed (see #1234). + +### Record transformation with AWS Lambda + +To transform the data, Firehose will call a Lambda function that you provide and deliver +the data returned in lieu of the source record. The function must return a result that +contains records in a specific format, including the following fields: - `recordId` -- the ID of the input record that corresponds the results. - `result` -- the status of the transformation of the record: "Ok" (success), "Dropped" @@ -476,7 +482,51 @@ new firehose.DeliveryStream(stack, 'Delivery Stream', { See: [Data Transformation](https://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html) in the *Firehose Developer Guide*. -This feature has not been completed (see #1234). +### Record format conversion using AWS Glue + +Amazon Kinesis Data Firehose can convert the format of your input data from JSON to +[Apache Parquet](https://parquet.apache.org/) or [Apache ORC](https://orc.apache.org/) +before storing the data in Amazon S3. This allows you to change the format of your data +records without writing any Lambda code, but you must use S3 as your destination. + +```ts +import * as glue from '@aws-cdk/aws-glue'; +import * as destinations from '@aws-cdk/aws-kinesisfirehose-destinations'; + +const myGlueDb = new glue.Database(this, 'MyGlueDatabase',{ + databaseName: 'MyGlueDatabase', +}); +const myGlueTable = new glue.Table(this, 'MyGlueTable', { + columns: [{ + name: 'firstname', + type: glue.Schema.STRING, + }, { + name: 'lastname', + type: glue.Schema.STRING, + }, { + name: 'age', + type: glue.Schema.INTEGER, + }], + dataFormat: glue.DataFormat.PARQUET, + database: myGlueDb, + tableName: 'myGlueTable', +}); + + +new DeliveryStream(this, 'Delivery Stream', { + destination: new destinations.S3({ + dataFormatConversionConfiguration: { + schema: myGlueTable, + inputFormat: destinations.InputFormat.OPENX_JSON + // Might be able to nix this property and infer it from the glue table + outputFormat: destinations.OuputFormat.PARQUET + }, + }), +}); +``` + +See: [Converting Input Record Format](https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html) +in the *Firehose Developer Guide*. --- From b682f1041bbd0b93214b46929e4a17fd5247ee18 Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Fri, 25 Jun 2021 17:59:12 -0500 Subject: [PATCH 06/34] mkusters edits and new grants section --- text/0340-firehose-l2.md | 248 +++++++++++++++++++++++++++------------ 1 file changed, 170 insertions(+), 78 deletions(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 5827db273..94a3eabf9 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -26,12 +26,12 @@ destinations. # Amazon Kinesis Data Firehose Construct Library -This module is part of the [AWS Cloud Development Kit](https://github.com/aws/aws-cdk) -project. It allows you to create Kinesis Data Firehose delivery streams. 
Firehose is a -service for fully-managed delivery of real-time streaming data to storage services such as -Amazon S3, Amazon Redshift, Amazon Elasticsearch, Splunk, or any custom HTTP endpoint or -third-party services such as Datadog, Dynatrace, LogicMonitor, MongoDB, New Relic, and -Sumo Logic. +[Amazon Kinesis Data Firehose](https://docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html) +is a service for fully-managed delivery of real-time streaming data to storage services +such as Amazon S3, Amazon Redshift, Amazon Elasticsearch, Splunk, or any custom HTTP +endpoint or third-party services such as Datadog, Dynatrace, LogicMonitor, MongoDB, New +Relic, and Sumo Logic. This module is part of the [AWS Cloud Development Kit](https://github.com/aws/aws-cdk) +project. It allows you to create Kinesis Data Firehose delivery streams. ## Creating a Delivery Stream @@ -69,8 +69,9 @@ constructing a delivery stream: ```ts fixture=with-destination import * as kinesis from '@aws-cdk/aws-kinesis'; -const sourceStream = new kinesis.Stream(stack, 'Source Stream'); -new firehose.DeliveryStream(stack, 'Delivery Stream', { + +const sourceStream = new kinesis.Stream(this, 'Source Stream'); +new DeliveryStream(this, 'Delivery Stream', { sourceStream: sourceStream, destination: destination, }); @@ -103,24 +104,29 @@ for the implementations of these destinations. ### S3 +Creating a delivery stream with an S3 bucket destination: + ```ts import * as s3 from '@aws-cdk/aws-s3'; -import * as firehose from '@aws-cdk/aws-kinesisfirehose'; +import * as destinations from '@aws-cdk/aws-kinesisfirehose-destinations'; const bucket = new s3.Bucket(this, 'Bucket'); -const s3Destination = new firehosedestinations.S3Destination({ +const s3Destination = new destinations.S3({ bucket: bucket, }); -new firehose.DeliveryStream(this, 'Delivery Stream', { + +new DeliveryStream(this, 'Delivery Stream', { destination: s3Destination, }); ``` -S3 supports custom dynamic prefixes: +The S3 destination also supports custom dynamic prefixes. `prefix` will be used for +files successfully delivered to Amazon S3. `errorOutputPrefix` will be added to +failed records before writing them to S3. ```ts fixture=with-bucket -const s3Destination = new firehosedestinations.S3Destination({ +const s3Destination = new destinations.S3({ bucket: bucket, prefix: 'myFirehose/DeliveredYear=!{timestamp:yyyy}/anyMonth/rand=!{firehose:random-string}', errorOutputPrefix: 'myFirehoseFailures/!{firehose:error-output-type}/!{timestamp:yyyy}/anyMonth/!{timestamp:dd}', @@ -133,14 +139,14 @@ See: [Custom S3 Prefixes](https://docs.aws.amazon.com/firehose/latest/dev/s3-pre ```ts import * as es from '@aws-cdk/aws-elasticsearch'; -import * as firehose from '@aws-cdk/aws-kinesisfirehose'; +import * as destinations from '@aws-cdk/aws-kinesisfirehose-destinations'; const domain = new es.Domain(this, 'Domain', { version: es.ElasticsearchVersion.V7_1, }); -new firehose.DeliveryStream(this, 'Delivery Stream', { - destination: new ElasticsearchDestination({ +const deliveryStream = new DeliveryStream(this, 'Delivery Stream', { + destination: new destinations.Elasticsearch({ domain: domain, indexName: 'myindex', }), @@ -158,7 +164,7 @@ schema will be created within the provided database. 
```ts import * as ec2 from '@aws-cdk/aws-ec2'; -import * as firehose from '@aws-cdk/aws-kinesisfirehose'; +import * as destinations from '@aws-cdk/aws-kinesisfirehose-destinations'; import * as redshift from '@aws-cdk/aws-redshift'; import { Duration, Size } from '@aws-cdk/core'; @@ -176,7 +182,7 @@ const cluster = new redshift.Cluster(this, 'Cluster', { publiclyAccessible: true, }); -const redshiftDestination = new RedshiftDestination({ +const redshiftDestination = new destinations.Redshift({ cluster: cluster, user: { username: 'firehose', @@ -191,7 +197,7 @@ const redshiftDestination = new RedshiftDestination({ ], copyOptions: 'json \'auto\'', }); -new firehose.DeliveryStream(this, 'Delivery Stream', { +new DeliveryStream(this, 'Delivery Stream', { destination: redshiftDestination, }); ``` @@ -224,32 +230,34 @@ layer. Firehose manages keys and cryptographic operations so that sources and de do not need to, as the data is encrypted and decrypted at the boundaries of the Firehose service. By default, delivery streams do not have SSE enabled. -The Key Management Service (KMS) Customer Managed Key (CMK) used for SSE can either be AWS-owned or -customer-managed. AWS-owned CMKs are keys that an AWS service (in this case Firehose) owns -and manages for use in multiple AWS accounts. As a customer, you cannot view, use, track, or manage -these keys, and you are not charged for their use. On the other hand, -customer-managed CMKs are keys that are created and owned within a your account and -managed entirely by the you. As a customer, you are responsible for managing access, rotation, -aliases, and deletion for these keys, and you are changed for their use. See: [Customer master keys](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#master_keys) +The Key Management Service (KMS) Customer Managed Key (CMK) used for SSE can either be +AWS-owned or customer-managed. AWS-owned CMKs are keys that an AWS service (in this case +Firehose) owns and manages for use in multiple AWS accounts. As a customer, you cannot +view, use, track, or manage these keys, and you are not charged for their use. On the +other hand, customer-managed CMKs are keys that are created and owned within your account +and managed entirely by you. As a customer, you are responsible for managing access, +rotation, aliases, and deletion for these keys, and you are changed for their use. See: +[Customer master keys](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#master_keys) in the *KMS Developer Guide*. 
```ts fixture=with-destination +import * as kms from '@aws-cdk/aws-kms'; + // SSE with an AWS-owned CMK -new firehose.DeliveryStream(stack, 'Delivery Stream AWS Owned', { - encryption: firehose.StreamEncryption.AWS_OWNED, +new DeliveryStream(this, 'Delivery Stream AWS Owned', { + encryption: StreamEncryption.AWS_OWNED, destination: destination, }); // SSE with an customer-managed CMK that is created automatically by the CDK -new firehose.DeliveryStream(stack, 'Delivery Stream Implicit Customer Managed', { - encryption: firehose.StreamEncryption.CUSTOMER_MANAGED, +new DeliveryStream(this, 'Delivery Stream Implicit Customer Managed', { + encryption: StreamEncryption.CUSTOMER_MANAGED, destination: destination, }); // SSE with an customer-managed CMK that is explicitly specified -import * as kms from '@aws-cdk/aws-kms'; -const key = new kms.Key(stack, 'Key'); -new firehose.DeliveryStream(stack, 'Delivery Stream Explicit Customer Managed', { +const key = new kms.Key(this, 'Key'); +new DeliveryStream(this, 'Delivery Stream Explicit Customer Managed'', { encryptionKey: key, destination: destination, }); @@ -273,23 +281,28 @@ delivery streams via logs and metrics. ### Logs -Firehose will send logs to CloudWatch when data trasformation or data delivery fails. You -can provide a specific log group to specify where the CDK will create the log streams -where log events from Firehose will be sent. Logging for delivery streams is enabled by -default. +Firehose will send logs to CloudWatch when data transformation or data delivery fails. +The CDK will enable logging by default and create a CloudWatch LogGroup and LogStream for +your Delivery Stream. + +You can provide a specific log group to specify where the CDK will create the log streams +where log events from Firehose will be sent: ```ts fixture=with-destination -// Logging enabled with a log group that is automatically created by the CDK -new firehose.DeliveryStream(stack, 'Delivery Stream Implicit Log Group', { - logging: true, +import * as logs from '@aws-cdk/aws-logs'; + +const logGroup = new logs.LogGroup(this, 'Log Group'); +new DeliveryStream(this, 'Delivery Stream', { + logGroup: logGroup, destination: destination, }); +``` -// Logging enabled with a log group that is explicitly specified -import * as logs from '@aws-cdk/aws-logs'; -const logGroup = new logs.LogGroup(stack, 'Log Group'); -new firehose.DeliveryStream(stack, 'Delivery Stream Explicit Log Group', { - logGroup: logGroup, +Logging can also be disabled: + +```ts fixture=with-destination +new DeliveryStream(this, 'Delivery Stream', { + loggingEnabled: false, destination: destination, }); ``` @@ -321,11 +334,11 @@ import * as cloudwatch from '@aws-cdk/aws-cloudwatch'; const incomingBytesPercentOfLimit = new cloudwatch.MathExpression({ expression: 'incomingBytes / 300 / bytePerSecLimit', usingMetrics: { - incomingBytes: deliveryStream.metricIncomingBytes({ statistic: 'sum' }), + incomingBytes: deliveryStream.metricIncomingBytes({ statistic: cloudwatch.Statistic.SUM }), bytePerSecLimit: deliveryStream.metric('BytesPerSecondLimit'), }, }); -new Alarm(stack, 'Alarm', { +new Alarm(this, 'Alarm', { metric: incomingBytesPercentOfLimit, threshold: 0.9, evaluationPeriods: 3, @@ -338,18 +351,18 @@ in the *Firehose Developer Guide*. ## Compression Firehose can automatically compress your data when it is delivered to S3 as either a final -or an intermediary destination. Supported compression formats are: gzip, Snappy, +or an intermediary/backup destination. 
Supported compression formats are: gzip, Snappy,
Hadoop-compatible Snappy, and ZIP, except for Redshift destinations, where Snappy
(regardless of Hadoop-compatibility) and ZIP are not supported. By default, data is
delivered to S3 without compression.

 ```ts fixture=with-bucket
 // Compress data delivered to S3 using Snappy
-const s3Destination = new firehosedestinations.S3Destination({
-  compression: firehose.Compression.SNAPPY,
+const s3Destination = new destinations.S3({
+  compression: Compression.SNAPPY,
   bucket: bucket,
 });
-new firehose.DeliveryStream(stack, 'Delivery Stream', {
+new DeliveryStream(this, 'Delivery Stream', {
   destination: s3Destination,
 });
 ```
@@ -359,19 +372,20 @@ new firehose.DeliveryStream(stack, 'Delivery Stream', {
 Firehose buffers incoming data before delivering to the specified destination. Firehose
 will wait until the amount of incoming data has exceeded some threshold (the "buffer
 size") or until the time since the last data delivery occurred exceeds some threshold (the
-"buffer interval"), whichever happens first. You can configure these threshold based on
+"buffer interval"), whichever happens first. You can configure these thresholds based on
 the capabilities of the destination and your use-case. By default, the buffer size is 3
 MiB and the buffer interval is 1 minute.

 ```ts fixture=with-bucket
 // Increase the buffer interval and size to 5 minutes and 8 MiB, respectively
 import * as cdk from '@aws-cdk/core';
-const s3Destination = new firehosedestinations.S3Destination({
+
+const s3Destination = new destinations.S3({
   bufferingInterval: cdk.Duration.minutes(5),
   bufferingSize: cdk.Size.mebibytes(8),
   bucket: bucket,
 });
-new firehose.DeliveryStream(stack, 'Delivery Stream', {
+new DeliveryStream(this, 'Delivery Stream', {
   destination: s3Destination,
 });
 ```
@@ -380,16 +394,19 @@ new firehose.DeliveryStream(stack, 'Delivery Stream', {
 Firehose can be configured to backup data to S3 that it attempted to deliver to the
 configured destination. Backed up data can be all the data that Firehose attempted to
-deliver or just data that Firehose failed to deliver (not available for Redshift and S3
-destinations). CDK can create a new S3 bucket where it will back up data or you can
-provide a bucket where data will be backed up. You can also provide prefix under which
+deliver or just data that Firehose failed to deliver (Redshift and S3 destinations can
+only backup all data). CDK can create a new S3 bucket where it will back up data or you
+can provide a bucket where data will be backed up. You can also provide a prefix under which
 Firehose will place backed-up data within the bucket. By default, source data is not
 backed up to S3.
```ts fixture=with-domain +import * as destinations from '@aws-cdk/aws-kinesisfirehose-destinations'; +import * as s3 from '@aws-cdk/aws-s3'; + // Enable backup of all source records (to an S3 bucket created by CDK) -new firehose.DeliveryStream(this, 'Delivery Stream All', { - destination: new ElasticsearchDestination({ +const deliveryStream = new DeliveryStream(this, 'Delivery Stream Backup All', { + destination: new destinations.Elasticsearch({ domain: domain, indexName: 'myindex', backup: BackupMode.ALL, @@ -397,8 +414,8 @@ new firehose.DeliveryStream(this, 'Delivery Stream All', { }); // Enable backup of only the source records that Firehose failed to deliver (to an S3 bucket created by CDK) -new firehose.DeliveryStream(this, 'Delivery Stream Failed', { - destination: new ElasticsearchDestination({ +const deliveryStream = new DeliveryStream(this, 'Delivery Stream Backup Failed', { + destination: new destinations.Elasticsearch({ domain: domain, indexName: 'myindex', backup: BackupMode.FAILED, @@ -406,10 +423,9 @@ new firehose.DeliveryStream(this, 'Delivery Stream Failed', { }); // Explicitly provide an S3 bucket to which all source records will be backed up -import * as s3 from '@aws-cdk/aws-s3'; -const backupBucket = new s3.Bucket(stack, 'Bucket'); -new firehose.DeliveryStream(this, 'Delivery Stream All Explicit Bucket', { - destination: new ElasticsearchDestination({ +const backupBucket = new s3.Bucket(this, 'Bucket'); +const deliveryStream = new DeliveryStream(this, 'Delivery Stream Backup All Explicit Bucket', { + destination: new destinations.Elasticsearch({ domain: domain, indexName: 'myindex', backupBucket: backupBucket, @@ -417,8 +433,8 @@ new firehose.DeliveryStream(this, 'Delivery Stream All Explicit Bucket', { }); // Explicitly provide an S3 prefix under which all source records will be backed up -new firehose.DeliveryStream(this, 'Delivery Stream All Explicit Prefix', { - destination: new ElasticsearchDestination({ +const deliveryStream = new DeliveryStream(this, 'Delivery Stream Backup All Explicit Prefix', { + destination: new destinations.Elasticsearch({ domain: domain, indexName: 'myindex', backup: BackupMode.ALL, @@ -427,6 +443,9 @@ new firehose.DeliveryStream(this, 'Delivery Stream All Explicit Prefix', { }); ``` +If any Data Processing or Transformation is configured on your Delivery Stream, the source +records will be backed up in their original format. + ## Data Processing/Transformation Firehose supports transforming data before delivering it to destinations. There are two @@ -435,9 +454,7 @@ record format conversion using a schema stored in an AWS Glue table. If both typ processing are configured, then the Lambda transofmration is perfromed first. By default, no data processing occurs. -This feature has not been completed (see #1234). - -### Record transformation with AWS Lambda +### Data transformation with AWS Lambda To transform the data, Firehose will call a Lambda function that you provide and deliver the data returned in lieu of the source record. The function must return a result that @@ -451,16 +468,15 @@ contains records in a specific format, including the following fields: The data is buffered up to 1 minute and up to 3 MiB by default before being sent to the function, but can be configured using `bufferInterval` and `bufferSize` in the processor configuration (see: [Buffering](#buffering)). 
If the function invocation fails due to a
-network timeout or because of hitting a invocation limit, the invocation is retried 3
-times by default, but can be configured using `retries` in the processor configuration. By
-default, no data processing occurs.
+network timeout or because of hitting an invocation limit, the invocation is retried 3
+times by default, but can be configured using `retries` in the processor configuration.

 ```ts fixture=with-bucket
 // Provide a Lambda function that will transform records before delivery, with custom
 // buffering and retry configuration
 import * as cdk from '@aws-cdk/core';
 import * as lambda from '@aws-cdk/aws-lambda';
 import * as path from 'path';
-const lambdaFunction = new lambda.Function(stack, 'Processor', {
+const lambdaFunction = new lambda.Function(this, 'Processor', {
   runtime: lambda.Runtime.NODEJS_12_X,
   handler: 'index.handler',
   code: lambda.Code.fromAsset(path.join(__dirname, 'process-records')),
 });
@@ -474,7 +490,7 @@ const s3Destination = new firehosedestinations.S3Destination({
   }],
   bucket: bucket,
 });
-new firehose.DeliveryStream(stack, 'Delivery Stream', {
+new DeliveryStream(this, 'Delivery Stream', {
   destination: s3Destination,
 });
 ```
@@ -512,13 +528,11 @@ const myGlueTable = new glue.Table(this, 'MyGlueTable', {
   tableName: 'myGlueTable',
 });

-
 new DeliveryStream(this, 'Delivery Stream', {
   destination: new destinations.S3({
     dataFormatConversionConfiguration: {
       schema: myGlueTable,
       inputFormat: destinations.InputFormat.OPENX_JSON,
-      // Might be able to nix this property and infer it from the glue table
       outputFormat: destinations.OutputFormat.PARQUET,
     },
   }),
@@ -528,6 +542,84 @@ new DeliveryStream(this, 'Delivery Stream', {

 See: [Converting Input Record Format](https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html)
 in the *Firehose Developer Guide*.

+## Permission Grants
+
+IAM roles, users or groups which need to be able to work with delivery streams should be
+granted IAM permissions.
+
+Any object that implements the `IGrantable` interface (has an associated principal) can be
+granted permissions to a delivery stream by calling:
+
+- `grantRead(principal)` - grants the principal read access
+- `grantWrite(principal)` - grants the principal write permissions to the delivery stream
+- `grantReadWrite(principal)` - grants principal read and write permissions
+
+Grant `read` access to a delivery stream by calling the `grantRead()` method.
+ +```ts fixture=with-delivery-stream +import * as iam from '@aws-cdk/aws-iam'; +const lambdaRole = new iam.Role(this, 'Role', { + assumedBy: new iam.ServicePrincipal('lambda.amazonaws.com'), + description: 'Example role...', +} + +// give the role permissions to modify the delivery stream +deliveryStream.grantWrite(lambdaRole); +``` + +The following write permissions are provided to a service principal by the `grantWrite()` method: + +- `firehose:DeleteDeliveryStream` +- `firehose:PutRecord` +- `firehose:PutRecordBatch` +- `firehose:StartDeliveryStreamEncryption` +- `firehose:StopDeliveryStreamEncryption` +- `firehose:UpdateDestination` + +#### Custom Permissions + +You can add any set of permissions to a delivery stream by calling the `grant()` method. + +```ts fixture=with-delivery-stream +import * as iam from '@aws-cdk/aws-iam'; +const user = new iam.User(this, 'User'); + +// give user permissions to update destination +deliveryStream.grant(user, 'firehose:UpdateDestination'); +``` + +--- + +# Amazon Kinesis Firehose Destinations Library + +This library provides constructs for adding destinations to a Kinesis Firehose delivery stream. +Destinations can be added by specifying the `destination` prop when creating a delivery stream. + +See Kinesis Firehose module README for usage examples. + +If further customization is required, use `HttpDestination` from this package or implement +`firehose.IDestination`. + --- ## FAQ @@ -736,7 +828,7 @@ No. properties that are common to multiple destinations are specified in the destination instead of on the delivery stream, since this is how the API organizes it. Because the delivery stream only allows a single destination, - modelling these common properties on the delivery stream itself would reduce + modeling these common properties on the delivery stream itself would reduce the amount of configuration each destination implementation would need to manage. Practically, this would look like moving every property in `DestinationProps` into `DeliveryStreamProps`, as well as exposing hooks in From dfd12adfe236829254816a122ea78877334f20ec Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Fri, 25 Jun 2021 18:03:50 -0500 Subject: [PATCH 07/34] nicer language for service API --- text/0340-firehose-l2.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 94a3eabf9..d09909d8f 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -670,8 +670,9 @@ functional application with no manual effort. ### Why should we _not_ do this? -The Firehose API seems somewhat immature and implementing an L2 on top of the -current L1 may be setting us up for changes in the future. See: “alternative +We are not confident that the service API is fully set in stone and implementing an L2 on +top of the current L1 may be setting us up for changes in the future. We are reaching out +to the service team to get their input and plans for the service. See: “alternative solutions”, below, for concrete details. 
It’s a large effort (3 devs * 1 week) to invest in a module when we have other From be0510144e8c43c35b5795caeba58abc96c9ecd2 Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Fri, 25 Jun 2021 18:05:00 -0500 Subject: [PATCH 08/34] linting --- text/0340-firehose-l2.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index d09909d8f..4f91d21ca 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -554,6 +554,8 @@ granted permissions to a delivery stream by calling: - `grantWrite(principal)` - grants the principal write permissions to a Stream - `grantReadWrite(principal)` - grants principal read and write permissions +### Read Permissions + Grant `read` access to a delivery stream by calling the `grantRead()` method. ```ts fixture=with-delivery-stream @@ -572,7 +574,7 @@ The following read permissions are provided to a service principal by the `grant - `firehose:ListDeliveryStreams` - `firehose:ListTagsForDeliveryStream` -#### Write Permissions +### Write Permissions Grant `write` permissions to a delivery stream is provided by calling the `grantWrite()` method. @@ -596,7 +598,7 @@ The following write permissions are provided to a service principal by the `gran - `firehose:StopDeliveryStreamEncryption` - `firehose:UpdateDestination` -#### Custom Permissions +### Custom Permissions You can add any set of permissions to a delivery stream by calling the `grant()` method. From 8ece11c7c9fde85178f3d54b87b600c018da8f7f Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Fri, 25 Jun 2021 18:14:40 -0500 Subject: [PATCH 09/34] separate data and control plane grants --- text/0340-firehose-l2.md | 48 +++++++++++++++++++++++++++++----------- 1 file changed, 35 insertions(+), 13 deletions(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 4f91d21ca..6f7f1b0e3 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -550,13 +550,14 @@ granted IAM permissions. Any object that implements the `IGrantable` interface (has an associated principal) can be granted permissions to a delivery stream by calling: -- `grantRead(principal)` - grants the principal read access -- `grantWrite(principal)` - grants the principal write permissions to a Stream -- `grantReadWrite(principal)` - grants principal read and write permissions +- `grantRead(principal)` - grants the principal read access to the control plane +- `grantWrite(principal)` - grants the principal write access to the control plane +- `grantWriteData(principal)` - grants the principal write access to the data plane +- `grantFullAccess(principal)` - grants principal full access to the delivery stream -### Read Permissions +### Control Plane Read Permissions -Grant `read` access to a delivery stream by calling the `grantRead()` method. +Grant `read` access to the control plane of a delivery stream by calling the `grantRead()` method. 
```ts fixture=with-delivery-stream import * as iam from '@aws-cdk/aws-iam'; @@ -564,7 +565,7 @@ const lambdaRole = new iam.Role(this, 'Role', { assumedBy: new iam.ServicePrincipal('lambda.amazonaws.com'), } -// give the role permissions to read information about the delivery stream +// Give the role permissions to read information about the delivery stream deliveryStream.grantRead(lambdaRole); ``` @@ -574,30 +575,46 @@ The following read permissions are provided to a service principal by the `grant - `firehose:ListDeliveryStreams` - `firehose:ListTagsForDeliveryStream` -### Write Permissions +### Control Plane Write Permissions -Grant `write` permissions to a delivery stream is provided by calling the `grantWrite()` method. +Grant `write` access to the control plane of a delivery stream by calling the `grantWrite()` method. ```ts fixture=with-delivery-stream import * as iam from '@aws-cdk/aws-iam'; const lambdaRole = new iam.Role(this, 'Role', { assumedBy: new iam.ServicePrincipal('lambda.amazonaws.com'), - description: 'Example role...', } -// give the role permissions to modify the delivery stream +// Give the role permissions to modify the delivery stream deliveryStream.grantWrite(lambdaRole); ``` The following write permissions are provided to a service principal by the `grantWrite()` method: - `firehose:DeleteDeliveryStream` -- `firehose:PutRecord` -- `firehose:PutRecordBatch` - `firehose:StartDeliveryStreamEncryption` - `firehose:StopDeliveryStreamEncryption` - `firehose:UpdateDestination` +### Data Plane Write Permissions + +Grant `write` access to the data plane of a delivery stream by calling the `grantWriteData()` method. + +```ts fixture=with-delivery-stream +import * as iam from '@aws-cdk/aws-iam'; +const lambdaRole = new iam.Role(this, 'Role', { + assumedBy: new iam.ServicePrincipal('lambda.amazonaws.com'), +} + +// Give the role permissions to write data to the delivery stream +deliveryStream.grantWriteData(lambdaRole); +``` + +The following write permissions are provided to a service principal by the `grantWriteData()` method: + +- `firehose:PutRecord` +- `firehose:PutRecordBatch` + ### Custom Permissions You can add any set of permissions to a delivery stream by calling the `grant()` method. @@ -700,8 +717,13 @@ we have fairly robust prototypes already implemented. 
readonly deliveryStreamArn: string; readonly deliveryStreamName: string; grant(grantee: iam.IGrantable, ...actions: string[]): iam.Grant; - // Grant identity permission to write data to the stream (PutRecord/Batch) + // Grant permission to describe the stream + grantRead(grantee: iam.IGrantable): iam.Grant; + // Grant permission to modify the stream grantWrite(grantee: iam.IGrantable): iam.Grant; + // Grant permission to write data to the stream + grantWriteData(grantee: iam.IGrantable): iam.Grant; + grantFullAccess(grantee: iam.IGrantable): iam.Grant; metric(metricName: string, props?: cloudwatch.MetricOptions): cloudwatch.Metric; // Some canned metrics as well like `metricBackupToS3DataFreshness` } From a2f2b5714c2229578892e4a5aa441d5943b970c4 Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Fri, 25 Jun 2021 18:35:03 -0500 Subject: [PATCH 10/34] replace Firehose with Kinesis Data Firehose almost everywhere --- text/0340-firehose-l2.md | 281 ++++++++++++++++++++------------------- 1 file changed, 141 insertions(+), 140 deletions(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 6f7f1b0e3..09864a184 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -3,7 +3,7 @@ rfc pr: [#342](https://github.com/aws/aws-cdk-rfcs/pull/342) tracking issue: [#340](https://github.com/aws/aws-cdk-rfcs/issues/340) --- -# Kinesis Data Firehose Delivery Stream L2 +# Amazon Kinesis Data Firehose Delivery Stream L2 The `aws-kinesisfirehose` construct library allows you to create Amazon Kinesis Data Firehose delivery streams and destinations with just a few lines of @@ -49,23 +49,23 @@ new DeliveryStream(this, 'Delivery Stream', { The above example creates the following resources: - An S3 bucket -- A Kinesis Data Firehose Delivery Stream with Direct PUT as the source and CloudWatch +- A Kinesis Data Firehose delivery stream with Direct PUT as the source and CloudWatch error logging turned on. -- An IAM role which gives Kinesis Data Firehose permission to write to your S3 bucket. +- An IAM role which gives the delivery stream permission to write to your S3 bucket. ## Sources -Firehose supports two main methods of sourcing input data: Kinesis Data Streams and via a -"direct put". +There are two main methods of sourcing input data: Kinesis Data Streams and via a "direct +put". See: [Sending Data to a Delivery Stream](https://docs.aws.amazon.com/firehose/latest/dev/basic-write.html) -in the *Firehose Developer Guide*. +in the *Kinesis Data Firehose Developer Guide*. ### Kinesis Data Stream -Firehose can read directly from a Kinesis data stream as a consumer of the data stream. -Configure this behaviour by providing a data stream in the `sourceStream` property when -constructing a delivery stream: +A delivery stream can read directly from a Kinesis data stream as a consumer of the data +stream. Configure this behaviour by providing a data stream in the `sourceStream` +property when constructing a delivery stream: ```ts fixture=with-destination import * as kinesis from '@aws-cdk/aws-kinesis'; @@ -79,22 +79,22 @@ new DeliveryStream(this, 'Delivery Stream', { ### Direct Put -If a source data stream is not provided, then Firehose assumes data will be provided via -"direct put", ie., by using a `PutRecord` or `PutRecordBatch` API call. There are a number -of ways of doing so, such as: +If a source data stream is not provided, then data must be provided via "direct put", ie., +by using a `PutRecord` or `PutRecordBatch` API call. 
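For a sense of what a direct put looks like from application code, the following is a rough
sketch that uses the AWS SDK for JavaScript (v2) directly rather than CDK constructs; the
delivery stream name and the record payload are placeholders:

```ts
// Sketch only: plain AWS SDK (v2) calls, not CDK constructs.
import * as AWS from 'aws-sdk';

const firehoseClient = new AWS.Firehose();

async function putSampleRecord(): Promise<void> {
  // 'my-delivery-stream' is a placeholder for the name of your deployed delivery stream.
  await firehoseClient.putRecord({
    DeliveryStreamName: 'my-delivery-stream',
    Record: { Data: JSON.stringify({ ticker: 'AMZN', price: 100 }) + '\n' },
  }).promise();
}
```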
There are a number of ways of doing +so, such as: - Kinesis Agent: a standalone Java application that monitors and delivers files while - handling file rotation, checkpointing, and retries. See: [Writing to Firehose Using Kinesis Agent](https://docs.aws.amazon.com/firehose/latest/dev/writing-with-agents.html) - in the *Firehose Developer Guide*. -- Firehose SDK: a general purpose solution that allows you to deliver data to Firehose - from anywhere using Java, .NET, Node.js, Python, or Ruby. See: [Writing to Firehose Using the AWS SDK](https://docs.aws.amazon.com/firehose/latest/dev/writing-with-sdk.html) - in the *Firehose Developer Guide*. + handling file rotation, checkpointing, and retries. See: [Writing to Kinesis Data Firehose Using Kinesis Agent](https://docs.aws.amazon.com/firehose/latest/dev/writing-with-agents.html) + in the *Kinesis Data Firehose Developer Guide*. +- AWS SDK: a general purpose solution that allows you to deliver data to a delivery stream + from anywhere using Java, .NET, Node.js, Python, or Ruby. See: [Writing to Kinesis Data Firehose Using the AWS SDK](https://docs.aws.amazon.com/firehose/latest/dev/writing-with-sdk.html) + in the *Kinesis Data Firehose Developer Guide*. - CloudWatch Logs: subscribe to a log group and receive filtered log events directly into - Firehose. See: [logs-destinations](../aws-logs-destinations). -- Eventbridge: add Firehose as an event rule target to send events to a delivery stream - based on the rule filtering. See: [events-targets](../aws-events-targets). -- SNS: add Firehose as a subscription to send all notifications from the topic to a - delivery stream. See: [sns-subscriptions](../aws-sns-subscriptions). + a delivery stream. See: [logs-destinations](../aws-logs-destinations). +- Eventbridge: add an event rule target to send events to a delivery stream based on the + rule filtering. See: [events-targets](../aws-events-targets). +- SNS: add a subscription to send all notifications from the topic to a delivery + stream. See: [sns-subscriptions](../aws-sns-subscriptions). - IoT: add an action to an IoT rule to send various IoT information to a delivery stream ## Destinations @@ -121,9 +121,9 @@ new DeliveryStream(this, 'Delivery Stream', { }); ``` -The S3 destination also supports custom dynamic prefixes. `prefix` will be used for -files successfully delivered to Amazon S3. `errorOutputPrefix` will be added to -failed records before writing them to S3. +The S3 destination also supports custom dynamic prefixes. `prefix` will be used for files +successfully delivered to S3. `errorOutputPrefix` will be added to failed records before +writing them to S3. ```ts fixture=with-bucket const s3Destination = new destinations.S3({ @@ -133,7 +133,7 @@ const s3Destination = new destinations.S3({ }); ``` -See: [Custom S3 Prefixes](https://docs.aws.amazon.com/firehose/latest/dev/s3-prefixes.html) in the *Firehose Developer Guide*. +See: [Custom S3 Prefixes](https://docs.aws.amazon.com/firehose/latest/dev/s3-prefixes.html) in the *Kinesis Data Firehose Developer Guide*. ### Elasticsearch @@ -155,12 +155,12 @@ const deliveryStream = new DeliveryStream(this, 'Delivery Stream', { ### Redshift -Firehose can deliver data to a table within a Redshift cluster, using an intermediate S3 -bucket and executing a Redshift `COPY` command. Redshift clusters must be placed in public -subnets within an VPC, must be marked as publicly accessible, and cannot provide a master -user password (it must be generated by the CDK). 
A Redshift user will be created within -the cluster for the exclusive use of Firehose, and a Redshift table with the provided -schema will be created within the provided database. +A delivery stream can deliver data to a table within a Redshift cluster, using an +intermediate S3 bucket and executing a Redshift `COPY` command. Redshift clusters must be +placed in public subnets within an VPC, must be marked as publicly accessible, and cannot +provide a master user password (it must be generated by the CDK). A Redshift user will be +created within the cluster for the exclusive use of the delivery stream, and a Redshift +table with the provided schema will be created within the provided database. ```ts import * as ec2 from '@aws-cdk/aws-ec2'; @@ -206,16 +206,16 @@ new DeliveryStream(this, 'Delivery Stream', { Third-party service providers such as Splunk, Datadog, Dynatrace, LogicMonitor, MongoDB, New Relic, and Sumo Logic have integrated with AWS to allow users to configure their -service as a Firehose destination out of the box. +service as a delivery stream destination out of the box. These integrations have not been completed (see #1234), please use [custom HTTP endpoints](#custom-http-endpoint) to integrate with 3rd party services. ### Custom HTTP Endpoint -Firehose supports writing data to any custom HTTP endpoint that conforms to the -[HTTP request/response schema]. Use the `HttpDestination` class to specify how Firehose -can reach your custom endpoint and any configuration that may be required. +A delivery stream can deliver data to any custom HTTP endpoint that conforms to the +[HTTP request/response schema]. Use the `HttpDestination` class to specify how Kinesis +Data Firehose can reach your custom endpoint and any configuration that may be required. This integration has not been completed (see #1234). @@ -223,21 +223,21 @@ This integration has not been completed (see #1234). ## Server-side Encryption -Enabling server-side encryption (SSE) requires Firehose to encrypt all data sent to -delivery stream when it is stored at rest. This means that data is encrypted before being -written to the storage layer and decrypted after it is received from the storage -layer. Firehose manages keys and cryptographic operations so that sources and destinations -do not need to, as the data is encrypted and decrypted at the boundaries of the Firehose -service. By default, delivery streams do not have SSE enabled. +Enabling server-side encryption (SSE) requires Kinesis Data Firehose to encrypt all data +sent to delivery stream when it is stored at rest. This means that data is encrypted +before being written to the storage layer and decrypted after it is received from the +storage layer. The service manages keys and cryptographic operations so that sources and +destinations do not need to, as the data is encrypted and decrypted at the boundaries of +the service. By default, delivery streams do not have SSE enabled. The Key Management Service (KMS) Customer Managed Key (CMK) used for SSE can either be AWS-owned or customer-managed. AWS-owned CMKs are keys that an AWS service (in this case -Firehose) owns and manages for use in multiple AWS accounts. As a customer, you cannot -view, use, track, or manage these keys, and you are not charged for their use. On the -other hand, customer-managed CMKs are keys that are created and owned within your account -and managed entirely by you. 
As a customer, you are responsible for managing access, -rotation, aliases, and deletion for these keys, and you are changed for their use. See: -[Customer master keys](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#master_keys) +Kinesis Data Firehose) owns and manages for use in multiple AWS accounts. As a customer, +you cannot view, use, track, or manage these keys, and you are not charged for their +use. On the other hand, customer-managed CMKs are keys that are created and owned within +your account and managed entirely by you. As a customer, you are responsible for managing +access, rotation, aliases, and deletion for these keys, and you are changed for their +use. See: [Customer master keys](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#master_keys) in the *KMS Developer Guide*. ```ts fixture=with-destination @@ -263,30 +263,30 @@ new DeliveryStream(this, 'Delivery Stream Explicit Customer Managed'', { }); ``` -If a Kinesis data stream is configured as the source of a Firehose delivery stream, +If a Kinesis data stream is configured as the source of a delivery stream, Kinesis Data Firehose no longer stores data at rest and all encryption is handled by Kinesis Data -Streams. Firehose receives unencrypted data from Kinesis Data Streams, buffers the data in -memory, and sends the data to destinations without ever writing the unencrypted data at -rest. Practically, this means that SSE should be specified on the Kinesis data stream when -it is used as the source of a Firehose delivery stream (and specifying SSE on the delivery -stream will cause an error). +Streams. Kinesis Data Firehose receives unencrypted data from Kinesis Data Streams, +buffers the data in memory, and sends the data to destinations without ever writing the +unencrypted data at rest. Practically, this means that SSE should be specified on the +Kinesis data stream when it is used as the source of a delivery stream (and specifying SSE +on the delivery stream will cause an error). See: [Data Protection](https://docs.aws.amazon.com/firehose/latest/dev/encryption.html) in -the *Firehose Developer Guide*. +the *Kinesis Data Firehose Developer Guide*. ## Monitoring -Firehose is integrated with CloudWatch, so you can monitor the performance of your -delivery streams via logs and metrics. +Kinesis Data Firehose is integrated with CloudWatch, so you can monitor the performance of +your delivery streams via logs and metrics. ### Logs -Firehose will send logs to CloudWatch when data transformation or data delivery fails. -The CDK will enable logging by default and create a CloudWatch LogGroup and LogStream for -your Delivery Stream. +Kinesis Data Firehose will send logs to CloudWatch when data transformation or data +delivery fails. The CDK will enable logging by default and create a CloudWatch LogGroup +and LogStream for your Delivery Stream. You can provide a specific log group to specify where the CDK will create the log streams -where log events from Firehose will be sent: +where log events will be sent: ```ts fixture=with-destination import * as logs from '@aws-cdk/aws-logs'; @@ -308,24 +308,24 @@ new DeliveryStream(this, 'Delivery Stream', { ``` See: [Monitoring using CloudWatch Logs](https://docs.aws.amazon.com/firehose/latest/dev/monitoring-with-cloudwatch-logs.html) -in the *Firehose Developer Guide*. +in the *Kinesis Data Firehose Developer Guide*. 
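If a suitable log group already exists (for example, one created outside of this
application), importing it and passing it in should also work; this is a sketch, assuming
the `logGroup` property accepts any `ILogGroup` and that the log group name shown here is a
placeholder for one that already exists:

```ts fixture=with-destination
import * as logs from '@aws-cdk/aws-logs';

// 'firehose-error-logs' is a hypothetical, pre-existing log group name.
const importedLogGroup = logs.LogGroup.fromLogGroupName(this, 'Imported Log Group', 'firehose-error-logs');
new DeliveryStream(this, 'Delivery Stream Imported Log Group', {
  logGroup: importedLogGroup,
  destination: destination,
});
```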
### Metrics -Firehose sends metrics to CloudWatch so that you can collect and analyze the performance -of the delivery stream, including data delivery, data ingestion, data transformation, -format conversion, API usage, encryption, and resource usage. You can then use CloudWatch -alarms to alert you, for example, when data freshness (the age of the oldest record in -Firehose) exceeds the buffering limit (indicating that data is not being delivered to your -destination), or when the rate of incoming records exceeds the limit of records per second -(indicating data is flowing into your delivery stream faster than Firehose is configured -to process it). - -CDK provides methods for accessing Firehose metrics with default configuration, such as -`metricIncomingBytes`, and `metricIncomingRecords` (see [`IDeliveryStream`](../lib/delivery-stream.ts) +Kinesis Data Firehose sends metrics to CloudWatch so that you can collect and analyze the +performance of the delivery stream, including data delivery, data ingestion, data +transformation, format conversion, API usage, encryption, and resource usage. You can then +use CloudWatch alarms to alert you, for example, when data freshness (the age of the +oldest record in the delivery stream) exceeds the buffering limit (indicating that data is +not being delivered to your destination), or when the rate of incoming records exceeds the +limit of records per second (indicating data is flowing into your delivery stream faster +than it is configured to process). + +CDK provides methods for accessing delivery stream metrics with default configuration, +such as `metricIncomingBytes`, and `metricIncomingRecords` (see [`IDeliveryStream`](../lib/delivery-stream.ts) for a full list). CDK also provides a generic `metric` method that can be used to produce -metric configurations for any metric provided by Firehose; the configurations are -pre-populated with the correct dimensions for the delivery stream. +metric configurations for any metric provided by Kinesis Data Firehose; the configurations +are pre-populated with the correct dimensions for the delivery stream. ```ts fixture=with-delivery-stream // TODO: confirm this is a valid alarm @@ -346,12 +346,12 @@ new Alarm(this, 'Alarm', { ``` See: [Monitoring Using CloudWatch Metrics](https://docs.aws.amazon.com/firehose/latest/dev/monitoring-with-cloudwatch-metrics.html) -in the *Firehose Developer Guide*. +in the *Kinesis Data Firehose Developer Guide*. ## Compression -Firehose can automatically compress your data when it is delivered to S3 as either a final -or an intermediary/backup destination. Supported compression formats are: gzip, Snappy, +Your data can automatically be compressed when it is delivered to S3 as either a final or +an intermediary/backup destination. Supported compression formats are: gzip, Snappy, Hadoop-compatible Snappy, and ZIP, except for Redshift destinations, where Snappy (regardless of Hadoop-compatibility) and ZIP are not supported. By default, data is delivered to S3 without compression. @@ -369,12 +369,12 @@ new DeliveryStream(this, 'Delivery Stream', { ## Buffering -Firehose buffers incoming data before delivering to the specified destination. Firehose -will wait until the amount of incoming data has exceeded some threshold (the "buffer -size") or until the time since the last data delivery occurred exceeds some threshold (the -"buffer interval"), whichever happens first. You can configure these thresholds based on -the capabilities of the destination and your use-case. 
By default, the buffer size is 3 -MiB and the buffer interval is 1 minute. +Incoming data is buffered before it is delivered to the specified destination. The +delivery stream will wait until the amount of incoming data has exceeded some threshold +(the "buffer size") or until the time since the last data delivery occurred exceeds some +threshold (the "buffer interval"), whichever happens first. You can configure these +thresholds based on the capabilities of the destination and your use-case. By default, the +buffer size is 3 MiB and the buffer interval is 1 minute. ```ts fixture=with-bucket // Increase the buffer interval and size to 5 minutes and 3 MiB, respectively @@ -390,14 +390,17 @@ new DeliveryStream(this, 'Delivery Stream', { }); ``` +See: [Data Delivery Frequency](https://docs.aws.amazon.com/firehose/latest/dev/basic-deliver.html#frequency) +in the *Kinesis Data Firehose Developer Guide*. + ## Backup -Firehose can be configured to backup data to S3 that it attempted to deliver to the -configured destination. Backed up data can be all the data that Firehose attempted to -deliver or just data that Firehose failed to deliver (Redshift and S3 destinations can -only backup all data). CDK can create a new S3 bucket where it will back up data or you -can provide a bucket where data will be backed up. You can also provide prefix under which -Firehose will place backed-up data within the bucket. By default, source data is not +A delivery stream can be configured to backup data to S3 that it attempted to deliver to +the configured destination. Backed up data can be all the data that the delivery stream +attempted to deliver or just data that it failed to deliver (Redshift and S3 destinations +can only backup all data). CDK can create a new S3 bucket where it will back up data or +you can provide a bucket where data will be backed up. You can also provide prefix under +which your backed-up data will placed within the bucket. By default, source data is not backed up to S3. ```ts fixture=with-domain @@ -413,7 +416,7 @@ const deliveryStream = new DeliveryStream(this, 'Delivery Stream Backup All', { }), }); -// Enable backup of only the source records that Firehose failed to deliver (to an S3 bucket created by CDK) +// Enable backup of only the source records that failed to deliver (to an S3 bucket created by CDK) const deliveryStream = new DeliveryStream(this, 'Delivery Stream Backup Failed', { destination: new destinations.Elasticsearch({ domain: domain, @@ -448,17 +451,17 @@ records will be backed up in their original format. ## Data Processing/Transformation -Firehose supports transforming data before delivering it to destinations. There are two -types of data processing for Delivery Streams: record transformation with AWS Lambda, and -record format conversion using a schema stored in an AWS Glue table. If both types of data +Data can be transformed before being delivered to destinations. There are two types of +data processing for delivery streams: record transformation with AWS Lambda, and record +format conversion using a schema stored in an AWS Glue table. If both types of data processing are configured, then the Lambda transofmration is perfromed first. By default, no data processing occurs. ### Data transformation with AWS Lambda -To transform the data, Firehose will call a Lambda function that you provide and deliver -the data returned in lieu of the source record. 
The function must return a result that -contains records in a specific format, including the following fields: +To transform the data, Kinesis Data Firehose will call a Lambda function that you provide +and deliver the data returned in lieu of the source record. The function must return a +result that contains records in a specific format, including the following fields: - `recordId` -- the ID of the input record that corresponds the results. - `result` -- the status of the transformation of the record: "Ok" (success), "Dropped" @@ -496,13 +499,13 @@ new DeliveryStream(this, 'Delivery Stream', { ``` See: [Data Transformation](https://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html) -in the *Firehose Developer Guide*. +in the *Kinesis Data Firehose Developer Guide*. ### Record format conversion using AWS Glue -Amazon Kinesis Data Firehose can convert the format of your input data from JSON to +Kinesis Data Firehose can convert the format of your input data from JSON to [Apache Parquet](https://parquet.apache.org/) or [Apache ORC](https://orc.apache.org/) -before storing the data in Amazon S3. This allows you to change the format of your data +before storing the data in S3. This allows you to change the format of your data records without writing any Lambda code, but you must use S3 as your destination. ```ts @@ -540,7 +543,7 @@ new DeliveryStream(this, 'Delivery Stream', { ``` See: [Converting Input Record Format](https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html) -in the *Firehose Developer Guide*. +in the *Kinesis Data Firehose Developer Guide*. ## Permission Grants @@ -629,12 +632,13 @@ deliveryStream.grant(user, 'firehose:UpdateDestination'); --- -# Amazon Kinesis Firehose Destinations Library +# Amazon Kinesis Data Firehose Destinations Library -This library provides constructs for adding destinations to a Kinesis Firehose delivery stream. -Destinations can be added by specifying the `destination` prop when creating a delivery stream. +This library provides constructs for adding destinations to a Amazon Kinesis Data Firehose +delivery stream. Destinations can be added by specifying the `destination` prop when +creating a delivery stream. -See Kinesis Firehose module README for usage examples. +See Amazon Kinesis Data Firehose module README for usage examples. If further customization is required, use `HttpDestination` from this package or implement `firehose.IDestination`. @@ -645,47 +649,44 @@ If further customization is required, use `HttpDestination` from this package or ### What are we launching today? -We are launching a new module (`@aws-cdk/aws-kinesisfirehose`) that contains a -single L2 construct (`DeliveryStream`). This launch fully and fluently supports -Kinesis Data Firehose (a fully-managed service for delivering real-time -streaming data to storage locations) within the CDK. Out of the box, we are -launching with 3 AWS service destinations (S3, Elasticsearch, and Redshift), as -well as a generic HTTP destination so that customers can connect to the suported -3rd-party cloud service providers or any custom HTTP endpoint that they -develop. These destinations are located in a secondary module +We are launching a new module (`@aws-cdk/aws-kinesisfirehose`) that contains a single L2 +construct (`DeliveryStream`). This launch fully and fluently supports Kinesis Data +Firehose (a fully-managed service for delivering real-time streaming data to storage +locations) within the CDK. 
Out of the box, we are launching with 3 AWS service +destinations (S3, Elasticsearch, and Redshift), as well as a generic HTTP destination so +that customers can connect to the suported 3rd-party cloud service providers or any custom +HTTP endpoint that they develop. These destinations are located in a secondary module (`@aws-cdk/aws-kinesisfirehose-destinations`). ### Why should I use this feature? -Specify and spin up a delivery stream that streams high amounts of data straight -to your storage service. Possible use-cases include automated CloudWatch log -delivery to S3 for analysis in S3; streaming analytic data to Redshift for -analysis in Quicksight. Using Firehose with CDK smooths many configuration edges -and provides seamless integrations with your existing infrastructure as code. +Specify and spin up a delivery stream that streams high amounts of data straight to your +storage service. Possible use-cases include automated CloudWatch log delivery to S3 for +analysis in S3; streaming analytic data to Redshift for analysis in Quicksight. Using +Kinesis Data Firehose with CDK smooths many configuration edges and provides seamless +integrations with your existing infrastructure as code. ## Internal FAQ ### Why are we doing this? -The [tracking Github issue for the -module](https://github.com/aws/aws-cdk/issues/7536) has the most +1s for a new -module (43) besides Elasticache (68) so we have a clear signal that customers -want CDK support for this service. - -Firehose requires a fairly verbose configuration to set up depending on the -destination desired. For example, the Redshift destination synthesizes to about -900 lines of JSON from about 20 lines of Typescript code. The destination -requires only 5 variables to be configured in order to create a resource with -20+ nested properties and 10+ associated/generated resources. While we retain -flexibility, we often replace several CFN properties with a single boolean -switch that creates and connects the required resources. - -Using Firehose without the CDK requires network configuration, complex -permission statements, and manual intervention. We have added 10+ compile-time -validations and auto-generated permissions to ensure destinations are correctly -integrated, avoiding days of debugging errors. We have leveraged custom -resources in order to perform a one-click deployment that creates an immediately -functional application with no manual effort. +The [tracking Github issue for the module](https://github.com/aws/aws-cdk/issues/7536) has +the most +1s for a new module (43) besides Elasticache (68) so we have a clear signal that +customers want CDK support for this service. + +A delivery stream requires a fairly verbose configuration to set up depending on the +destination desired. For example, the Redshift destination synthesizes to about 900 lines +of JSON from about 20 lines of Typescript code. The destination requires only 5 variables +to be configured in order to create a resource with 20+ nested properties and 10+ +associated/generated resources. While we retain flexibility, we often replace several CFN +properties with a single boolean switch that creates and connects the required resources. + +Using Kinesis Data Firehose without the CDK requires network configuration, complex +permission statements, and manual intervention. We have added 10+ compile-time validations +and auto-generated permissions to ensure destinations are correctly integrated, avoiding +days of debugging errors. 
We have leveraged custom resources in order to perform a +one-click deployment that creates an immediately functional application with no manual +effort. ### Why should we _not_ do this? @@ -694,9 +695,9 @@ top of the current L1 may be setting us up for changes in the future. We are rea to the service team to get their input and plans for the service. See: “alternative solutions”, below, for concrete details. -It’s a large effort (3 devs * 1 week) to invest in a module when we have other -pressing projects. However, the bulk of the effort has been spent already since -we have fairly robust prototypes already implemented. +It’s a large effort (3 devs * 1 week) to invest in a module when we have other pressing +projects. However, the bulk of the effort has been spent already since we have fairly +robust prototypes already implemented. ### What changes are required to enable this change? From 3d423249d714eadb08d2912762bfdd2944b16b40 Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Fri, 25 Jun 2021 18:35:47 -0500 Subject: [PATCH 11/34] typo --- text/0340-firehose-l2.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 09864a184..892aabdc1 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -454,7 +454,7 @@ records will be backed up in their original format. Data can be transformed before being delivered to destinations. There are two types of data processing for delivery streams: record transformation with AWS Lambda, and record format conversion using a schema stored in an AWS Glue table. If both types of data -processing are configured, then the Lambda transofmration is perfromed first. By default, +processing are configured, then the Lambda transformation is performed first. By default, no data processing occurs. ### Data transformation with AWS Lambda From 81a33ce7ac3bbc138d8f5de7b44002ad6e021ef0 Mon Sep 17 00:00:00 2001 From: Madeline Kusters Date: Fri, 25 Jun 2021 17:18:00 -0700 Subject: [PATCH 12/34] Add IAM role section --- text/0340-firehose-l2.md | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 892aabdc1..eb221dafa 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -545,6 +545,27 @@ new DeliveryStream(this, 'Delivery Stream', { See: [Converting Input Record Format](https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html) in the *Kinesis Data Firehose Developer Guide*. +### Specifying an IAM role + +The DeliveryStream class automatically creates an IAM role with all the minimum necessary permissions for Kinesis Data Firehose to access the resources referenced by your delivery stream. For example: an Elasticsearch domain, a Redshift cluster, a backup or destination S3 bucket, a Lambda data transformer, an AWS Glue table schema, etc. If you wish, you may specify your own IAM role. It must have the correct permissions, or delivery stream creation or data delivery may fail. + +```ts +import * as iam from '@aws-cdk/aws-iam'; +import * as s3 from '@aws-cdk/aws-iam'; + +const role = new iam.Role(stack, 'MyRole'); +const bucket = new s3.bucket(stack, 'MyBucket'); +bucket.grantWrite(role); +new DeliveryStream(stack, 'MyDeliveryStream', { + destination: new destinations.S3({ + bucket: bucket, + }), + role: role, +}); +``` + +See [Controlling Access](https://docs.aws.amazon.com/firehose/latest/dev/controlling-access.html) in the *Kinesis Data Firehose Developer Guide*. 
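A pre-existing role can be imported and supplied in the same way; this is a sketch,
assuming the `role` property accepts any `IRole`, that the ARN below is a placeholder, and
that the role already carries the permissions described above:

```ts fixture=with-destination
import * as iam from '@aws-cdk/aws-iam';

// Placeholder ARN for a role that already has the necessary permissions.
const importedRole = iam.Role.fromRoleArn(this, 'Imported Role', 'arn:aws:iam::111122223333:role/firehose-delivery-role');
new DeliveryStream(this, 'Delivery Stream Imported Role', {
  destination: destination,
  role: importedRole,
});
```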
+ ## Permission Grants IAM roles, users or groups which need to be able to work with delivery streams should be From 03b9ba3b16c2c81663413d88bab7223c4e77a30c Mon Sep 17 00:00:00 2001 From: Madeline Kusters Date: Fri, 25 Jun 2021 17:21:00 -0700 Subject: [PATCH 13/34] add alternative solution about multiple IAM roles --- text/0340-firehose-l2.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index eb221dafa..dccf26d49 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -906,6 +906,14 @@ No. team launches that feature. However, this would be significantly future-proofing the API at the expense of confusing users that would reasonably assume that multiple destinations are currently supported. +- Allowing the user to create or use separate IAM roles for each aspect of the + delivery stream. This would mean that in a complex delivery stream using AWS + Lambda transformation, AWS Glue record conversion, S3 Backup, and a destination, + that the user could specify a separate IAM role for Firehose to access each of + those resources. We chose to only use one role for this design, because it will + be simpler for the user. While the API is in the experimental stage, we will + solicit customer feedback to find out if customers want to have more fine grained + control over the permissions for their delivery stream. ### What is the high level implementation plan? From e8defd56c28311eaa1cc417841c1d30d3e51221c Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Fri, 25 Jun 2021 19:25:36 -0500 Subject: [PATCH 14/34] line length linting --- text/0340-firehose-l2.md | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index dccf26d49..0f806817d 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -547,7 +547,12 @@ in the *Kinesis Data Firehose Developer Guide*. ### Specifying an IAM role -The DeliveryStream class automatically creates an IAM role with all the minimum necessary permissions for Kinesis Data Firehose to access the resources referenced by your delivery stream. For example: an Elasticsearch domain, a Redshift cluster, a backup or destination S3 bucket, a Lambda data transformer, an AWS Glue table schema, etc. If you wish, you may specify your own IAM role. It must have the correct permissions, or delivery stream creation or data delivery may fail. +The DeliveryStream class automatically creates an IAM role with all the minimum necessary +permissions for Kinesis Data Firehose to access the resources referenced by your delivery +stream. For example: an Elasticsearch domain, a Redshift cluster, a backup or destination +S3 bucket, a Lambda data transformer, an AWS Glue table schema, etc. If you wish, you may +specify your own IAM role. It must have the correct permissions, or delivery stream +creation or data delivery may fail. ```ts import * as iam from '@aws-cdk/aws-iam'; @@ -564,7 +569,8 @@ new DeliveryStream(stack, 'MyDeliveryStream', { }); ``` -See [Controlling Access](https://docs.aws.amazon.com/firehose/latest/dev/controlling-access.html) in the *Kinesis Data Firehose Developer Guide*. +See [Controlling Access](https://docs.aws.amazon.com/firehose/latest/dev/controlling-access.html) +in the *Kinesis Data Firehose Developer Guide*. 
## Permission Grants From 80be0c952c45062e8bce8283e492ecd3663f4967 Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Mon, 28 Jun 2021 10:32:06 -0500 Subject: [PATCH 15/34] reformat alternative solutions to add numbers for easier references --- text/0340-firehose-l2.md | 107 +++++++++++++++++++-------------------- 1 file changed, 51 insertions(+), 56 deletions(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 0f806817d..431d2412e 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -864,62 +864,57 @@ No. ### What alternative solutions did you consider? -- Placing destinations into the core module instead of creating a separate - module, since a delivery stream can’t exist without a destination. - We followed the common pattern of placing service integrations (where one - service provides an interface that multiple other services implement) into a - separate module. In contrast to many of the other modules that follow this - pattern, a delivery stream cannot be created without some destination, as the - destination is a key element of the service. It could be argued that these - destinations should be treated as first-class and co-located with the delivery - stream itself. However, this is similar to SNS, where a topic doesn’t have - much meaning without a subscription and yet service integrations for - subscriptions are still located in a separate module. -- Hoist common configuration/resources such as logging, data transformation, and - backup to the delivery stream level - Currently, we follow the service API closely in the hierarchical sense: many - properties that are common to multiple destinations are specified in the - destination instead of on the delivery stream, since this is how the API - organizes it. Because the delivery stream only allows a single destination, - modeling these common properties on the delivery stream itself would reduce - the amount of configuration each destination implementation would need to - manage. Practically, this would look like moving every property in - `DestinationProps` into `DeliveryStreamProps`, as well as exposing hooks in - `DestinationBindOptions` to allow destinations to call configuration-creating - functions during binding. Some downsides of making this change: moving away - from the service API may confuse customers who have previously used it to - create a delivery stream; if delivery streams support multiple destinations in - the future then configuration will not be flexible per-destination. -- Provide a more generic interface for data transformers instead of requiring a - Lambda function. - The data transformation API seems to indicate future support for processors - that are not Lambda functions ([ProcessingConfiguration.Processor](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-kinesisfirehose-deliverystream-processor.html) - is quite generic, with the only processor type currently supported as - “Lambda”). However, our DataProcessor requires a lambda.IFunction, tying our - API to Lambda as the only supported processor and requiring a breaking change - to support a different processor type. We could work around this by creating a - class instead that has static methods for each possible processor type (ie., - `static fromFunction(lambdaFunction: lambda.IFunction, options: ProcessorOptions): Processor` - ). This may be too complex for a change that we are not confident will occur. -- Allow multiple destinations to be provided to the delivery stream. 
- While the console UI only allows a single destination to be configured per - delivery stream, the horizontal model of the service API and the fact that a - call to DescribeDeliveryStream returns an array of destinations seems to - indicate that the service team may support multiple destinations in the - future. To that end, we could modify `DeliveryStreamProps` to accept an array of - destinations (instead of a single destination, as is the case currently) and - simply throw an error if multiple destinations are provided until the service - team launches that feature. However, this would be significantly - future-proofing the API at the expense of confusing users that would - reasonably assume that multiple destinations are currently supported. -- Allowing the user to create or use separate IAM roles for each aspect of the - delivery stream. This would mean that in a complex delivery stream using AWS - Lambda transformation, AWS Glue record conversion, S3 Backup, and a destination, - that the user could specify a separate IAM role for Firehose to access each of - those resources. We chose to only use one role for this design, because it will - be simpler for the user. While the API is in the experimental stage, we will - solicit customer feedback to find out if customers want to have more fine grained - control over the permissions for their delivery stream. +1. Placing destinations into the core module instead of creating a separate module, since + a delivery stream can’t exist without a destination. We followed the common pattern of + placing service integrations (where one service provides an interface that multiple + other services implement) into a separate module. In contrast to many of the other + modules that follow this pattern, a delivery stream cannot be created without some + destination, as the destination is a key element of the service. It could be argued + that these destinations should be treated as first-class and co-located with the + delivery stream itself. However, this is similar to SNS, where a topic doesn’t have + much meaning without a subscription and yet service integrations for subscriptions are + still located in a separate module. +2. Hoist common configuration/resources such as logging, data transformation, and backup + to the delivery stream level Currently, we follow the service API closely in the + hierarchical sense: many properties that are common to multiple destinations are + specified in the destination instead of on the delivery stream, since this is how the + API organizes it. Because the delivery stream only allows a single destination, + modeling these common properties on the delivery stream itself would reduce the amount + of configuration each destination implementation would need to manage. Practically, + this would look like moving every property in `DestinationProps` into + `DeliveryStreamProps`, as well as exposing hooks in `DestinationBindOptions` to allow + destinations to call configuration-creating functions during binding. Some downsides + of making this change: moving away from the service API may confuse customers who have + previously used it to create a delivery stream; if delivery streams support multiple + destinations in the future then configuration will not be flexible per-destination. +3. Provide a more generic interface for data transformers instead of requiring a Lambda + function. 
The data transformation API seems to indicate future support for processors
+   that are not Lambda functions; [ProcessingConfiguration.Processor](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-kinesisfirehose-deliverystream-processor.html)
+   is quite generic, with the only processor type currently supported as “Lambda”.
+   However, our DataProcessor requires a lambda.IFunction, tying our API to Lambda as the
+   only supported processor and requiring a breaking change to support a different
+   processor type. We could work around this by creating a class instead that has static
+   methods for each possible processor type (ie., `static fromFunction(lambdaFunction:
+   lambda.IFunction, options: ProcessorOptions): Processor`). This may be too complex for
+   a change that we are not confident will occur.
+4. Allow multiple destinations to be provided to the delivery stream. While the console
+   UI only allows a single destination to be configured per delivery stream, the
+   horizontal model of the service API and the fact that a call to DescribeDeliveryStream
+   returns an array of destinations seems to indicate that the service team may support
+   multiple destinations in the future. To that end, we could modify `DeliveryStreamProps`
+   to accept an array of destinations (instead of a single destination, as is the case
+   currently) and simply throw an error if multiple destinations are provided until the
+   service team launches that feature. However, this would be significantly
+   future-proofing the API at the expense of confusing users that would reasonably assume
+   that multiple destinations are currently supported.
+5. Allowing the user to create or use separate IAM roles for each aspect of the delivery
+   stream. This would mean that in a complex delivery stream using AWS Lambda
+   transformation, AWS Glue record conversion, S3 Backup, and a destination, that the user
+   could specify a separate IAM role for Firehose to access each of those resources. We
+   chose to only use one role for this design, because it will be simpler for the
+   user. While the API is in the experimental stage, we will solicit customer feedback to
+   find out if customers want to have more fine grained control over the permissions for
+   their delivery stream.
 
 ### What is the high level implementation plan?
 
From a0578325faa0db58248c2408c951cd7992fe5419 Mon Sep 17 00:00:00 2001
From: Ben Chaimberg
Date: Mon, 28 Jun 2021 10:32:30 -0500
Subject: [PATCH 16/34] fix provided IAM role example compilation

---
 text/0340-firehose-l2.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md
index 431d2412e..a293e9811 100644
--- a/text/0340-firehose-l2.md
+++ b/text/0340-firehose-l2.md
@@ -558,10 +558,12 @@ creation or data delivery may fail.
 import * as iam from '@aws-cdk/aws-iam';
 import * as s3 from '@aws-cdk/aws-iam';
 
-const role = new iam.Role(stack, 'MyRole');
-const bucket = new s3.bucket(stack, 'MyBucket');
+const role = new iam.Role(this, 'Role', {
+  assumedBy: new iam.ServicePrincipal('lambda.amazonaws.com'),
+});
+const bucket = new s3.Bucket(stack, 'Bucket');
 bucket.grantWrite(role);
-new DeliveryStream(stack, 'MyDeliveryStream', {
+new DeliveryStream(stack, 'Delivery Stream', {
   destination: new destinations.S3({
     bucket: bucket,
   }),

From d32e18257200f4b3ae4a454d23acbecd014eb62e Mon Sep 17 00:00:00 2001
From: Ben Chaimberg
Date: Mon, 28 Jun 2021 10:34:36 -0500
Subject: [PATCH 17/34] make module links absolute

---
 text/0340-firehose-l2.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md
index a293e9811..ec0770c36 100644
--- a/text/0340-firehose-l2.md
+++ b/text/0340-firehose-l2.md
@@ -90,16 +90,16 @@ so, such as:
   from anywhere using Java, .NET, Node.js, Python, or Ruby. See: [Writing to Kinesis Data Firehose Using the AWS SDK](https://docs.aws.amazon.com/firehose/latest/dev/writing-with-sdk.html)
   in the *Kinesis Data Firehose Developer Guide*.
 - CloudWatch Logs: subscribe to a log group and receive filtered log events directly into
-  a delivery stream. See: [logs-destinations](../aws-logs-destinations).
+  a delivery stream. See: [logs-destinations](https://docs.aws.amazon.com/cdk/api/latest/docs/aws-logs-destinations-readme.html).
 - Eventbridge: add an event rule target to send events to a delivery stream based on the
-  rule filtering. See: [events-targets](../aws-events-targets).
+  rule filtering. See: [events-targets](https://docs.aws.amazon.com/cdk/api/latest/docs/aws-events-targets-readme.html).
 - SNS: add a subscription to send all notifications from the topic to a delivery
-  stream. See: [sns-subscriptions](../aws-sns-subscriptions).
+  stream. See: [sns-subscriptions](https://docs.aws.amazon.com/cdk/api/latest/docs/aws-sns-subscriptions-readme.html).
 - IoT: add an action to an IoT rule to send various IoT information to a delivery stream
 
 ## Destinations
 
-The following destinations are supported. See [@aws-cdk/aws-kinesisfirehose-destinations](../aws-kinesisfirehose-destinations)
+The following destinations are supported. See [kinesisfirehose-destinations](https://docs.aws.amazon.com/cdk/api/latest/docs/aws-kinesisfirehose-destinations-readme.html)
 for the implementations of these destinations.
 
 ### S3
 
From 77318c2b507444307cc03d10ec3f81047d2db772 Mon Sep 17 00:00:00 2001
From: Ben Chaimberg
Date: Mon, 28 Jun 2021 10:37:31 -0500
Subject: [PATCH 18/34] refer to specific alternative solutions

---
 text/0340-firehose-l2.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md
index ec0770c36..00e96a60b 100644
--- a/text/0340-firehose-l2.md
+++ b/text/0340-firehose-l2.md
@@ -721,8 +721,8 @@ effort.
 
 We are not confident that the service API is fully set in stone and implementing an L2 on
 top of the current L1 may be setting us up for changes in the future. We are reaching out
-to the service team to get their input and plans for the service. See: “alternative
-solutions”, below, for concrete details.
+to the service team to get their input and plans for the service. See: alternative
+solutions 2, 3, and 4, below, for concrete details.
 
 It’s a large effort (3 devs * 1 week) to invest in a module when we have other pressing
 projects. 
However, the bulk of the effort has been spent already since we have fairly From d5991db36e79986a109ea1dd589f195ab360c606 Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Fri, 2 Jul 2021 13:38:49 -0700 Subject: [PATCH 19/34] destinations take mandatory first positional argument --- text/0340-firehose-l2.md | 52 +++++++++++++++------------------------- 1 file changed, 19 insertions(+), 33 deletions(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 00e96a60b..f45d59212 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -40,9 +40,11 @@ used as a destination. More supported destinations are covered [below](#destinat ```ts import * as destinations from '@aws-cdk/aws-kinesisfirehose-destinations'; +import * as s3 from '@aws-cdk/aws-s3'; +const bucket = new s3.Bucket(this, 'Bucket'); new DeliveryStream(this, 'Delivery Stream', { - destination: new destinations.S3(), + destination: new destinations.S3(bucket), }); ``` @@ -112,9 +114,7 @@ import * as destinations from '@aws-cdk/aws-kinesisfirehose-destinations'; const bucket = new s3.Bucket(this, 'Bucket'); -const s3Destination = new destinations.S3({ - bucket: bucket, -}); +const s3Destination = new destinations.S3(bucket); new DeliveryStream(this, 'Delivery Stream', { destination: s3Destination, @@ -126,8 +126,7 @@ successfully delivered to S3. `errorOutputPrefix` will be added to failed record writing them to S3. ```ts fixture=with-bucket -const s3Destination = new destinations.S3({ - bucket: bucket, +const s3Destination = new destinations.S3(bucket, { prefix: 'myFirehose/DeliveredYear=!{timestamp:yyyy}/anyMonth/rand=!{firehose:random-string}', errorOutputPrefix: 'myFirehoseFailures/!{firehose:error-output-type}/!{timestamp:yyyy}/anyMonth/!{timestamp:dd}', }); @@ -146,8 +145,7 @@ const domain = new es.Domain(this, 'Domain', { }); const deliveryStream = new DeliveryStream(this, 'Delivery Stream', { - destination: new destinations.Elasticsearch({ - domain: domain, + destination: new destinations.Elasticsearch(domain, { indexName: 'myindex', }), }); @@ -182,8 +180,7 @@ const cluster = new redshift.Cluster(this, 'Cluster', { publiclyAccessible: true, }); -const redshiftDestination = new destinations.Redshift({ - cluster: cluster, +const redshiftDestination = new destinations.Redshift(cluster, { user: { username: 'firehose', }, @@ -358,9 +355,8 @@ delivered to S3 without compression. ```ts fixture=with-bucket // Compress data delivered to S3 using Snappy -const s3Destination = new destinations.S3({ +const s3Destination = new destinations.S3(bucket, { compression: Compression.SNAPPY, - bucket: bucket, }); new DeliveryStream(this, 'Delivery Stream', { destination: destination, @@ -380,10 +376,9 @@ buffer size is 3 MiB and the buffer interval is 1 minute. 
// Increase the buffer interval and size to 5 minutes and 3 MiB, respectively import * as cdk from '@aws-cdk/core'; -const s3Destination = new destinations.S3({ +const s3Destination = new destinations.S3(bucket, { bufferingInterval: cdk.Duration.minutes(5), bufferingSize: cdk.Size.mebibytes(8), - bucket: bucket, }); new DeliveryStream(this, 'Delivery Stream', { destination: destination, @@ -409,8 +404,7 @@ import * as s3 from '@aws-cdk/aws-s3'; // Enable backup of all source records (to an S3 bucket created by CDK) const deliveryStream = new DeliveryStream(this, 'Delivery Stream Backup All', { - destination: new destinations.Elasticsearch({ - domain: domain, + destination: new destinations.Elasticsearch(domain, { indexName: 'myindex', backup: BackupMode.ALL, }), @@ -418,8 +412,7 @@ const deliveryStream = new DeliveryStream(this, 'Delivery Stream Backup All', { // Enable backup of only the source records that failed to deliver (to an S3 bucket created by CDK) const deliveryStream = new DeliveryStream(this, 'Delivery Stream Backup Failed', { - destination: new destinations.Elasticsearch({ - domain: domain, + destination: new destinations.Elasticsearch(domain, { indexName: 'myindex', backup: BackupMode.FAILED, }), @@ -428,8 +421,7 @@ const deliveryStream = new DeliveryStream(this, 'Delivery Stream Backup Failed', // Explicitly provide an S3 bucket to which all source records will be backed up const backupBucket = new s3.Bucket(this, 'Bucket'); const deliveryStream = new DeliveryStream(this, 'Delivery Stream Backup All Explicit Bucket', { - destination: new destinations.Elasticsearch({ - domain: domain, + destination: new destinations.Elasticsearch(domain, { indexName: 'myindex', backupBucket: backupBucket, }), @@ -437,8 +429,7 @@ const deliveryStream = new DeliveryStream(this, 'Delivery Stream Backup All Expl // Explicitly provide an S3 prefix under which all source records will be backed up const deliveryStream = new DeliveryStream(this, 'Delivery Stream Backup All Explicit Prefix', { - destination: new destinations.Elasticsearch({ - domain: domain, + destination: new destinations.Elasticsearch(domain, { indexName: 'myindex', backup: BackupMode.ALL, backupPrefix: 'mybackup', @@ -484,14 +475,13 @@ const lambdaFunction = new lambda.Function(this, 'Processor', { handler: 'index.handler', code: lambda.Code.fromAsset(path.join(__dirname, 'process-records')), }); -const s3Destination = new firehosedestinations.S3Destination({ +const s3Destination = new destinations.S3(bucket, { processors: [{ lambdaFunction: lambdaFunction, bufferingInterval: cdk.Duration.minutes(5), bufferingSize: cdk.Size.mebibytes(5), retries: 5, }], - bucket: bucket, }); new DeliveryStream(this, 'Delivery Stream', { destination: destination, @@ -508,7 +498,7 @@ Kinesis Data Firehose can convert the format of your input data from JSON to before storing the data in S3. This allows you to change the format of your data records without writing any Lambda code, but you must use S3 as your destination. 
-```ts +```ts fixture=with-bucket import * as glue from '@aws-cdk/aws-glue'; import * as destinations from '@aws-cdk/aws-kinesisfirehose-destinations'; @@ -532,7 +522,7 @@ const myGlueTable = new glue.Table(this, 'MyGlueTable', { }); new DeliveryStream(this, 'Delivery Stream', { - destination: new destinations.S3({ + destination: new destinations.S3(bucket, { dataFormatConversionConfiguration: { schema: myGlueTable, inputFormat: destinations.InputFormat.OPENX_JSON @@ -554,19 +544,15 @@ S3 bucket, a Lambda data transformer, an AWS Glue table schema, etc. If you wish specify your own IAM role. It must have the correct permissions, or delivery stream creation or data delivery may fail. -```ts +```ts fixture=with-bucket import * as iam from '@aws-cdk/aws-iam'; -import * as s3 from '@aws-cdk/aws-iam'; const role = new iam.Role(this, 'Role', { assumedBy: new iam.ServicePrincipal('lambda.amazonaws.com'), } -const bucket = new s3.Bucket(stack, 'Bucket'); bucket.grantWrite(role); new DeliveryStream(stack, 'Delivery Stream', { - destination: new destinations.S3({ - bucket: bucket, - }), + destination: new destinations.S3(bucket), role: role, }); ``` @@ -835,7 +821,7 @@ robust prototypes already implemented. readonly backupBufferSize?: Size; } abstract class DestinationBase implements IDestination { - constructor(protected readonly props: DestinationProps = {}) {} + constructor(protected readonly props: DestinationProps = {}) {} abstract bind(scope: Construct, options: DestinationBindOptions): DestinationConfig; // Helper methods that subclasses can use to create common config protected createLoggingOptions(...): CfnDeliveryStream.CloudWatchLoggingOptionsProperty | undefined; From 85a9f3204d2906b91006f11130f2e11c1506ca60 Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Fri, 2 Jul 2021 13:41:20 -0700 Subject: [PATCH 20/34] clarify SSE storage layer --- text/0340-firehose-l2.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index f45d59212..890145b88 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -222,10 +222,11 @@ This integration has not been completed (see #1234). Enabling server-side encryption (SSE) requires Kinesis Data Firehose to encrypt all data sent to delivery stream when it is stored at rest. This means that data is encrypted -before being written to the storage layer and decrypted after it is received from the -storage layer. The service manages keys and cryptographic operations so that sources and -destinations do not need to, as the data is encrypted and decrypted at the boundaries of -the service. By default, delivery streams do not have SSE enabled. +before being written to the service's internal storage layer and decrypted after it is +received from the internal storage layer. The service manages keys and cryptographic +operations so that sources and destinations do not need to, as the data is encrypted and +decrypted at the boundaries of the service (ie., before the data is delivered to a +destination). By default, delivery streams do not have SSE enabled. The Key Management Service (KMS) Customer Managed Key (CMK) used for SSE can either be AWS-owned or customer-managed. 
AWS-owned CMKs are keys that an AWS service (in this case From 3657965717d575b6d68d6c9769df330af7b18264 Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Fri, 2 Jul 2021 14:01:04 -0700 Subject: [PATCH 21/34] reduce grant methods and add section on DeliveryStream < IGrantable --- text/0340-firehose-l2.md | 94 +++++++++++----------------------------- 1 file changed, 25 insertions(+), 69 deletions(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 890145b88..54adca4d9 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -536,7 +536,7 @@ new DeliveryStream(this, 'Delivery Stream', { See: [Converting Input Record Format](https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html) in the *Kinesis Data Firehose Developer Guide*. -### Specifying an IAM role +## Specifying an IAM role The DeliveryStream class automatically creates an IAM role with all the minimum necessary permissions for Kinesis Data Firehose to access the resources referenced by your delivery @@ -561,63 +561,18 @@ new DeliveryStream(stack, 'Delivery Stream', { See [Controlling Access](https://docs.aws.amazon.com/firehose/latest/dev/controlling-access.html) in the *Kinesis Data Firehose Developer Guide*. -## Permission Grants +## Granting application access to a delivery stream IAM roles, users or groups which need to be able to work with delivery streams should be granted IAM permissions. -Any object that implements the `IGrantable` interface (has an associated principal) can be -granted permissions to a delivery stream by calling: +Any object that implements the `IGrantable` interface (ie., has an associated principal) +can be granted permissions to a delivery stream by calling: -- `grantRead(principal)` - grants the principal read access to the control plane -- `grantWrite(principal)` - grants the principal write access to the control plane -- `grantWriteData(principal)` - grants the principal write access to the data plane -- `grantFullAccess(principal)` - grants principal full access to the delivery stream - -### Control Plane Read Permissions - -Grant `read` access to the control plane of a delivery stream by calling the `grantRead()` method. - -```ts fixture=with-delivery-stream -import * as iam from '@aws-cdk/aws-iam'; -const lambdaRole = new iam.Role(this, 'Role', { - assumedBy: new iam.ServicePrincipal('lambda.amazonaws.com'), -} - -// Give the role permissions to read information about the delivery stream -deliveryStream.grantRead(lambdaRole); -``` - -The following read permissions are provided to a service principal by the `grantRead()` method: - -- `firehose:DescribeDeliveryStream` -- `firehose:ListDeliveryStreams` -- `firehose:ListTagsForDeliveryStream` - -### Control Plane Write Permissions - -Grant `write` access to the control plane of a delivery stream by calling the `grantWrite()` method. 
- -```ts fixture=with-delivery-stream -import * as iam from '@aws-cdk/aws-iam'; -const lambdaRole = new iam.Role(this, 'Role', { - assumedBy: new iam.ServicePrincipal('lambda.amazonaws.com'), -} - -// Give the role permissions to modify the delivery stream -deliveryStream.grantWrite(lambdaRole); -``` - -The following write permissions are provided to a service principal by the `grantWrite()` method: - -- `firehose:DeleteDeliveryStream` -- `firehose:StartDeliveryStreamEncryption` -- `firehose:StopDeliveryStreamEncryption` -- `firehose:UpdateDestination` - -### Data Plane Write Permissions - -Grant `write` access to the data plane of a delivery stream by calling the `grantWriteData()` method. +- `grantPutRecords(principal)` - grants the principal the ability to put records onto the + delivery stream +- `grant(principal, ...actions)` - grants the principal permission to a custom set of + actions ```ts fixture=with-delivery-stream import * as iam from '@aws-cdk/aws-iam'; @@ -626,24 +581,31 @@ const lambdaRole = new iam.Role(this, 'Role', { } // Give the role permissions to write data to the delivery stream -deliveryStream.grantWriteData(lambdaRole); +deliveryStream.grantPutRecords(lambdaRole); ``` -The following write permissions are provided to a service principal by the `grantWriteData()` method: +The following write permissions are provided to a service principal by the `grantPutRecords()` method: - `firehose:PutRecord` - `firehose:PutRecordBatch` -### Custom Permissions +## Granting a delivery stream access to a resource -You can add any set of permissions to a delivery stream by calling the `grant()` method. +Conversely to the above, Kinesis Data Firehose requires permissions in order for delivery +streams to interact with resources that you own. For example, if an S3 bucket is specified +as a destination of a delivery stream, the delivery stream must be granted permissions to +put and get objects from the bucket. When using the built-in AWS service destinations +found in the `@aws-cdk/aws-kinesisfirehose-destinations` module, the CDK grants the +permissions automatically. However, custom or third-party destinations may require custom +permissions. In this case, use the delivery stream as an `IGrantable`, as follows: ```ts fixture=with-delivery-stream -import * as iam from '@aws-cdk/aws-iam'; -const user = new iam.User(this, 'User'); - -// give user permissions to update destination -deliveryStream.grant(user, 'firehose:UpdateDestination'); +/// !hide +const myDestinationResource = { + grantWrite(grantee: IGrantable) {} +} +/// !show +myDestinationResource.grantWrite(deliveryStream); ``` --- @@ -734,13 +696,7 @@ robust prototypes already implemented. 
readonly deliveryStreamArn: string; readonly deliveryStreamName: string; grant(grantee: iam.IGrantable, ...actions: string[]): iam.Grant; - // Grant permission to describe the stream - grantRead(grantee: iam.IGrantable): iam.Grant; - // Grant permission to modify the stream - grantWrite(grantee: iam.IGrantable): iam.Grant; - // Grant permission to write data to the stream - grantWriteData(grantee: iam.IGrantable): iam.Grant; - grantFullAccess(grantee: iam.IGrantable): iam.Grant; + grantPutRecords(grantee: iam.IGrantable): iam.Grant; metric(metricName: string, props?: cloudwatch.MetricOptions): cloudwatch.Metric; // Some canned metrics as well like `metricBackupToS3DataFreshness` } From 2cb3635cac2263ac502f8c29c74d2a5dbdb88358 Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Fri, 2 Jul 2021 14:04:25 -0700 Subject: [PATCH 22/34] remove hedged sentence about implementing IDestination --- text/0340-firehose-l2.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 54adca4d9..6f4addaf0 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -618,8 +618,7 @@ creating a delivery stream. See Amazon Kinesis Data Firehose module README for usage examples. -If further customization is required, use `HttpDestination` from this package or implement -`firehose.IDestination`. +For non-AWS service destinations, use `HttpDestination`. --- From ee65a62e1a64a2a486613a4769a12f8e8345f08d Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Fri, 2 Jul 2021 15:11:34 -0700 Subject: [PATCH 23/34] make processor more generic instead of tied to Lambda --- text/0340-firehose-l2.md | 57 ++++++++++++++++++++++++++-------------- 1 file changed, 38 insertions(+), 19 deletions(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 6f4addaf0..71b2eff84 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -476,13 +476,13 @@ const lambdaFunction = new lambda.Function(this, 'Processor', { handler: 'index.handler', code: lambda.Code.fromAsset(path.join(__dirname, 'process-records')), }); +const lambdaProcessor = new LambdaFunctionProcessor(lambdaFunction, { + bufferingInterval: cdk.Duration.minutes(5), + bufferingSize: cdk.Size.mebibytes(5), + retries: 5, +}); const s3Destination = new destinations.S3(bucket, { - processors: [{ - lambdaFunction: lambdaFunction, - bufferingInterval: cdk.Duration.minutes(5), - bufferingSize: cdk.Size.mebibytes(5), - retries: 5, - }], + processors: [lambdaProcessor], }); new DeliveryStream(this, 'Delivery Stream', { destination: destination, @@ -740,23 +740,42 @@ robust prototypes already implemented. 
} ``` -- `DestinationBase` -- abstract base destination class with some helper - props/methods +- `IProcessor` -- interface that data processors will implement to grant permissions and + produce configuration that will be injected into the destination definition ```ts - // Compression method for data delivered to S3 - enum Compression { GZIP, HADOOP_SNAPPY, SNAPPY, UNCOMPRESSED, ZIP } - // Not yet fully-fleshed out - interface DataProcessor { - // Function that will be called to do data processing - readonly lambdaFunction: lambda.IFunction; - // Length of time delivery stream will buffer data before sending to processor + // Generic configuration for processors + interface DataProcessorProps { + // Length of time delivery stream will buffer data before passing to the processor readonly bufferInterval?: Duration; - // Size of buffer + // Size of the buffer readonly bufferSize?: Size; - // Number of retries for networking failures or invocation limits + // Number of times Firehose will retry the processor invocation due to network timeout or invocation limits readonly retries?: number; } + // The key-value pair that identifies the underlying processor resource. + // Directly from the CFN spec, example: { parameterName: 'LambdaArn', parameterValue: lambdaFunction.functionArn } + interface DataProcessorIdentifier { + readonly parameterName: string; + readonly parameterValue: string; + } + // Output of the IDataProcessor bind method, note the extension + interface DataProcessorConfig extends DataProcessorProps { + // The type of the underlying processor resource. + readonly processorType: string; + readonly processorIdentifier: DataProcessorIdentifier; + } + interface IDataProcessor { + bind(deliveryStream: IDeliveryStream): DataProcessorConfig; + } + ``` + +- `DestinationBase` -- abstract base destination class with some helper + props/methods + + ```ts + // Compression method for data delivered to S3 + enum Compression { GZIP, HADOOP_SNAPPY, SNAPPY, UNCOMPRESSED, ZIP } interface DestinationProps { // Whether failure logging should be enabled readonly logging?: boolean; @@ -764,7 +783,7 @@ robust prototypes already implemented. readonly logGroup?: logs.ILogGroup; // Data transformation to convert data before delivering // Should probably just be a singleton - readonly processors?: DataProcessor[]; + readonly processors?: IDataProcessor[]; // Whether to backup all source records, just failed records, or none readonly backup?: BackupMode; // Specific bucket to use for backup @@ -773,7 +792,7 @@ robust prototypes already implemented. readonly backupPrefix?: string; // Length of time delivery stream will buffer data before backing up readonly backupBufferInterval?: Duration; - // Size of buffer + // Size of the buffer readonly backupBufferSize?: Size; } abstract class DestinationBase implements IDestination { From 49a57ff6539a2ba02c3c00276bba6b1e482725a2 Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Fri, 2 Jul 2021 15:19:43 -0700 Subject: [PATCH 24/34] update alternative solutions since generic processors are now implemented --- text/0340-firehose-l2.md | 85 +++++++++++++++++++++------------------- 1 file changed, 44 insertions(+), 41 deletions(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 71b2eff84..7cdacc88a 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -828,31 +828,33 @@ No. ### What alternative solutions did you consider? 1. 
Placing destinations into the core module instead of creating a separate module, since - a delivery stream can’t exist without a destination. We followed the common pattern of - placing service integrations (where one service provides an interface that multiple - other services implement) into a separate module. In contrast to many of the other - modules that follow this pattern, a delivery stream cannot be created without some - destination, as the destination is a key element of the service. It could be argued - that these destinations should be treated as first-class and co-located with the - delivery stream itself. However, this is similar to SNS, where a topic doesn’t have - much meaning without a subscription and yet service integrations for subscriptions are - still located in a separate module. + a delivery stream can’t exist without a destination. + We followed the common pattern of placing service integrations (where one service + provides an interface that multiple other services implement) into a separate + module. In contrast to many of the other modules that follow this pattern, a delivery + stream cannot be created without some destination, as the destination is a key element + of the service. It could be argued that these destinations should be treated as + first-class and co-located with the delivery stream itself. However, this is similar to + SNS, where a topic doesn’t have much meaning without a subscription and yet service + integrations for subscriptions are still located in a separate module. 2. Hoist common configuration/resources such as logging, data transformation, and backup - to the delivery stream level Currently, we follow the service API closely in the - hierarchical sense: many properties that are common to multiple destinations are - specified in the destination instead of on the delivery stream, since this is how the - API organizes it. Because the delivery stream only allows a single destination, - modeling these common properties on the delivery stream itself would reduce the amount - of configuration each destination implementation would need to manage. Practically, - this would look like moving every property in `DestinationProps` into - `DeliveryStreamProps`, as well as exposing hooks in `DestinationBindOptions` to allow - destinations to call configuration-creating functions during binding. Some downsides - of making this change: moving away from the service API may confuse customers who have - previously used it to create a delivery stream; if delivery streams support multiple - destinations in the future then configuration will not be flexible per-destination. + to the delivery stream level. + Currently, we follow the service API closely in the hierarchical sense: many properties + that are common to multiple destinations are specified in the destination instead of on + the delivery stream, since this is how the API organizes it. Because the delivery + stream only allows a single destination, modeling these common properties on the + delivery stream itself would reduce the amount of configuration each destination + implementation would need to manage. Practically, this would look like moving every + property in `DestinationProps` into `DeliveryStreamProps`, as well as exposing hooks in + `DestinationBindOptions` to allow destinations to call configuration-creating functions + during binding. 
Some downsides of making this change: moving away from the service API + may confuse customers who have previously used it to create a delivery stream; if + delivery streams support multiple destinations in the future then configuration will + not be flexible per-destination. 3. Provide a more generic interface for data transformers instead of requiring a Lambda - function. The data transformation API seems to indicate future support for processors - that are not Lambda functions; [ProcessingConfiguration.Processor](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-kinesisfirehose-deliverystream-processor.html) + function. + The data transformation API seems to indicate future support for processors that are + not Lambda functions; [ProcessingConfiguration.Processor](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-kinesisfirehose-deliverystream-processor.html) is quite generic, with the only processor type currently supported as “Lambda”. However, our DataProcessor requires a lambda.IFunction, tying our API to Lambda as the only supported processor and requiring a breaking change to support a different @@ -860,24 +862,25 @@ No. methods for each possible processor type (ie., `static fromFunction(lambdaFunction: lambda.IFunction, options: ProcessorOptions): Processor`). This may be too complex for a change that we are not confident will occur. -4. Allow multiple destinations to be provided to the delivery stream. While the console - UI only allows a single destination to be configured per delivery stream, the - horizontal model of the service API and the fact that a call to DescribeDeliveryStream - returns an array of destinations seems to indicate that the service team may support - multiple destinations in the future. To that end, we could modify `DeliveryStreamProps` - to accept an array of destinations (instead of a single destination, as is the case - currently) and simply throw an error if multiple destinations are provided until the - service team launches that feature. However, this would be significantly - future-proofing the API at the expense of confusing users that would reasonably assume - that multiple destinations are currently supported. -5. Allowing the user to create or use separate IAM roles for each aspect of the delivery - stream. This would mean that in a complex delivery stream using AWS Lambda - transformation, AWS Glue record conversion, S3 Backup, and a destination, that the user - could specify a separate IAM role for Firehose to access each of those resources. We - chose to only use one role for this design, because it will be simpler for the - user. While the API is in the experimental stage, we will solicit customer feedback to - find out if customers want to have more fine grained control over the permissions for - their delivery stream. + *Implemented.* +4. Allow multiple destinations to be provided to the delivery stream. + While the console UI only allows a single destination to be configured per delivery + stream, the horizontal model of the service API and the fact that a call to + DescribeDeliveryStream returns an array of destinations seems to indicate that the + service team may support multiple destinations in the future. To that end, we could + modify `DeliveryStreamProps` to accept an array of destinations (instead of a single + destination, as is the case currently) and simply throw an error if multiple + destinations are provided until the service team launches that feature. 
However, this + would be significantly future-proofing the API at the expense of confusing users that + would reasonably assume that multiple destinations are currently supported. +5. Allow the user to create or use separate IAM roles for each aspect of the delivery + stream. + This would mean that in a complex delivery stream using AWS Lambda transformation, AWS + Glue record conversion, S3 Backup, and a destination, that the user could specify a + separate IAM role for Firehose to access each of those resources. We chose to only use + one role for this design, because it will be simpler for the user. While the API is in + the experimental stage, we will solicit customer feedback to find out if customers want + to have more fine grained control over the permissions for their delivery stream. ### What is the high level implementation plan? From 541a77e111abd79ae8e80bb0541d0d3f8a9100b9 Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Fri, 2 Jul 2021 15:27:35 -0700 Subject: [PATCH 25/34] support multiple destinations in the future --- text/0340-firehose-l2.md | 104 +++++++++++++++++++++++---------------- 1 file changed, 62 insertions(+), 42 deletions(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 7cdacc88a..460331a88 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -44,7 +44,7 @@ import * as s3 from '@aws-cdk/aws-s3'; const bucket = new s3.Bucket(this, 'Bucket'); new DeliveryStream(this, 'Delivery Stream', { - destination: new destinations.S3(bucket), + destinations: [new destinations.S3(bucket)], }); ``` @@ -75,7 +75,7 @@ import * as kinesis from '@aws-cdk/aws-kinesis'; const sourceStream = new kinesis.Stream(this, 'Source Stream'); new DeliveryStream(this, 'Delivery Stream', { sourceStream: sourceStream, - destination: destination, + destinations: [destination], }); ``` @@ -117,7 +117,7 @@ const bucket = new s3.Bucket(this, 'Bucket'); const s3Destination = new destinations.S3(bucket); new DeliveryStream(this, 'Delivery Stream', { - destination: s3Destination, + destinations: [s3Destination], }); ``` @@ -145,9 +145,11 @@ const domain = new es.Domain(this, 'Domain', { }); const deliveryStream = new DeliveryStream(this, 'Delivery Stream', { - destination: new destinations.Elasticsearch(domain, { - indexName: 'myindex', - }), + destinations: [ + new destinations.Elasticsearch(domain, { + indexName: 'myindex', + }), + ], }); ``` @@ -195,7 +197,7 @@ const redshiftDestination = new destinations.Redshift(cluster, { copyOptions: 'json \'auto\'', }); new DeliveryStream(this, 'Delivery Stream', { - destination: redshiftDestination, + destinations: [redshiftDestination], }); ``` @@ -244,20 +246,20 @@ import * as kms from '@aws-cdk/aws-kms'; // SSE with an AWS-owned CMK new DeliveryStream(this, 'Delivery Stream AWS Owned', { encryption: StreamEncryption.AWS_OWNED, - destination: destination, + destinations: [destination], }); // SSE with an customer-managed CMK that is created automatically by the CDK new DeliveryStream(this, 'Delivery Stream Implicit Customer Managed', { encryption: StreamEncryption.CUSTOMER_MANAGED, - destination: destination, + destinations: [destination], }); // SSE with an customer-managed CMK that is explicitly specified const key = new kms.Key(this, 'Key'); new DeliveryStream(this, 'Delivery Stream Explicit Customer Managed'', { encryptionKey: key, - destination: destination, + destinations: [destination], }); ``` @@ -292,7 +294,7 @@ import * as logs from '@aws-cdk/aws-logs'; const logGroup = new logs.LogGroup(this, 'Log 
Group'); new DeliveryStream(this, 'Delivery Stream', { logGroup: logGroup, - destination: destination, + destinations: [destination], }); ``` @@ -301,7 +303,7 @@ Logging can also be disabled: ```ts fixture=with-destination new DeliveryStream(this, 'Delivery Stream', { loggingEnabled: false, - destination: destination, + destinations: [destination], }); ``` @@ -360,7 +362,7 @@ const s3Destination = new destinations.S3(bucket, { compression: Compression.SNAPPY, }); new DeliveryStream(this, 'Delivery Stream', { - destination: destination, + destinations: [destination], }); ``` @@ -382,7 +384,7 @@ const s3Destination = new destinations.S3(bucket, { bufferingSize: cdk.Size.mebibytes(8), }); new DeliveryStream(this, 'Delivery Stream', { - destination: destination, + destinations: [destination], }); ``` @@ -405,36 +407,44 @@ import * as s3 from '@aws-cdk/aws-s3'; // Enable backup of all source records (to an S3 bucket created by CDK) const deliveryStream = new DeliveryStream(this, 'Delivery Stream Backup All', { - destination: new destinations.Elasticsearch(domain, { - indexName: 'myindex', - backup: BackupMode.ALL, - }), + destinations: [ + new destinations.Elasticsearch(domain, { + indexName: 'myindex', + backup: BackupMode.ALL, + }), + ], }); // Enable backup of only the source records that failed to deliver (to an S3 bucket created by CDK) const deliveryStream = new DeliveryStream(this, 'Delivery Stream Backup Failed', { - destination: new destinations.Elasticsearch(domain, { - indexName: 'myindex', - backup: BackupMode.FAILED, - }), + destinations: [ + new destinations.Elasticsearch(domain, { + indexName: 'myindex', + backup: BackupMode.FAILED, + }), + ], }); // Explicitly provide an S3 bucket to which all source records will be backed up const backupBucket = new s3.Bucket(this, 'Bucket'); const deliveryStream = new DeliveryStream(this, 'Delivery Stream Backup All Explicit Bucket', { - destination: new destinations.Elasticsearch(domain, { - indexName: 'myindex', - backupBucket: backupBucket, - }), + destinations: [ + new destinations.Elasticsearch(domain, { + indexName: 'myindex', + backupBucket: backupBucket, + }), + ], }); // Explicitly provide an S3 prefix under which all source records will be backed up const deliveryStream = new DeliveryStream(this, 'Delivery Stream Backup All Explicit Prefix', { - destination: new destinations.Elasticsearch(domain, { - indexName: 'myindex', - backup: BackupMode.ALL, - backupPrefix: 'mybackup', - }), + destinations: [ + new destinations.Elasticsearch(domain, { + indexName: 'myindex', + backup: BackupMode.ALL, + backupPrefix: 'mybackup', + }), + ], }); ``` @@ -485,7 +495,7 @@ const s3Destination = new destinations.S3(bucket, { processors: [lambdaProcessor], }); new DeliveryStream(this, 'Delivery Stream', { - destination: destination, + destinations: [destination], }); ``` @@ -523,13 +533,15 @@ const myGlueTable = new glue.Table(this, 'MyGlueTable', { }); new DeliveryStream(this, 'Delivery Stream', { - destination: new destinations.S3(bucket, { - dataFormatConversionConfiguration: { - schema: myGlueTable, - inputFormat: destinations.InputFormat.OPENX_JSON - outputFormat: destinations.OuputFormat.PARQUET - }, - }), + destinations: [ + new destinations.S3(bucket, { + dataFormatConversionConfiguration: { + schema: myGlueTable, + inputFormat: destinations.InputFormat.OPENX_JSON + outputFormat: destinations.OuputFormat.PARQUET + }, + }), + ], }); ``` @@ -553,7 +565,7 @@ const role = new iam.Role(this, 'Role', { } bucket.grantWrite(role); new 
DeliveryStream(stack, 'Delivery Stream', { - destination: new destinations.S3(bucket), + destinations: [new destinations.S3(bucket)], role: role, }); ``` @@ -608,6 +620,12 @@ const myDestinationResource = { myDestinationResource.grantWrite(deliveryStream); ``` +## Multiple destinations + +Though the delivery stream allows specifying an array of destinations, only one +destination per delivery stream is currently allowed. This limitation is enforced at +compile time and will throw an error. + --- # Amazon Kinesis Data Firehose Destinations Library @@ -706,7 +724,7 @@ robust prototypes already implemented. ```ts interface DeliveryStreamProps { // The destination that this delivery stream will deliver data to. - readonly destination: IDestination; + readonly destinations: IDestination[]; // Auto-generated by CFN readonly deliveryStreamName?: string; // Can source data from Kinesis, if not provided will use API to produce data @@ -851,6 +869,7 @@ No. may confuse customers who have previously used it to create a delivery stream; if delivery streams support multiple destinations in the future then configuration will not be flexible per-destination. + *Rejected*: supporting multiple destinations. 3. Provide a more generic interface for data transformers instead of requiring a Lambda function. The data transformation API seems to indicate future support for processors that are @@ -862,7 +881,7 @@ No. methods for each possible processor type (ie., `static fromFunction(lambdaFunction: lambda.IFunction, options: ProcessorOptions): Processor`). This may be too complex for a change that we are not confident will occur. - *Implemented.* + *Implemented*. 4. Allow multiple destinations to be provided to the delivery stream. While the console UI only allows a single destination to be configured per delivery stream, the horizontal model of the service API and the fact that a call to @@ -873,6 +892,7 @@ No. destinations are provided until the service team launches that feature. However, this would be significantly future-proofing the API at the expense of confusing users that would reasonably assume that multiple destinations are currently supported. + *Implemented*. 5. Allow the user to create or use separate IAM roles for each aspect of the delivery stream. This would mean that in a complex delivery stream using AWS Lambda transformation, AWS From 095ffe7ac070dec49986e316740cf2ff5215a498 Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Fri, 2 Jul 2021 15:28:54 -0700 Subject: [PATCH 26/34] remove rejected tag in alternative solution 2 since hoisting for backup bucket could still make sense --- text/0340-firehose-l2.md | 1 - 1 file changed, 1 deletion(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 460331a88..5c66a7ef2 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -869,7 +869,6 @@ No. may confuse customers who have previously used it to create a delivery stream; if delivery streams support multiple destinations in the future then configuration will not be flexible per-destination. - *Rejected*: supporting multiple destinations. 3. Provide a more generic interface for data transformers instead of requiring a Lambda function. 
The data transformation API seems to indicate future support for processors that are From 763074f87bcdc1c2eab751feec346998957f6453 Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Tue, 6 Jul 2021 14:54:51 -0700 Subject: [PATCH 27/34] clear up section on SSE with data stream as source --- text/0340-firehose-l2.md | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 5c66a7ef2..696983335 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -263,13 +263,18 @@ new DeliveryStream(this, 'Delivery Stream Explicit Customer Managed'', { }); ``` -If a Kinesis data stream is configured as the source of a delivery stream, Kinesis Data -Firehose no longer stores data at rest and all encryption is handled by Kinesis Data -Streams. Kinesis Data Firehose receives unencrypted data from Kinesis Data Streams, -buffers the data in memory, and sends the data to destinations without ever writing the -unencrypted data at rest. Practically, this means that SSE should be specified on the -Kinesis data stream when it is used as the source of a delivery stream (and specifying SSE -on the delivery stream will cause an error). +When you configure a Kinesis data stream as the data source of a Kinesis Data Firehose +delivery stream, Kinesis Data Firehose no longer stores the data at rest. Instead, the +data is stored in the data stream. When you send data from your data producers to your +data stream, Kinesis Data Streams encrypts your data before storing the data at rest. When +your Kinesis Data Firehose delivery stream reads the data from your data stream, Kinesis +Data Streams first decrypts the data and then sends it to Kinesis Data Firehose. Kinesis +Data Firehose buffers the data in memory then delivers it to your destinations without +storing the unencrypted data at rest. + +Practically, this means that SSE should be specified on the Kinesis data stream when it is +used as the source of a delivery stream. Specifying SSE on the delivery stream that has a +data stream as its source will cause an error. See: [Data Protection](https://docs.aws.amazon.com/firehose/latest/dev/encryption.html) in the *Kinesis Data Firehose Developer Guide*. From cb05d4897b49f0ce090cdd1e4f55d954d376b9e1 Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Wed, 7 Jul 2021 13:41:23 -0700 Subject: [PATCH 28/34] updates for new RFC template --- text/0340-firehose-l2.md | 28 +++++++++++++++++++--------- 1 file changed, 19 insertions(+), 9 deletions(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 696983335..88aa10cfe 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -1,10 +1,9 @@ ---- -rfc pr: [#342](https://github.com/aws/aws-cdk-rfcs/pull/342) -tracking issue: [#340](https://github.com/aws/aws-cdk-rfcs/issues/340) ---- - # Amazon Kinesis Data Firehose Delivery Stream L2 +* **Original Author(s):**: @BenChaimberg, @madeline-k, @otaviomacedo +* **Tracking Issue**: #340 +* **API Bar Raiser**: @rix0rrr + The `aws-kinesisfirehose` construct library allows you to create Amazon Kinesis Data Firehose delivery streams and destinations with just a few lines of code. As with most construct libraries, you can also easily define permissions @@ -645,7 +644,15 @@ For non-AWS service destinations, use `HttpDestination`. 
--- -## FAQ +Ticking the box below indicates that the public API of this RFC has been +signed-off by the API bar raiser (the `api-approved` label was applied to the +RFC pull request): + +``` +[ ] Signed-off by API Bar Raiser @xxxxx +``` + +## Public FAQ ### What are we launching today? @@ -699,9 +706,7 @@ It’s a large effort (3 devs * 1 week) to invest in a module when we have other projects. However, the bulk of the effort has been spent already since we have fairly robust prototypes already implemented. -### What changes are required to enable this change? - -#### Design +### What is the technical solution (design) of this feature? - `IDeliveryStream` -- interface for created and imported delivery streams @@ -848,6 +853,11 @@ robust prototypes already implemented. No. +### What are the drawbacks of this solution? + +No problems or risks of implementing this feature as a whole, though the design outlined +above may have drawbacks, as detailed below in "alternative solutions". + ### What alternative solutions did you consider? 1. Placing destinations into the core module instead of creating a separate module, since From 4e0a02b48cc4cd392e63bfaecdcff387857577f9 Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Wed, 7 Jul 2021 13:49:02 -0700 Subject: [PATCH 29/34] un-code API BR checkbox --- text/0340-firehose-l2.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 88aa10cfe..64ea25834 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -648,9 +648,7 @@ Ticking the box below indicates that the public API of this RFC has been signed-off by the API bar raiser (the `api-approved` label was applied to the RFC pull request): -``` -[ ] Signed-off by API Bar Raiser @xxxxx -``` +[ ] Signed-off by API Bar Raiser @rix0rrr ## Public FAQ From df6e41527c5214a633f64fa1d67c4ffd38900d3a Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Wed, 7 Jul 2021 13:50:20 -0700 Subject: [PATCH 30/34] fix checkbox --- text/0340-firehose-l2.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 64ea25834..1ae6a9941 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -648,7 +648,7 @@ Ticking the box below indicates that the public API of this RFC has been signed-off by the API bar raiser (the `api-approved` label was applied to the RFC pull request): -[ ] Signed-off by API Bar Raiser @rix0rrr +- [ ] Signed-off by API Bar Raiser @rix0rrr ## Public FAQ From 0a445577f33c4957e9a5b8c10bf44b3fe7e0e77f Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Wed, 7 Jul 2021 13:53:55 -0700 Subject: [PATCH 31/34] fix relative link for IDeliveryStream --- text/0340-firehose-l2.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 1ae6a9941..83299887e 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -326,7 +326,7 @@ limit of records per second (indicating data is flowing into your delivery strea than it is configured to process). CDK provides methods for accessing delivery stream metrics with default configuration, -such as `metricIncomingBytes`, and `metricIncomingRecords` (see [`IDeliveryStream`](../lib/delivery-stream.ts) +such as `metricIncomingBytes`, and `metricIncomingRecords` (see [`IDeliveryStream`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-kinesisfirehose.IDeliveryStream.html) for a full list). 
CDK also provides a generic `metric` method that can be used to produce metric configurations for any metric provided by Kinesis Data Firehose; the configurations are pre-populated with the correct dimensions for the delivery stream. From 92df708a13792e46c88d8e03986f3afc2940225a Mon Sep 17 00:00:00 2001 From: Ben Chaimberg Date: Wed, 7 Jul 2021 15:01:03 -0700 Subject: [PATCH 32/34] use define instead of create in some places --- text/0340-firehose-l2.md | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md index 83299887e..dec9b96a3 100644 --- a/text/0340-firehose-l2.md +++ b/text/0340-firehose-l2.md @@ -47,7 +47,7 @@ new DeliveryStream(this, 'Delivery Stream', { }); ``` -The above example creates the following resources: +The above example implicitly defines the following resources: - An S3 bucket - A Kinesis Data Firehose delivery stream with Direct PUT as the source and CloudWatch @@ -105,7 +105,7 @@ for the implementations of these destinations. ### S3 -Creating a delivery stream with an S3 bucket destination: +Defining a delivery stream with an S3 bucket destination: ```ts import * as s3 from '@aws-cdk/aws-s3'; @@ -636,7 +636,7 @@ compile time and will throw an error. This library provides constructs for adding destinations to a Amazon Kinesis Data Firehose delivery stream. Destinations can be added by specifying the `destination` prop when -creating a delivery stream. +defining a delivery stream. See Amazon Kinesis Data Firehose module README for usage examples. @@ -706,7 +706,7 @@ robust prototypes already implemented. ### What is the technical solution (design) of this feature? -- `IDeliveryStream` -- interface for created and imported delivery streams +- `IDeliveryStream` -- interface for defined and imported delivery streams ```ts interface IDeliveryStream extends @@ -727,7 +727,7 @@ robust prototypes already implemented. } ``` -- `DeliveryStreamProps` -- configuration for creating a `DeliveryStream` +- `DeliveryStreamProps` -- configuration for defining a `DeliveryStream` ```ts interface DeliveryStreamProps { @@ -863,7 +863,7 @@ above may have drawbacks, as detailed below in "alternative solutions". We followed the common pattern of placing service integrations (where one service provides an interface that multiple other services implement) into a separate module. In contrast to many of the other modules that follow this pattern, a delivery - stream cannot be created without some destination, as the destination is a key element + stream cannot be defined without some destination, as the destination is a key element of the service. It could be argued that these destinations should be treated as first-class and co-located with the delivery stream itself. However, this is similar to SNS, where a topic doesn’t have much meaning without a subscription and yet service @@ -905,8 +905,7 @@ above may have drawbacks, as detailed below in "alternative solutions". would be significantly future-proofing the API at the expense of confusing users that would reasonably assume that multiple destinations are currently supported. *Implemented*. -5. Allow the user to create or use separate IAM roles for each aspect of the delivery - stream. +5. Allow the user to use separate IAM roles for each aspect of the delivery stream. 
This would mean that in a complex delivery stream using AWS Lambda transformation, AWS
    Glue record conversion, S3 Backup, and a destination, that the user could specify a
    separate IAM role for Firehose to access each of those resources. We chose to only use
    one role for this design, because it will be simpler for the
    user. While the API is in the experimental stage, we will solicit customer feedback to
    find out if customers want to have more fine grained control over the permissions for
    their delivery stream.

From adaf6036ad65387157b8e0183961437faf24d50b Mon Sep 17 00:00:00 2001
From: Ben Chaimberg
Date: Wed, 7 Jul 2021 15:01:14 -0700
Subject: [PATCH 33/34] add intro paragraph on distinguishing between firehose
 delivery stream and data stream

---
 text/0340-firehose-l2.md | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md
index dec9b96a3..8e48e41c9 100644
--- a/text/0340-firehose-l2.md
+++ b/text/0340-firehose-l2.md
@@ -29,12 +29,22 @@ destinations.
 is a service for fully-managed delivery of real-time streaming data to storage services
 such as Amazon S3, Amazon Redshift, Amazon Elasticsearch, Splunk, or any custom HTTP
 endpoint or third-party services such as Datadog, Dynatrace, LogicMonitor, MongoDB, New
-Relic, and Sumo Logic. This module is part of the [AWS Cloud Development Kit](https://github.com/aws/aws-cdk)
-project. It allows you to create Kinesis Data Firehose delivery streams.
+Relic, and Sumo Logic.
 
-## Creating a Delivery Stream
+Kinesis Data Firehose delivery streams are distinguished from Kinesis data streams in
+their models of consumption. Whereas consumers read from a data stream by actively pulling
+data from the stream, a delivery stream pushes data to its destination on a regular
+cadence. This means that data streams are intended to have consumers that do on-demand
+processing, like AWS Lambda or Amazon EC2. On the other hand, delivery streams are
+intended to have destinations that are sources for offline processing and analytics, such
+as Amazon S3 and Amazon Redshift.
 
-In order to create a Delivery Stream, you must specify a destination. An S3 bucket can be
+This module is part of the [AWS Cloud Development Kit](https://github.com/aws/aws-cdk)
+project. It allows you to define Kinesis Data Firehose delivery streams.
+
+## Defining a Delivery Stream
+
+In order to define a Delivery Stream, you must specify a destination. An S3 bucket can be
 used as a destination. More supported destinations are covered [below](#destinations).
 
 ```ts

From 2a81e334fdfac1123fc22208c2a46caf902f1886 Mon Sep 17 00:00:00 2001
From: Ben Chaimberg
Date: Thu, 8 Jul 2021 08:38:43 -0700
Subject: [PATCH 34/34] mark API as approved

---
 text/0340-firehose-l2.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0340-firehose-l2.md b/text/0340-firehose-l2.md
index 8e48e41c9..56f107889 100644
--- a/text/0340-firehose-l2.md
+++ b/text/0340-firehose-l2.md
@@ -658,7 +658,7 @@ Ticking the box below indicates that the public API of this RFC has been
 signed-off by the API bar raiser (the `api-approved` label was applied to the
 RFC pull request):
 
-- [ ] Signed-off by API Bar Raiser @rix0rrr
+- [X] Signed-off by API Bar Raiser @rix0rrr