guy-remarkety/amazon-s3-access-logs-queries

 
 

Analyzing Amazon S3 access logs at scale

This template adapts the implementation discussed in the article "Analyze your Amazon CloudFront access logs at scale" to process Amazon S3 access logs, using the same concepts. It uses S3 Access Logging, Amazon Athena, AWS Glue, AWS Lambda, and Amazon Simple Storage Service (S3).

Overview

The application has two main parts:

  • An S3 bucket <ResourcePrefix>-<AccountId>-s3-access-logs that serves as the log bucket for S3 access logs. As soon as Amazon S3 delivers a new access log file, an event triggers the AWS Lambda function moveAccessLogs, which moves the file to an Apache Hive-style prefix.

    [Figure: infrastructure overview]

  • An hourly scheduled AWS Lambda function transformPartition that runs an INSERT INTO query on a single partition per run, covering one hour of data. It writes the content of the partition in Apache Parquet format to the <ResourcePrefix>-<AccountId>-s3-access-logs S3 bucket.

    [Figure: infrastructure overview]
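
The move to an Apache Hive-style prefix can be illustrated with a minimal sketch. It is not the repository's moveAccessLogs code. The helper name toPartitionedKey is hypothetical, and the sketch assumes that log objects follow the documented S3 server access log key format (<prefix>YYYY-mm-DD-HH-MM-SS-<UniqueString>) and that partitions are keyed by year, month, day, and hour.

```javascript
// Hypothetical helper (not taken from the repository): derive a Hive-style
// object key from the key of a freshly delivered S3 access log file.
// Assumptions: file names look like YYYY-mm-DD-HH-MM-SS-<UniqueString>,
// and the partition columns are year, month, day, and hour.
function toPartitionedKey(sourceKey, newKeyPrefix = 'new/', gzKeyPrefix = 'partitioned-gz/') {
  if (!sourceKey.startsWith(newKeyPrefix)) {
    throw new Error(`Key does not start with ${newKeyPrefix}: ${sourceKey}`);
  }
  const filename = sourceKey.slice(newKeyPrefix.length);

  // Capture year, month, day, and hour from the timestamp in the file name.
  const match = /^(\d{4})-(\d{2})-(\d{2})-(\d{2})-\d{2}-\d{2}-/.exec(filename);
  if (!match) {
    throw new Error(`Unexpected log file name: ${filename}`);
  }
  const [, year, month, day, hour] = match;

  return `${gzKeyPrefix}year=${year}/month=${month}/day=${day}/hour=${hour}/${filename}`;
}

console.log(toPartitionedKey('new/2019-04-01-12-34-56-ABCDEF0123456789'));
// prints "partitioned-gz/year=2019/month=04/day=01/hour=12/2019-04-01-12-34-56-ABCDEF0123456789"
```

A function like this would typically be followed by an S3 copy to the new key and a delete of the original object.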

FAQs

Q: How can I get started?

Use the Launch Stack button above to start the deployment of the application to your account. The AWS Management Console will guide you through the process. You can override the following parameters during deployment:

  • The NewKeyPrefix (default: new/) is the S3 prefix that is used in the configuration of your Amazon S3 access logging for log storage. The AWS Lambda function will move the files from here.
  • The GzKeyPrefix (default: partitioned-gz/) and ParquetKeyPrefix (default: partitioned-parquet/) are the S3 prefixes for partitions that contain gzip and Apache Parquet files, respectively.
  • ResourcePrefix (default: myapp) is a prefix that is used for the S3 bucket and the AWS Glue database to prevent naming collisions.

The stack contains a single S3 bucket called <ResourcePrefix>-<AccountId>-s3-access-logs. After the deployment you can modify your existing S3 Access Logging configuration to deliver access logs to this bucket with the new/ log prefix.
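
As a rough sketch of that logging change, the request for the S3 PutBucketLogging API could be assembled as follows. The function name buildPutBucketLoggingParams and the example bucket and account values are hypothetical. The returned object is assumed to be passed to PutBucketLoggingCommand from the AWS SDK for JavaScript v3 (@aws-sdk/client-s3), which is not called here.

```javascript
// Hypothetical helper: build the input for the S3 PutBucketLogging API so that
// a source bucket delivers its access logs to the bucket created by the stack.
// The target bucket name follows the <ResourcePrefix>-<AccountId>-s3-access-logs
// pattern; sourceBucket, resourcePrefix, and accountId are supplied by the caller.
function buildPutBucketLoggingParams(sourceBucket, resourcePrefix, accountId, newKeyPrefix = 'new/') {
  return {
    Bucket: sourceBucket,
    BucketLoggingStatus: {
      LoggingEnabled: {
        TargetBucket: `${resourcePrefix}-${accountId}-s3-access-logs`,
        TargetPrefix: newKeyPrefix,
      },
    },
  };
}

// Example values are placeholders, not real resources.
const params = buildPutBucketLoggingParams('my-source-bucket', 'myapp', '123456789012');
console.log(JSON.stringify(params, null, 2));
```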

As soon as Amazon S3 delivers new access logs, the files are moved to GzKeyPrefix. After 1-2 hours, they are transformed into files in ParquetKeyPrefix.
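
The hourly transformation can be pictured as one INSERT INTO statement per partition. The sketch below is not the repository's transformPartition code. The function name buildInsertQuery and the table names partitioned_gz and partitioned_parquet are assumptions, as are the string-typed partition columns year, month, day, and hour.

```javascript
// Hypothetical sketch: build the Athena INSERT INTO statement that copies one
// hourly partition from the gzip table into the Parquet table.
// Assumptions: table names and string-typed year/month/day/hour partition columns.
function buildInsertQuery(partitionDate, database,
                          sourceTable = 'partitioned_gz',
                          targetTable = 'partitioned_parquet') {
  const pad = (n) => String(n).padStart(2, '0');

  // Partition values are taken in UTC so they do not depend on the local time zone.
  const year = String(partitionDate.getUTCFullYear());
  const month = pad(partitionDate.getUTCMonth() + 1);
  const day = pad(partitionDate.getUTCDate());
  const hour = pad(partitionDate.getUTCHours());

  return [
    `INSERT INTO ${database}.${targetTable}`,
    `SELECT * FROM ${database}.${sourceTable}`,
    `WHERE year = '${year}' AND month = '${month}' AND day = '${day}' AND hour = '${hour}'`,
  ].join('\n');
}

// Example: the partition for 2019-04-01, 12:00 UTC in a placeholder database.
console.log(buildInsertQuery(new Date(Date.UTC(2019, 3, 1, 12)), 'my_db'));
```

Restricting each run to a single hour keeps the query small and makes a failed run easy to repeat for just that partition.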

You can query your access logs at any time in the Amazon Athena Query editor using the AWS Glue view called combined in the database called <ResourcePrefix>_cf_access_logs_db:

SELECT * FROM s3_access_logs.combined LIMIT 10;

Q: How can I customize and deploy the template?

  1. Fork this GitHub repository.

  2. Clone the forked GitHub repository to your local machine.

  3. Modify the templates.

  4. Install the AWS CLI & AWS Serverless Application Model (SAM) CLI.

  5. Validate your template:

    $ sam validate -t template.yaml

  6. Package the files for deployment with SAM (see SAM docs for details) to a bucket of your choice. The bucket must be in the region you want to deploy the sample application to:

    $ sam package \
        --template-file template.yaml \
        --output-template-file packaged.yaml \
        --s3-bucket <BUCKET>

  7. Deploy the packaged application to your account:

    $ aws cloudformation deploy \
        --template-file packaged.yaml \
        --stack-name my-stack \
        --capabilities CAPABILITY_IAM

Q: How can I use the sample application for multiple Amazon S3 buckets?

Deploy another AWS CloudFormation stack from the same template to create a new log bucket for different source buckets or environments. The stack name is added to all resource names (e.g. AWS Lambda functions, S3 bucket, etc.) so you can distinguish the different stacks in the AWS Management Console.

Q: How can I add a new question to this list?

If you find yourself wishing this set of frequently asked questions had an answer for a particular problem, please submit a pull request. Chances are good that others will also benefit from having the answer listed here.

Q: How can I contribute?

See the Contributing Guidelines for details.

License Summary

This sample code is made available under a modified MIT license. See the LICENSE file.
