patrick-muller/athena-gmail

Athena Gmail Connector

Another Thanksgiving day experiment from @dacort

Overview

Ever wanted to query your email from Athena? Well, now you can!

Usage

You can (eventually) use any advanced search syntax Gmail supports in your WHERE clause.

  • SELECT * FROM gmail.messages WHERE meta_gmailquery='from:amazonaws.com'

For this experiment, we only load 100 messages.
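
As a minimal sketch of how such a query might be assembled programmatically, the helper below wraps a Gmail search string in the SQL shown above. The function name `build_gmail_sql` and the quote-escaping step are illustrative assumptions and not part of this repository; only the table `gmail.messages` and the column `meta_gmailquery` come from the example.

```python
def build_gmail_sql(search: str, columns: str = "*") -> str:
    """Build an Athena query that passes a Gmail search string to the connector.

    Hypothetical helper: the table and column names come from the README
    example, everything else is an assumption for illustration.
    """
    # Double any single quotes so the search string stays a valid SQL literal.
    escaped = search.replace("'", "''")
    return f"SELECT {columns} FROM gmail.messages WHERE meta_gmailquery='{escaped}'"


print(build_gmail_sql("from:amazonaws.com"))
```

Keeping the Gmail search in one string literal means any Gmail operator (for example `from:`, `subject:`, `has:attachment`) can be passed through unchanged, provided the connector forwards it to Gmail.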

Requirements

  • Create a Google OAuth client configured as a "Desktop App"
  • Run python quickstart.py to populate local credentials

Docker Usage

  • In this directory, build the Docker image:
docker build -t gathena .
  • Start the container:
docker run -p 9000:8080 gathena:latest
  • Test the endpoint!
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{"@type": "PingRequest", "identity": {"id": "UNKNOWN", "principal": "UNKNOWN", "account": "123456789012", "arn": "arn:aws:iam::123456789012:root", "tags": {}, "groups": []}, "catalogName": "gmail", "queryId": "1681559a-548b-4771-874c-2aa2ea7c39ab"}'
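
The same ping can be sent from Python using only the standard library. This is a sketch that mirrors the curl command above; the function names `build_ping_request` and `send_ping` are hypothetical, and it assumes the container is listening on localhost:9000 as started in the previous step.

```python
import json
import urllib.request

# Local Lambda runtime endpoint exposed by the container (from the curl example).
ENDPOINT = "http://localhost:9000/2015-03-31/functions/function/invocations"


def build_ping_request(catalog_name: str = "gmail",
                       query_id: str = "1681559a-548b-4771-874c-2aa2ea7c39ab") -> dict:
    """Return the same PingRequest payload used in the curl example."""
    return {
        "@type": "PingRequest",
        "identity": {
            "id": "UNKNOWN",
            "principal": "UNKNOWN",
            "account": "123456789012",
            "arn": "arn:aws:iam::123456789012:root",
            "tags": {},
            "groups": [],
        },
        "catalogName": catalog_name,
        "queryId": query_id,
    }


def send_ping(endpoint: str = ENDPOINT, timeout: float = 10.0) -> str:
    """POST the ping payload and return the raw response body as text."""
    body = json.dumps(build_ping_request()).encode("utf-8")
    request = urllib.request.Request(endpoint, data=body, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# With the container running, call: print(send_ping())
```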

Uploading

  • Create a container repository, then tag and push the image:
export AWS_REGION=us-east-1
aws ecr create-repository --repository-name gathena --image-scanning-configuration scanOnPush=true
docker tag gathena:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/gathena:latest
aws ecr get-login-password | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/gathena:latest
  • Create a Lambda function with the above container

  • Set an environment variable on the Lambda function with a spill bucket:
aws lambda update-function-configuration --function-name gathena_container --environment 'Variables={TARGET_BUCKET=<BUCKET_NAME>}'
  • Add a new data source to Athena pointing to the Lambda function

  • If changing code, use AWS_ACCOUNT_ID=123456789012 make docker to rebuild and update your Lambda function.
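
The image URI used in the tag and push commands above follows the pattern `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. The sketch below derives it from its parts, which may help when scripting the upload for a different account or region. The helper name `ecr_image_uri` is hypothetical and not part of this repository.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Compose an ECR image URI from account ID, region, repository and tag.

    Hypothetical helper for illustration; the example values below are the
    placeholders used in the README commands.
    """
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


print(ecr_image_uri("123456789012", "us-east-1", "gathena"))
```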

Schema thoughts

messageId, subject, from, sentDate, xMailer

Multi-valued fields:

allTo; flags (possible flags are 'answered', 'deleted', 'draft', 'flagged', 'recent', 'seen'); content; attachment; attachmentNames
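
One way to picture these schema thoughts is as a Python dataclass. This is only a sketch: the field names come from the list above, but every type, the `from_` spelling (needed because `from` is a Python keyword), the class name `MessageRow`, and the flag validation are assumptions, not something the connector is known to implement.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Flag values listed in the schema notes.
ALLOWED_FLAGS = frozenset({"answered", "deleted", "draft", "flagged", "recent", "seen"})


@dataclass
class MessageRow:
    """Hypothetical row shape for gmail.messages; all types are assumptions."""
    messageId: str
    subject: str
    from_: str                      # "from" is a reserved word in Python
    sentDate: str                   # assumed to be an ISO 8601 timestamp string
    xMailer: Optional[str] = None
    allTo: List[str] = field(default_factory=list)
    flags: List[str] = field(default_factory=list)
    content: Optional[str] = None
    attachment: List[bytes] = field(default_factory=list)   # assumed raw payloads
    attachmentNames: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject any flag that is not in the documented set.
        unknown = set(self.flags) - ALLOWED_FLAGS
        if unknown:
            raise ValueError(f"unknown flags: {sorted(unknown)}")


row = MessageRow(messageId="m1", subject="Hello", from_="a@example.com",
                 sentDate="2020-11-26T00:00:00Z", flags=["seen"])
print(row.flags)
```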
