Skip to content

tmtk75/athena-cli

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

40 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

README

What's this?

AWS Athena command line tool helping you to casually execute your queries.

  • Execute query given by arg or from stdin.
    • Templating.
    • Synchronously waiting for its completion.
    • If interrupted, stop its execution.
  • List query-executions, not query-excution IDs.
  • Get s3 object with execution-id.
  • Show query result with exeuction-id.

Getting Started

You have these.

  • A s3 bucket to write outputs.
  • Your AWS profile with proper permissions.
  • ELB logs

Rename .athena-cli.yaml.templ to .athena-cli.yaml and fill placeholders with your values.

# verbose: true
# work-group: <your-workgroup>
output-location: s3://<your-bucket>/outputs
database-name: <database-name>
timeout: 30s

Let's say the both bucket and database is sample_elb in the getting started.

Then run list, you'll see the below one and ready to execute.

$ export AWS_PROFILE=<your-profile>
$ ./athena-cli list
[]

Query execution

Let's create your database and check it's created.

$ athena-cli query "create database sample_elb"

$ athena-cli query --query.result-csv "create database sample_elb"
sample_elb

Let's see the current execution list.

$ athena-cli list --list.limit 1 | jq .
[
  {
    "Query": "create database sample_elb",
    "QueryExecutionContext": {
      "Database": "sample_elb"
    },
    "QueryExecutionId": "6cf8459e-c72a-4375-ab01-34f0244868b8",
    "ResultConfiguration": {
      "EncryptionConfiguration": null,
      "OutputLocation": "s3://sample_elb/outputs/6cf8459e-c72a-4375-ab01-34f0244868b8.txt"
    },
    ...
  }
]

That's a sample for a simple query.

Querying ELB logs

Let's try to query ELB logs in a S3 bucket. Prepare these.

  • S3 bucket name (w/ prefix)
  • AWS account ID

First of all, create a table.

$ athena-cli query --dry-run \
  --query.values '{
    "s3buketname": "a_elb/foo",
    "accountId": "123456789123"
  }' < query/mk-table-elb.q
...
LOCATION 's3://a_elb/foo/elb/AWSLogs/123456789123/elasticloadbalancing/';

You can check executed query with --dry-run. Then run it without --dry-run.

$ athena-cli query --query.values '{"s3buketname":"a_elb/foo","accountId":"123456789123"}' < query/mk-table-elb.q

$

You're ready to select. Just in case, put limit 1 at the tail.

$ athena-cli query "select * from elb_logs limit 1" | jq .
{
  "execution_id": "b5022e5e-0fec-4782-9649-c35633f0328b",
  "rows": [
    {
      "backend_ip": "10.1.145.234",
      "backend_port": "8080",
...

Examples

CloudTrail Logs

Steps of Querying AWS CloudTrail Logs.

Creating the table.

athena-cli query --query.values '{
    "s3bucketname": "your-cloudtrail-logs-bucket",
    "accountId": "123456789123"
  }' < query/mk-table-ct.q

Load the partitions.

athena-cli query --query.values '{
    "s3bucketname": "your-cloudtrail-logs-bucket",
    "accountId": "123456789123"
    "region": "ap-northeast-1",
    "year": "2020",
    "month": "03",
    "day": "29"
  }' < query/mk-part.q
athena-cli query "select * from cloudtrail_logs limit 1"
athena-cli query --query.values '{"year":"2019","month":"08","day":"19"}' < query/drop-part.q

Profile

--profile option is supported from v0.2.0. You can put your config file as ~/.config/athena-cli/config.yaml so that you don't need giving work group name, database name every time you run.

profiles:
  dev: &dev
    timeout: 30s
    database-name: elblog-dev
  prod: &prod
    timeout: 5m
    database-name: elblog-prod

Auto profile based on AWS account ID

You can also associate one profile to one AWS account ID.

For the below example, dev is used when you run with AWS account, "123456791234".

profiles:
  dev: &dev
    timeout: 30s
    database-name: elblog-dev
  prod: &prod
    timeout: 5m
    database-name: elblog-prod
accounts:
  "123456791234": *dev
  "223344556677": *prod