Athena Workshop

Preparation

Using AWS CloudFormation to launch the environment for the labs

  1. Labs - Athena Basics
  2. Labs - Federated Queries
  3. Labs - User Defined Functions
  4. Labs - Custom Connector
  5. Labs - Machine Learning

Labs - Athena Basics

  1. Prepare Datasets
  2. Create Athena Table
  3. Create Tables with Glue
  4. Athena Create View
  5. ETL using Athena CTAS
  6. Use Athena workgroups to separate users, teams, applications, or workloads; to set limits on the amount of data each query or the entire workgroup can process; and to track costs
  7. Visualize with QuickSight using Athena
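The CTAS step above can be sketched as follows; the table, column, and bucket names here are illustrative placeholders, not the workshop's actual names:

```sql
-- ETL with CTAS: write a query result out as a new, partitioned
-- Parquet table (table, column, and bucket names are illustrative).
CREATE TABLE sales_summary
WITH (
  format = 'PARQUET',
  external_location = 's3://your-bucket/sales_summary/',
  partitioned_by = ARRAY['year']        -- partition columns must come last in SELECT
) AS
SELECT region,
       SUM(amount) AS total_amount,
       year
FROM   raw_sales
GROUP BY region, year;
```

CTAS both transforms the data (aggregation, format conversion to Parquet) and registers the result as a queryable table in one statement, which is why the workshop uses it for ETL.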

Labs - Federated Queries

  1. Athena Federation - Query Data Anywhere

Athena uses data source connectors that run on AWS Lambda to execute federated queries. A data source connector is a piece of code that translates between your target data source and Athena.

  1. Prepare the TPC-H database, tables, and data sources

  2. Aurora Connector setup by using the Serverless Application Repository

This will deploy the Athena JDBC connector for Aurora MySQL; you can refer to this Lambda function in your queries as "lambda:mysql".

  3. DynamoDB Connector setup by using the Serverless Application Repository

This will deploy the Athena DynamoDB connector; you can refer to this Lambda function in your queries as "lambda:dynamo".

  4. HBase Connector setup by using the Serverless Application Repository

This will deploy the Athena HBase connector; you can refer to this Lambda function in your queries as "lambda:hbase".

  5. Redis Connector setup by using the Serverless Application Repository

This will deploy the Athena Redis connector; you can refer to this Lambda function in your queries as "lambda:redis".

Since Redis doesn't have a schema of its own, the Redis connector can't infer columns or data types from Redis. The Redis connector needs a Glue database and tables to be set up so that it can associate the data with a schema.

  6. Run Federated Queries

There is no need to switch the workgroup to AmazonAthenaPreviewFunctionality; you can stay in the primary workgroup.
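With the connectors deployed, a federated query references each source through its catalog name. A minimal sketch, assuming the schema and table names below (they are illustrative, not the workshop's actual names):

```sql
-- Join data across two federated sources in a single query.
-- "lambda:mysql" and "lambda:dynamo" are the connector catalog names
-- registered above; schema and table names are illustrative.
SELECT o.order_id,
       o.total,
       c.customer_name
FROM   "lambda:mysql".sales.orders AS o
JOIN   "lambda:dynamo".default.customers AS c
  ON   o.customer_id = c.customer_id
LIMIT  10;
```

The quoted catalog name is required because of the colon; everything after it is resolved by the corresponding connector Lambda rather than by the Glue Data Catalog.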

Labs - User Defined Functions

Athena User Defined Functions (UDFs) enable customers to write custom scalar functions and invoke them in SQL queries.

When a UDF is used in a SQL query submitted to Athena, it is invoked and executed on AWS Lambda. UDFs can be used in both SELECT and FILTER clauses of a SQL query. Users can invoke multiple UDFs in the same query.

UDFs enable customers to perform custom processing such as compressing and decompressing data, redacting sensitive data, or applying customized decryption.
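As a sketch, the GA syntax for invoking a Lambda-backed UDF looks like the following; the function, column, table, and Lambda names are illustrative, and the preview-era workshop may instead use the older USING FUNCTION ... TYPE LAMBDA_INVOKE form:

```sql
-- Declare a scalar UDF backed by a Lambda function, then call it
-- like any built-in function (all names here are illustrative).
USING EXTERNAL FUNCTION redact(value VARCHAR)
RETURNS VARCHAR
LAMBDA 'my-athena-udf'
SELECT redact(email) AS masked_email
FROM   customers
LIMIT  5;
```

Each invocation batches rows and sends them to the Lambda function, which returns the transformed values to the query.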

  1. Set up the Cloud9 environment

    git clone https://github.com/awslabs/aws-athena-query-federation.git 
    cd aws-athena-query-federation
    sudo chown ec2-user:ec2-user ~/.profile
    ./tools/prepare_dev_env.sh
    source ~/.profile
    mvn clean install -DskipTests=true > /tmp/log
  2. Create UDF Code and Publish to Serverless Application Repository

  3. UDF Connector Setup

  4. Querying with the UDF to mask sensitive data. To learn more about UDF syntax, see UDF Query Syntax


Labs - Custom Connector

Deploy a custom Athena connector, configure it, and run an Athena federated query against it.

A custom connector is useful when you have a legacy data source, or you want a connector to a data store for which none currently exists. In these cases you can write your own custom connector, deploy it to a private Serverless Application Repository, and use it.

Here is an example custom connector

  1. Develop Custom Connector Code

     ../tools/publish.sh S3_BUCKET_NAME athena-example
  2. Custom Connector Setup

  3. Querying with Connector

    • Upload Sample Data to the S3 Bucket for the Example Connector
    aws s3 cp ~/environment/aws-athena-query-federation/athena-example/sample_data.csv s3://athena-federation-workshop-744670048729/2017/11/1/sample_data.csv
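Once the example connector is registered, it is queried through its Lambda catalog name like any other federated source. The catalog, schema, and table names below are illustrative; the actual names depend on how you configured the connector:

```sql
-- Query the example connector's data through its Lambda catalog
-- (catalog, schema, and table names are illustrative).
SELECT *
FROM   "lambda:athena-example".schema1.sample_table
LIMIT  10;
```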

Labs - Machine Learning

You can use Athena to write SQL statements that run Machine Learning (ML) inference using Amazon SageMaker. This feature simplifies access to ML models for data analysis, eliminating the need to use complex programming methods to run inference.

To use ML with Athena, you define an ML with Athena function with the USING FUNCTION clause.
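A sketch of that clause in the GA USING EXTERNAL FUNCTION form; the endpoint, function, table, and column names are illustrative assumptions, and the preview-era workshop may use a slightly different form:

```sql
-- Run SageMaker inference from SQL: each call batches rows and
-- sends them to the endpoint (all names here are illustrative).
USING EXTERNAL FUNCTION detect_anomaly(value DOUBLE)
RETURNS DOUBLE
SAGEMAKER 'my-rcf-endpoint'
SELECT value,
       detect_anomaly(value) AS anomaly_score
FROM   metrics
LIMIT  10;
```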

  1. Setup SageMaker Notebook

  2. Download the Jupyter Notebook

  3. Run Jupyter Notebook

Train a Random Cut Forest model to detect anomalies and deploy the model to a SageMaker endpoint that the application or Athena can call.

  1. Run Athena ML Query


As you can see from the results, the Random Cut Forest algorithm associates an anomaly score with each data point. Low scores indicate that the data point is considered "normal"; high scores indicate the presence of an anomaly. The definitions of "low" and "high" depend on the application, but common practice suggests that scores beyond three standard deviations from the mean score are considered anomalous.
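That three-standard-deviation rule of thumb can be applied directly in SQL; the table and column names below are illustrative:

```sql
-- Flag rows whose anomaly score is more than three standard
-- deviations above the mean score (names are illustrative).
WITH stats AS (
  SELECT avg(anomaly_score)    AS mean_score,
         stddev(anomaly_score) AS sd_score
  FROM   scored_data
)
SELECT d.*
FROM   scored_data AS d
CROSS JOIN stats AS s
WHERE  d.anomaly_score > s.mean_score + 3 * s.sd_score;
```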

Reference

Workshop URL