Using AWS CloudFormation to launch the envrionment of Labs
- Labs - Athena Basics
- Labs - Federated Queries
- Labs - User Defined Functions
- Labs - Custom Connector
- Labs - Machine Learning
- Prepare Datasets
- Create Athena Table
- Create Tables with Glue
- Athena Create View
- ETL using Athena CTAS
- Use Athena workgroups to separate users, teams, applications, or workloads, to set limits on amount of data each query or the entire workgroup can process, and to track costs
- Visualize with QuickSight using Athena
Athena uses data source connectors that run on AWS Lambda to execute federated queries. A data source connector is a piece of code that can translate between your target data source and Athena.
This will deploy Athena JDBC connector for Aurora Mysql and you can refer to this lambda function in your queries as "lambda:mysql"
This will deploy Athena DynamoDB connector and you can refer to this lambda function in your queries as "lambda:dynamo"
This will deploy Athena HBase connector and you can refer to this lambda function in your queries as “lambda:hbase”
This will deploy Athena Redis connector and you can refer to this lambda function in your queries as "lambda:redis"
Since Redis doesn't have a schema of it's own the Redis Connector can't infer the columns or data type from Redis. Redis connector needs a Glue database and tables to be setup to be able to associate the data to the schema.
No needSwitch workgroup
to AmazonAthenaPreviewFunctionality
, you can keep in primary
workgroup
Athena User Defined Function enables customers to write custom scalar functions and invoke them in SQL queries.
When a UDF is used in a SQL query submitted to Athena, it is invoked and executed on AWS Lambda. UDFs can be used in both SELECT and FILTER clauses of a SQL query. Users can invoke multiple UDFs in the same query.
UDFs enable customers to perform custom processing such as compressing and decompressing data, redacting sensitive data, or applying customized decryption.
-
- Resize the disk to 20GB.
- Setting Development Environment
git clone https://github.com/awslabs/aws-athena-query-federation.git cd aws-athena-query-federation sudo chown ec2-user:ec2-user ~/.profile ./tools/prepare_dev_env.sh source ~/.profile mvn clean install -DskipTests=true > /tmp/log
-
Create UDF Code and Publish to Serverless Application Repository
-
Querying with UDF to mask the sensitive data To learn more about UDF syntax click here : UDF Query Syntax
Deploy a custom Athena connector and then configure it and run Athena Federated query against this connector.
A custom connector can be useful when you have a legacy datasource or you want to have a connector to a datastore which doesn't currently exist. In these cases you can write and deploy your own custom connector to Private Serverless Application Repository and use it.
Here is an example custom connector
../tools/publish.sh S3_BUCKET_NAME athena-example
-
- Upload Sample Data to the S3 Bucket for the Example Connector
aws s3 cp ~/environment/aws-athena-query-federation/athena-example/sample_data.csv s3://athena-federation-workshop-744670048729/2017/11/1/sample_data.csv
You can use Athena to write SQL statements that run Machine Learning (ML) inference using Amazon SageMaker. This feature simplifies access to ML models for data analysis, eliminating the need to use complex programming methods to run inference.
To use ML with Athena, you define an ML with Athena function with the USING FUNCTION clause.
Train a RandomCutForest Model to detect anomalies and deploy the model to a SageMaker endpoint that the application or Athena can call
As you can see from the results, With each data point, Random Cut Forest algorithm associates an anomaly score. Low score values indicate that the data point is considered “normal.” High values indicate the presence of an anomaly in the data. The definitions of “low” and “high” depend on the application but common practice suggests that scores beyond three standard deviations from the mean score are considered anomalous.