Traditional relational databases are still very prevalent today. Often, however, we need to search or slice and dice the data in many different ways, and RDBMSs are not great at that. Like many others, we will be using ElasticSearch to provide this functionality. Today we will explore how AWS Lambda can help us index our relational data into ElasticSearch with minimal effort and cost. We will be using Amazon Aurora (MySQL), AWS Lambda, and ElasticSearch.

We will set up database triggers that fire a Lambda function whenever an INSERT, UPDATE, or DELETE occurs in Aurora. The Lambda function is then responsible for updating the ElasticSearch index.
The following tools and accounts are required to complete these instructions.
- Install .NET Core 1.x
- Install the AWS CLI
- Sign up for an AWS account
- Sign up for an Amazon developer account
- Python 2.7
- pymysql: `pip install pymysql`
- A MySQL client (for example, MySQL Workbench)
The following steps will walk you through the setup of an Aurora (MySQL) cluster, an ElasticSearch cluster, and the provided Lambda function that will do the indexing.

The project uses the `lambdasharp` profile by default. Follow these steps to set up a new profile if need be.

- Create a `lambdasharp` profile: `aws configure --profile lambdasharp`
- Configure the profile with the AWS credentials and region you want to use
The `LambdaSharp-IndexerFunction` Lambda function requires an IAM role. You can create the `LambdaSharp-IndexerFunction` role via the AWS Console or by executing the following AWS CLI commands.
```
aws iam create-role --profile lambdasharp --role-name LambdaSharp-IndexerFunction --assume-role-policy-document file://assets/lambda-role-policy.json
aws iam attach-role-policy --profile lambdasharp --role-name LambdaSharp-IndexerFunction --policy-arn arn:aws:iam::aws:policy/AWSLambdaFullAccess
```
The Aurora cluster will need permission to invoke the Lambda function. You can create the `LambdaSharp-RdsRole` role via the AWS Console or by executing the following AWS CLI commands.
```
aws iam create-role --profile lambdasharp --role-name LambdaSharp-RdsRole --assume-role-policy-document file://assets/rds-role-policy.json
aws iam attach-role-policy --profile lambdasharp --role-name LambdaSharp-RdsRole --policy-arn arn:aws:iam::aws:policy/AWSLambdaFullAccess
```
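If you need to recreate the policy documents yourself, a trust policy of this shape is what the commands above expect. This is a sketch of what `assets/lambda-role-policy.json` typically looks like (the file in the repo is authoritative); for `assets/rds-role-policy.json`, the principal service would be `rds.amazonaws.com` instead:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```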
- Log into your AWS account and go into the Aurora/RDS service
- Select `Launch Aurora (MySQL)`
- Select the smallest cluster possible, and disable `Multi-AZ Deployment` unless you have some cash to burn
- Click `Next Step`
- Set `Publicly Accessible` to `yes`
- Set `lambdasharp` as the database name
- Click `Launch DB Instance`
Once your cluster has finished launching, log into it using a MySQL client; I recommend MySQL Workbench.

- Create the following table:
```
CREATE TABLE heroes (
  id int not null auto_increment primary key,
  name varchar(255),
  urlslug varchar(4096),
  identity varchar(255),
  alignment varchar(255),
  eye_color varchar(255),
  hair_color varchar(255),
  sex varchar(255),
  gsm varchar(255),
  alive varchar(255),
  appearances int,
  first_appearance varchar(10),
  year int,
  lon double,
  lat double
);
```
In order for Aurora to be able to invoke a Lambda function, we need to assign it the `LambdaSharp-RdsRole` role.
- In the RDS console, select `Clusters`
- Select your cluster, and select `Manage IAM Roles`
- Add the `LambdaSharp-RdsRole` role
- Restart your cluster for the changes to take effect
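The same attachment can also be done from the CLI. The cluster identifier and account ID below are placeholders; substitute your own. Note that, depending on the Aurora MySQL version, you may additionally need to set the `aws_default_lambda_role` DB cluster parameter to the role's ARN before Lambda invocations from the database are permitted:

```
aws rds add-role-to-db-cluster --profile lambdasharp --db-cluster-identifier YOUR_CLUSTER_ID --role-arn arn:aws:iam::123456789012:role/LambdaSharp-RdsRole
```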
- Log into your AWS account and go into the ElasticSearch service
- Click `Create a new domain`
- Give it a unique name such as `lambdasharp-index-team1`
- Use the latest available version
- Click `Next`
- Select the smallest allowable cluster
- Click `Next`
- For `Set the domain access policy to`, select `Allow open access to the domain`. (Clearly, never do this in production.)
- Navigate into the Lambda function folder: `cd TriggerFunction/src/TriggerFunction/`
- Run: `dotnet restore`
- Edit `aws-lambda-tools-defaults.json` and make sure everything is set up correctly (a sketch of what to check follows this list)
- Run: `dotnet lambda deploy-function`
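For reference, the defaults file used by the `dotnet lambda` tooling usually contains entries like the following. This is a sketch only: the values in the repo's `aws-lambda-tools-defaults.json` are authoritative, the account ID in the role ARN is a placeholder, and the handler string is illustrative. The fields most worth double-checking are the profile, region, and function role:

```
{
  "profile": "lambdasharp",
  "region": "us-east-1",
  "configuration": "Release",
  "framework": "netcoreapp1.0",
  "function-runtime": "dotnetcore1.0",
  "function-name": "LambdaSharp-IndexerFunction",
  "function-role": "arn:aws:iam::123456789012:role/LambdaSharp-IndexerFunction",
  "function-handler": "TriggerFunction::TriggerFunction.Function::FunctionHandler",
  "function-memory-size": 256,
  "function-timeout": 30
}
```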
You are provided with a data set that contains super heroes from the Marvel universe in a file called `marvel_data.csv`, modified from an original data set. `lat` and `lon` columns were added that represent the location of each super hero on earth.
Let's insert 20 rows of data into MySQL:

```
python insert_heroes.py marvel_data.csv MYSQL_HOST MYSQL_USER MYSQL_PASSWORD lambdasharp 20
```
Verify that your database now has some data:

```
SELECT * FROM heroes;
```
Set up a MySQL trigger to invoke our Lambda function when a new record is inserted. Give it all the available data about the hero so that we can index it later. A sketch is shown below.
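Here is a minimal sketch of such a trigger using Aurora's `mysql.lambda_async` procedure. The ARN is a placeholder, the `action` field name is our own payload convention (the function must parse whatever shape you choose), and only a few columns are shown; pass the remaining columns the same way. Building JSON with `CONCAT` assumes string values contain no embedded quotes, and NULL numeric columns must be wrapped in `IFNULL` or the whole payload becomes NULL:

```
DELIMITER ;;
CREATE TRIGGER heroes_after_insert
AFTER INSERT ON heroes
FOR EACH ROW
BEGIN
  -- Asynchronously invoke the indexer function with the new row serialized as JSON.
  CALL mysql.lambda_async(
    'arn:aws:lambda:us-east-1:123456789012:function:LambdaSharp-IndexerFunction',
    CONCAT('{ "action": "INSERT",',
           ' "id": ', NEW.id, ',',
           ' "name": "', NEW.name, '",',
           ' "alignment": "', NEW.alignment, '",',
           ' "appearances": ', IFNULL(NEW.appearances, 0), ',',
           ' "lat": ', IFNULL(NEW.lat, 0), ',',
           ' "lon": ', IFNULL(NEW.lon, 0), ' }')
  );
END;;
DELIMITER ;
```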
- Create an index in ElasticSearch called `heroes` with the following schema:
```
PUT heroes
{
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "hero" : {
      "properties" : {
        "name" : { "type" : "text" },
        "urlslug" : { "type" : "text" },
        "identity" : { "type" : "text" },
        "alignment" : { "type" : "text" },
        "eye_color" : { "type" : "text" },
        "hair_color" : { "type" : "text" },
        "sex" : { "type" : "text" },
        "gsm" : { "type" : "text" },
        "alive" : { "type" : "text" },
        "appearances" : { "type" : "integer" },
        "first_appearance" : { "type" : "text" },
        "year" : { "type" : "integer" },
        "location": { "type": "geo_point" }
      }
    }
  }
}
```
NOTE: The ElasticSearch service comes with Kibana, which provides a convenient interface for data exploration as well as for making API calls (for creating the index, for example).
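If you prefer not to use Kibana, the index can also be created with a plain HTTP call against the domain endpoint (this works here only because the domain allows open access). `YOUR_ES_ENDPOINT` is a placeholder, and `heroes-mapping.json` is assumed to be a file containing the schema body above:

```
curl -X PUT "https://YOUR_ES_ENDPOINT/heroes" -H "Content-Type: application/json" -d @heroes-mapping.json
```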
- Extend the Lambda function to index the heroes into our new index.
- Keep the index synchronized when updates or deletions happen in MySQL (see the trigger sketch below).
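A sketch of the deletion side, mirroring the insert trigger above (same placeholder ARN and payload convention); an `AFTER UPDATE` trigger is identical to the insert one except for the `action` value:

```
DELIMITER ;;
CREATE TRIGGER heroes_after_delete
AFTER DELETE ON heroes
FOR EACH ROW
BEGIN
  -- OLD holds the deleted row; the id alone is enough to remove the document from the index.
  CALL mysql.lambda_async(
    'arn:aws:lambda:us-east-1:123456789012:function:LambdaSharp-IndexerFunction',
    CONCAT('{ "action": "DELETE", "id": ', OLD.id, ' }')
  );
END;;
DELIMITER ;
```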
Using Kibana, plot the location of all the super heroes in the world. Where are they located? (Hint: it's ComicCon)
- Oscar Cornejo for helping envision and prototype this hackathon.
- Erik Birkfeld for organizing.
- MindTouch for hosting.
- ArkusNexus for sponsoring & participating!
- Copyright (c) 2017 Yuri Gorokhov, Oscar Cornejo
- MIT License