|
1 |
| -# Fuzzy Movies |
2 |
| -## Lambda, Kinesis, Firehose, ElasticSearch, S3 |
3 |
| - |
| 1 | +# Fuzzy Movie Search - Search application with Lambda, Kinesis, Firehose, ElasticSearch, S3 |
4 | 2 |
|
5 |
| -This Hackathon project is an AWS app consisting of: |
| 3 | + |
| 4 | +| Key | Value | |
| 5 | +| ------------ | ------------------------------------------------------------------------------------- | |
| 6 | +| Environment | <img src="https://img.shields.io/badge/LocalStack-deploys-4D29B4.svg?logo="> | |
| 7 | +| Services | Lambda, Kinesis, Firehose, ElasticSearch, S3 | |
| 8 | +| Integrations | Terraform, AWS CLI | |
| 9 | +| Categories | Serverless; Event-Driven architecture | |
| 10 | +| Level | Intermediate | |
| 11 | +| GitHub | [Repository link](https://github.com/localstack/fuzzy-movie-search) | |
| 12 | + |
| 13 | +## Introduction |
| 14 | +This Fuzzy Search application demonstrates how to set up an S3-hosted website that enables you to fuzzy-search a movie database. The sample application implements the following integration among the various AWS services: |
6 | 15 | - A data ingestion pipeline which allows adding movie data to an ElasticSearch index via:
|
7 |
| - 1. An AWS Lambda function, explosed via a fuction URL. |
8 |
| - 2. The Lambda function sends the JSON payload to a Kinesis Data Stream. |
9 |
| - 3. A Kinesis Firehose Delivery Stream forwards the data to an ElasticSearch domain. |
| 16 | + - An AWS Lambda function, exposed via a function URL. |
| 17 | + - The Lambda function sends the JSON payload to a Kinesis Data Stream. |
| 18 | + - A Kinesis Firehose Delivery Stream forwards the data to an ElasticSearch domain. |
10 | 19 | - A frontend / website which:
|
11 | 20 | - Has a simple search interface to search for movies in the database.
|
12 |
| - - The HTML page uses a vanilla JS script to query data using a second Lambda function. |
| 21 | + - The HTML page uses a plain JS script to query data using a second Lambda function. |
13 | 22 | - This Lambda function performs a fuzzy query on the movie index in the ElasticSearch cluster.
|
14 | 23 |
|
15 |
| -## System Overview |
16 |
| - |
17 |
| - |
18 |
| -## Setup |
19 |
| -1. Clone this repo and `cd` into its working directory |
20 |
| -2. Install the following tools: |
21 |
| - - [Terraform](https://www.terraform.io/downloads) (v1.4.5) |
22 |
| - - [tflocal](https://github.com/localstack/terraform-local) |
23 |
| - - [awslocal](https://github.com/localstack/awscli-local) |
24 |
| -3. Start LocalStack in the foreground so you can watch the logs: |
25 |
| - ``` |
26 |
| - docker compose up |
27 |
| - ``` |
28 |
| -4. Open another terminal window and `cd` into the same working directory |
29 |
| -5. Create the resource and trigger the invocation of the lambda: |
30 |
| - ``` |
31 |
| - ./run.sh |
32 |
| - ``` |
33 |
| - |
34 |
| -# TODO: |
35 |
| -- This sample does not yet run on AWS |
36 |
| - - Firehose -> ElasticSearch |
37 |
| - - Records are not properly delivered to ElasticSearch yet |
38 |
| - - Search Lambda -> ElasticSearch |
39 |
| - - Lambda needs to sign the HTTP requests to ElasticSearch |
40 |
| -- Simplify the S3 website URL in LocalStack |
41 |
| - - We need to use http://movie-search.s3.amazonaws.com:4566/index.html instead of the generated output: http://movie-search.s3-website-eu-west-1.amazonaws.com/ |
42 |
| - - It works with http://movie-search.s3-website.localhost.localstack.cloud/ |
43 |
| -- HTTPS? |
44 |
| - - Due to the function URLs having no proper certificate, we can only use the http version! |
45 |
| - - http://movie-search.s3-website.localhost.localstack.cloud:4566/ |
| 24 | +## Architecture Diagram |
| 25 | + |
| 26 | +The following diagram shows the architecture that this sample application builds and deploys: |
| 27 | + |
| 28 | + |
| 29 | + |
| 30 | +- [S3 Website](https://docs.localstack.cloud/tutorials/s3-static-website-terraform/) for hosting the website. |
| 31 | +- [Lambda](https://docs.localstack.cloud/user-guide/aws/lambda/) for feeding the Kinesis stream and performing the fuzzy search. |
| 32 | +- [Kinesis](https://docs.localstack.cloud/user-guide/aws/kinesis/) for receiving the movie data sent by the ingest Lambda function. |
| 33 | +- [Firehose](https://docs.localstack.cloud/user-guide/aws/kinesis-firehose/) for forwarding the data into Elasticsearch. |
| 34 | +- [Elasticsearch](https://docs.localstack.cloud/user-guide/aws/elasticsearch/) for storing and indexing the movie data. |
| 35 | + |
| 36 | +## Prerequisites |
| 37 | +- LocalStack Pro with the [`localstack` CLI](https://docs.localstack.cloud/getting-started/installation/#localstack-cli). |
| 38 | +- [Terraform](https://docs.localstack.cloud/user-guide/integrations/terraform/) with the [`tflocal`](https://github.com/localstack/terraform-local) wrapper installed. |
| 39 | +- [AWS CLI](https://docs.localstack.cloud/user-guide/integrations/aws-cli/) with the [`awslocal` wrapper](https://docs.localstack.cloud/user-guide/integrations/aws-cli/#localstack-aws-cli-awslocal). |
| 40 | + |
| 41 | +Start LocalStack Pro with the `LOCALSTACK_API_KEY` pre-configured: |
| 42 | + |
| 43 | +```shell |
| 44 | +export LOCALSTACK_API_KEY=<your-api-key> |
| 45 | +docker compose up -d |
| 46 | +``` |
| 47 | + |
| 48 | +## Instructions |
| 49 | +You can build and deploy the sample application on LocalStack by running `./run.sh`. |
| 50 | +The following sections describe how to deploy and test it manually, step by step. |
| 51 | + |
| 52 | +### Build the application |
| 53 | + |
| 54 | +To build the Terraform application, run the following commands: |
| 55 | + |
| 56 | +```bash |
| 57 | +terraform init; terraform plan; terraform apply --auto-approve |
| 58 | +``` |
| 59 | +This creates all resources specified in `main.tf`. |
| 60 | +This can take a couple of minutes. |
| 61 | +Once it is done, save the following output values into variables by running these commands: |
| 62 | + |
| 63 | +```bash |
| 64 | +ingest_function_url=$(terraform output --raw ingest_lambda_url) |
| 65 | +elasticsearch_endpoint=$(terraform output --raw elasticsearch_endpoint) |
| 66 | +``` |
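Because every later step depends on these two variables, it can help to fail early when one of them is empty. The following is a minimal sketch of such a guard. The helper name `require_value` is hypothetical and not part of the repository.

```bash
# Hypothetical helper: print "name=value" when the value is set,
# otherwise report the problem on stderr and return a non-zero status.
require_value() {
  local name="$1" value="$2"
  if [ -z "$value" ]; then
    echo "missing value for $name; check 'terraform output'" >&2
    return 1
  fi
  echo "$name=$value"
}

# Usage, once the two variables above have been set:
#   require_value ingest_function_url "$ingest_function_url"
#   require_value elasticsearch_endpoint "$elasticsearch_endpoint"
```

If either call returns a non-zero status, re-run the `terraform output` commands before continuing.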
| 67 | + |
| 68 | +### Download the dataset |
| 69 | + |
| 70 | +The dataset we will use for this application is a selection of movies and their typical data such as title, director, genre, etc. |
| 71 | +Execute the following commands to make it available. |
| 72 | + |
| 73 | +```bash |
| 74 | +temp_dir=$(mktemp -d) |
| 75 | +movie_dataset_url="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/samples/sample-movies.zip" |
| 76 | +curl -L "$movie_dataset_url" > "$temp_dir/sample-movies.zip" |
| 77 | +unzip "$temp_dir/sample-movies.zip" -d "$temp_dir/" |
| 78 | +``` |
| 79 | + |
| 80 | +### Pre-processing the data |
| 81 | + |
| 82 | +For the data to work with our streaming use case, we need to remove the bulk-insert action lines, which start with `{ "index"`. |
| 83 | + |
| 84 | +```bash |
| 85 | +grep -v '^{ "index"' $temp_dir/sample-movies.bulk > $temp_dir/sample-movies-processed.bulk |
| 86 | +mv $temp_dir/sample-movies-processed.bulk $temp_dir/sample-movies.bulk |
| 87 | +``` |
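To illustrate what the filter does, here is a small sketch that applies the same `grep -v` pattern to a made-up two-movie sample. The sample records and the `movies` index name are assumptions for illustration only, not taken from the real dataset.

```bash
# Made-up sample in bulk format: each document line is preceded by an action line.
sample_file=$(mktemp)
cat > "$sample_file" <<'EOF'
{ "index" : { "_index": "movies", "_id" : "1" } }
{"title": "Example Movie A", "year": 2001}
{ "index" : { "_index": "movies", "_id" : "2" } }
{"title": "Example Movie B", "year": 2002}
EOF

# Same filter as above: drop every line that starts with the bulk "index" action.
filtered=$(grep -v '^{ "index"' "$sample_file")
echo "$filtered"

# Count the action lines that survived the filter; 0 means only documents remain.
remaining=$(printf '%s\n' "$filtered" | grep -c '^{ "index"' || true)
echo "remaining action lines: $remaining"
rm -f "$sample_file"
```

Only the two document lines remain, which is the shape the ingest Lambda function expects when records are posted one at a time.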
| 88 | + |
| 89 | +### Populating the database |
| 90 | + |
| 91 | +We now populate the database with the actual entries via our Lambda function. |
| 92 | +Execute the following code to insert the entries line by line. |
| 93 | +It will take quite some time to finish. |
| 94 | + |
| 95 | +```bash |
| 96 | +cat "$temp_dir/sample-movies.bulk" | while IFS= read -r line |
| 97 | +do |
| 98 | + echo -n "." |
| 99 | + printf '%s\n' "$line" | curl -s -X POST "$ingest_function_url" \ |
| 100 | + -H 'Content-Type: application/json' \ |
| 101 | + -d @- > /dev/null |
| 102 | +done |
| 103 | +``` |
| 104 | + |
| 105 | +### Querying the database |
| 106 | + |
| 107 | +You can now access the website and its entries at http://movie-search.s3-website.localhost.localstack.cloud:4566/. |
| 108 | +For example, if you search for "Quentis", a misspelling of "Quentin", you should see entries related to the director "Quentin Tarantino", similar to the following screenshot. |
| 109 | + |
| 110 | + |
| 111 | + |
| 112 | + |
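For readers curious about what a fuzzy query looks like at the Elasticsearch level, the sketch below builds a `multi_match` request body with `fuzziness` set to `AUTO`. It is not the query the search Lambda function in this repository actually sends. The helper name `fuzzy_query_body`, the field names `title` and `directors`, and the index name `movies` are all assumptions.

```bash
# Hypothetical helper: print an Elasticsearch query body that tolerates
# small typos in the search term. The term is not JSON-escaped, so this
# sketch only suits simple terms without quotes or backslashes.
fuzzy_query_body() {
  local term="$1"
  printf '{"query":{"multi_match":{"query":"%s","fields":["title","directors"],"fuzziness":"AUTO"}}}\n' "$term"
}

fuzzy_query_body "Quentis"

# To try it against the domain (assumes the endpoint value includes the
# scheme and that the index is called "movies"):
#   curl -s -H 'Content-Type: application/json' \
#     "$elasticsearch_endpoint/movies/_search" -d "$(fuzzy_query_body "Quentis")"
```

With `AUTO`, Elasticsearch allows up to two character edits for a seven-letter term such as "Quentis", which is why it can still match "Quentin".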
| 113 | +## Known limitations |
| 114 | + |
| 115 | +The LocalStack logs sometimes show error messages regarding the Firehose propagation. |
| 116 | +While this might reduce the size of the database to some degree, it is still sufficient for demonstration purposes. |
| 117 | + |
| 118 | + |
| 119 | +## Contributing |
| 120 | + |
| 121 | +We appreciate your interest in contributing to our project and are always looking for new ways to improve the developer experience. We welcome feedback, bug reports, and even feature ideas from the community. |
| 122 | +Please refer to the [contributing file](CONTRIBUTING.md) for more details on how to get started. |