Create a serverless scraping architecture

This is the source code for the tutorial: Create a serverless scraping architecture, with Scaleway Messaging and Queuing SQS, Serverless Functions and Managed Database.

In this tutorial we show how to set up a simple application which reads Hacker News and processes the articles it finds there asynchronously, using Scaleway serverless products.

Requirements

This example assumes you are familiar with how serverless functions work. If needed, you can check the official Scaleway documentation.

This example is written using Python and Terraform, and assumes you have set up authentication for the Terraform provider.

Context

The architecture deployed in this tutorial consists of two functions, two triggers, an SQS queue, and an RDB instance. The producer function, activated by a recurring cron trigger, scrapes Hacker News for articles published in the last 15 minutes and pushes the title and URL of each article to an SQS queue created with Scaleway Messaging and Queuing. The consumer function, triggered by each new message on the SQS queue, reads the message, scrapes some data from the linked article, and then writes that data into a Scaleway Managed Database.
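
The producer/consumer hand-off described above can be sketched in a few lines of Python. This is only an illustration, not the code from this repository: the function names (`select_recent`, `to_message_body`, `handle_message`), the article dictionary fields, and the JSON message format are all assumptions made for the example. The sketch leaves out the actual scraping, the SQS calls, and the database write.

```python
import json
from datetime import datetime, timedelta, timezone

# Assumed to mirror the 15-minute window mentioned in the text.
WINDOW = timedelta(minutes=15)


def select_recent(articles, now, window=WINDOW):
    """Producer side: keep articles published within `window` before `now`.

    Each article is a hypothetical dict with "title", "url" and a
    timezone-aware "published_at" datetime.
    """
    return [
        a for a in articles
        if timedelta(0) <= now - a["published_at"] <= window
    ]


def to_message_body(article):
    """Producer side: serialize title and URL as a JSON string (assumed format)."""
    return json.dumps({"title": article["title"], "url": article["url"]})


def handle_message(body):
    """Consumer side: decode one queue message back into (title, url)."""
    payload = json.loads(body)
    return payload["title"], payload["url"]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    articles = [
        {"title": "Fresh", "url": "https://example.com/fresh",
         "published_at": now - timedelta(minutes=5)},
        {"title": "Stale", "url": "https://example.com/stale",
         "published_at": now - timedelta(hours=2)},
    ]
    for article in select_recent(articles, now):
        body = to_message_body(article)  # what the producer would enqueue
        print(handle_message(body))      # what the consumer would recover
```

In a real deployment the string returned by `to_message_body` would be passed to an SQS client as the message body, and the consumer function would receive it through its trigger event. The exact shape of that event depends on the platform and is not shown here.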

Setup

Once you have cloned this repository, deploy the whole stack with Terraform:

terraform init
terraform apply

Running

Everything is already up and running! You can verify that the functions execute correctly in Scaleway Cockpit, and connect to your RDB instance to see the results:

psql -h $(terraform output -raw db_ip) --port $(terraform output -raw db_port) -d hn-database -U worker