This is the source code for the tutorial: Create a serverless scraping architecture, with Scaleway Messaging and Queuing SQS, Serverless Functions and Managed Database.
In this tutorial we show how to set up a simple application which reads Hacker News and processes the articles it finds there asynchronously, using Scaleway serverless products.
This example assumes you are familiar with how serverless functions work. If needed, refer to the official Scaleway documentation.
This example is written using Python and Terraform, and assumes you have set up authentication for the Terraform provider.
The architecture deployed in this tutorial consists of two functions, two triggers, an SQS queue, and an RDB instance. The producer function, activated by a recurring cron trigger, scrapes Hacker News for articles published in the last 15 minutes and pushes the title and URL of each article to an SQS queue created with Scaleway Messaging and Queuing. The consumer function, triggered by each new message on the SQS queue, scrapes some data from the linked article and writes it to a Scaleway Managed Database.
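The data flow between the two functions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code from this repository: the `published_at` field, the JSON message format, and all function names are hypothetical, and the scraping, SQS, and database calls are left out so that the sketch runs with the standard library alone.

```python
import json
from datetime import datetime, timedelta, timezone

# Matches the 15-minute window described above.
WINDOW = timedelta(minutes=15)


def recent_articles(articles, now, window=WINDOW):
    """Producer side: keep only articles published within `window` before `now`.

    Each article is assumed (hypothetically) to be a dict with
    `title`, `url`, and a timezone-aware `published_at` datetime.
    """
    cutoff = now - window
    return [a for a in articles if cutoff <= a["published_at"] <= now]


def to_message_body(article):
    """Producer side: serialize the title and URL as a queue message body."""
    return json.dumps({"title": article["title"], "url": article["url"]})


def parse_message_body(body):
    """Consumer side: recover the title and URL from a queue message body."""
    payload = json.loads(body)
    return payload["title"], payload["url"]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        {"title": "Fresh story", "url": "https://example.com/fresh",
         "published_at": now - timedelta(minutes=5)},
        {"title": "Old story", "url": "https://example.com/old",
         "published_at": now - timedelta(hours=2)},
    ]
    for article in recent_articles(sample, now):
        body = to_message_body(article)   # what the producer would push to the queue
        print(parse_message_body(body))   # what the consumer would read back
```

Keeping the message body down to the title and URL mirrors the description above: the producer stays cheap, and the heavier work of fetching and parsing each linked article happens in the consumer, once per message.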
Once you have cloned this repository, deploy it with Terraform:
terraform init
terraform apply
Once the deployment completes, everything is up and running. You can check that the functions execute correctly in Scaleway Cockpit, and see the results by connecting to your RDB instance:
psql -h $(terraform output -raw db_ip) --port $(terraform output -raw db_port) -d hn-database -U worker