Server infrastructure for Sheets.
- Introduction
- Decision process
- Workflow
- Available endpoints
- Package commands
- Tools used
- Room for improvements
The Sheets Server is a serverless application that provides a REST API to manage spreadsheets and their contacts. The infrastructure relies on AWS services such as S3, Lambda, SQS, and API Gateway. The application is built using TypeScript and deployed using AWS CDK.
The main objective of the Sheets Server is to be a BFF (Backend-for-Frontend) for the Sheets App.
All the endpoints are protected by a custom authorizer on the API Gateway that verifies the user's JWT token generated by Clerk on client-side.
Keep in mind that the decisions were made based on a low-cost solution that is easy to maintain and scale, while still providing the needs to meet the requirements of the challenge (500k contacts per spreadsheet).
The file upload relies on the AWS S3 Presigned URLs feature to generate a signed URL that can be used to upload the file to S3.
The URL expiration time is 10 minutes, which is enough to avoid the token expiration when the user network is slow and needs to upload a large file (we only support files up to 200 MB).
To make it available to the sheets-app
, there is an endpoint on API Gateway that generates the signed URL to the user (POST /get-upload-url
).
The spreadsheet processing has the following requirements:
- 1 AWS S3 bucket to store the processed spreadsheets.
- 4 AWS SQS queues:
- AWS SQS Standard with a dead letter queue, to store the S3 Events emmited by the S3 bucket upon object creation.
- AWS SQS FIFO with a dead letter queue, to store the processed contacts and guarantee that duplicates are not sent to the consumer.
FIFO (First-In-First-Out) queues have all the capabilities of the standard queues, but are designed to enhance messaging between applications when the order of operations and events is critical, or where duplicates can't be tolerated.
- 2 AWS Lambda functions:
- One to consume the S3 Events Queue and process the spreadsheets (validate, remove duplicates, split into chunks, and produce events to the processed contacts queue).
- One to consume the processed contacts queue and write the data to the database. This lambda will be concurrent and has the max concurrency set to 10.
The API Gateway is also AWS-based and has the following requirements:
- 4 AWS Lambda functions:
- One to act as the custom authorizer to verify the user's JWT token generated by Clerk on client-side.
- Three to handle the endpoints:
GET /spreadsheets
: Get a list of all spreadsheets from a given user.GET /spreadsheets/{id}
: Get the spreadsheet contacts by the spreadsheetid
matching for a given user.POST /get-upload-url
: Get a signed URL to upload a spreadsheet to S3 by providing adisplayName
andfileSize
.
- CORS configuration to allow the
sheets-app
to make requests to the endpoints.
The database is a PostgreSQL service-based instance on Supabase. The free tier is enough to handle the requests of the sheets-app
, but the paid tier is recommended to increase the performance, since the free tier has only 2 Core CPUs and 1 GB of RAM.
To ensure the interaction with the database has some type-safety, the database is managed by Drizzle ORM using Postgres.js as the PostgreSQL driver.
The database has the following schema:
The extra table contacts_count_by_spreadsheets
is used to store the count of contacts per spreadsheet, which helps to avoid long-duration queries when paginating the spreadsheets contacts.
The authentication is done by Clerk on client-side. The main decision was to experiment with the service and see how it works, but it is not a requirement to use it.
The workflow is divided into the following steps:
- The user fills the upload form in
sheets-app
- When submitting the form, the it is generated a signed URL to upload the file to S3
- The signed URL is sent back to the user with some metadata to ensure the file is correctly processed
- When the user finishes the file upload, the S3 bucket responsible for storing the files emits an event to the queue (
spreadsheets-to-process
) - The queue is processed by a consumer lambda (
process-spreadsheets
) that validates the file iterating over the rows, validating andremoving the duplicated rows, and then splitting the file into chunks of 500 rows to send to another queue (processes-contacts
) - The lambda
insert-spreadshees-contacts
consumes theprocess-contacts
and writes the data to the database. This lambda is concurrent and has the max concurrency set to 10 - When the data is written to the database, after each row insertion in
contacts
table, it triggers a database function that writes tocontacts_count_by_spreadsheets
table by incrementing thecount
byspreadsheet_id
, avoiding long-duration queries to return the contacts count when the spreadsheet is exhibited to the user.
Once the spreadsheet is created, it is processed by the server in the following steps:
POST /get-upload-url
: Get a signed URL to upload a spreadsheet to S3 by providing adisplayName
andfileSize
. The endpoint is guarded.GET /spreadsheets
: Get a list of all spreadsheets from a given user. The endpoint is guarded.GET /spreadsheets/{id}
: Get the spreadsheet contacts by the spreadsheetid
matching for a given user. The endpoint is guarded.
For drizzle-kit
commands, see the drizzle-kit documentation.
deploy:development
: Deploys the main app stage to the development environmentdeploy:staging
: Deploys the main app stage to the staging environmentdeploy:production
: Deploys the main app stage to the production environmentdb:check
: A wrapper arounddrizzle-kit check:pg
db:introspect
: A wrapper arounddrizzle-kit introspect:pg
db:migration:down
: A wrapper arounddrizzle-kit drop
db:migration:generate
: A wrapper arounddrizzle-kit generate:pg
db:migration:up
: A wrapper arounddrizzle-kit up:pg
db:push
: A wrapper arounddrizzle-kit push:pg
lint
: Runs ESLint on the projectprepare
: Hook command to setuphusky
andlint-staged
prettify
: Prettify the project using Prettiertest
: Runvitest
test:watch
: Runvitest
with--watch
flagtest:coverage
: Runvitest
with--coverage
flag
- Language: JavaScript with TypeScript and running on Node.js via AWS Lambda
- Infrastructure as Code: AWS CDK
- Database: PostgreSQL via Supabase with Drizzle ORM using Postgres.js as the PostgreSQL driver.
- Queues: AWS SQS (FIFO and Standard)
- File storage: AWS S3
- Authentication: Clerk to handle user authentication and authorization, verified on API Gateway authorizer with jose
- Testing: Vitest
- Code quality tools: ESLint (with flat config) and Prettier (with few plugins to inforce a consistent code style).
- Validation: zod for type-safe validation.