Skip to content

jhonnymoreira/sheets-server

Repository files navigation

Sheets Server

Server infrastructure for Sheets.

Table of Contents


Introduction

The Sheets Server is a serverless application that provides a REST API to manage spreadsheets and their contacts. The infrastructure relies on AWS services such as S3, Lambda, SQS, and API Gateway. The application is built using TypeScript and deployed using AWS CDK.

The main objective of the Sheets Server is to be a BFF (Backend-for-Frontend) for the Sheets App.

All the endpoints are protected by a custom authorizer on the API Gateway that verifies the user's JWT token generated by Clerk on client-side.


Decision process

Keep in mind that the decisions were made based on a low-cost solution that is easy to maintain and scale, while still providing the needs to meet the requirements of the challenge (500k contacts per spreadsheet).

File upload

The file upload relies on the AWS S3 Presigned URLs feature to generate a signed URL that can be used to upload the file to S3.

The URL expiration time is 10 minutes, which is enough to avoid the token expiration when the user network is slow and needs to upload a large file (we only support files up to 200 MB).

To make it available to the sheets-app, there is an endpoint on API Gateway that generates the signed URL to the user (POST /get-upload-url).

Spreadsheet processing

The spreadsheet processing has the following requirements:

  • 1 AWS S3 bucket to store the processed spreadsheets.
  • 4 AWS SQS queues:
    • AWS SQS Standard with a dead letter queue, to store the S3 Events emmited by the S3 bucket upon object creation.
    • AWS SQS FIFO with a dead letter queue, to store the processed contacts and guarantee that duplicates are not sent to the consumer.

      FIFO (First-In-First-Out) queues have all the capabilities of the standard queues, but are designed to enhance messaging between applications when the order of operations and events is critical, or where duplicates can't be tolerated.

  • 2 AWS Lambda functions:
    • One to consume the S3 Events Queue and process the spreadsheets (validate, remove duplicates, split into chunks, and produce events to the processed contacts queue).
    • One to consume the processed contacts queue and write the data to the database. This lambda will be concurrent and has the max concurrency set to 10.

API Gateway

The API Gateway is also AWS-based and has the following requirements:

  • 4 AWS Lambda functions:
    • One to act as the custom authorizer to verify the user's JWT token generated by Clerk on client-side.
    • Three to handle the endpoints:
      • GET /spreadsheets: Get a list of all spreadsheets from a given user.
      • GET /spreadsheets/{id}: Get the spreadsheet contacts by the spreadsheet id matching for a given user.
      • POST /get-upload-url: Get a signed URL to upload a spreadsheet to S3 by providing a displayName and fileSize.
  • CORS configuration to allow the sheets-app to make requests to the endpoints.

Database

The database is a PostgreSQL service-based instance on Supabase. The free tier is enough to handle the requests of the sheets-app, but the paid tier is recommended to increase the performance, since the free tier has only 2 Core CPUs and 1 GB of RAM.

To ensure the interaction with the database has some type-safety, the database is managed by Drizzle ORM using Postgres.js as the PostgreSQL driver.

The database has the following schema:

Database schema

The extra table contacts_count_by_spreadsheets is used to store the count of contacts per spreadsheet, which helps to avoid long-duration queries when paginating the spreadsheets contacts.

Authentication

The authentication is done by Clerk on client-side. The main decision was to experiment with the service and see how it works, but it is not a requirement to use it.


Workflow

The workflow is divided into the following steps:

  1. The user fills the upload form in sheets-app
  2. When submitting the form, the it is generated a signed URL to upload the file to S3
  3. The signed URL is sent back to the user with some metadata to ensure the file is correctly processed
  4. When the user finishes the file upload, the S3 bucket responsible for storing the files emits an event to the queue (spreadsheets-to-process)
  5. The queue is processed by a consumer lambda (process-spreadsheets) that validates the file iterating over the rows, validating andremoving the duplicated rows, and then splitting the file into chunks of 500 rows to send to another queue (processes-contacts)
  6. The lambda insert-spreadshees-contacts consumes the process-contacts and writes the data to the database. This lambda is concurrent and has the max concurrency set to 10
  7. When the data is written to the database, after each row insertion in contacts table, it triggers a database function that writes to contacts_count_by_spreadsheets table by incrementing the count by spreadsheet_id, avoiding long-duration queries to return the contacts count when the spreadsheet is exhibited to the user.

Process uploaded spreadsheet

Once the spreadsheet is created, it is processed by the server in the following steps:

Process spreadsheet


Available endpoints

API Gateway

Endpoints description

  • POST /get-upload-url: Get a signed URL to upload a spreadsheet to S3 by providing a displayName and fileSize. The endpoint is guarded.
  • GET /spreadsheets: Get a list of all spreadsheets from a given user. The endpoint is guarded.
  • GET /spreadsheets/{id}: Get the spreadsheet contacts by the spreadsheet id matching for a given user. The endpoint is guarded.

Package commands

For drizzle-kit commands, see the drizzle-kit documentation.

  • deploy:development: Deploys the main app stage to the development environment
  • deploy:staging: Deploys the main app stage to the staging environment
  • deploy:production: Deploys the main app stage to the production environment
  • db:check: A wrapper around drizzle-kit check:pg
  • db:introspect: A wrapper around drizzle-kit introspect:pg
  • db:migration:down: A wrapper around drizzle-kit drop
  • db:migration:generate: A wrapper around drizzle-kit generate:pg
  • db:migration:up: A wrapper around drizzle-kit up:pg
  • db:push: A wrapper around drizzle-kit push:pg
  • lint: Runs ESLint on the project
  • prepare: Hook command to setup husky and lint-staged
  • prettify: Prettify the project using Prettier
  • test: Run vitest
  • test:watch: Run vitest with --watch flag
  • test:coverage: Run vitest with --coverage flag

Tools used

About

Server infrastructure for Sheets

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published