Concert Ticket is a web service that lets users purchase tickets across various categories. The system provides a smooth and efficient booking experience, with real-time updates on seat availability and instant payment processing. Concert Ticket is designed for high concurrency and low latency, ensuring a seamless experience for users.
- List Categories: Users can view all available categories.
- Order Creation: Users can place an order for tickets, with the system automatically selecting random available seats to enhance the booking experience.
- Payment Processing: Integration with a payment gateway allows users to complete ticket purchases efficiently.
- High Concurrency Handling: Supports thousands of users competing for tickets simultaneously.
- Low Latency: Fast responses for seat selection and ticket purchase.
- Real-Time Updates: Instant feedback on seat availability and purchase confirmation.
We’ve explored several approaches to seat selection, aiming for a smoother, more efficient user experience while minimizing contention. The strategies below start from a relational database and its ACID properties to maintain seat-availability integrity and provide a reliable selection process.
Pessimistic locking is a concurrency control mechanism that locks a record to prevent other users from updating it. In this case, we can use the SELECT ... FOR UPDATE statement to lock the rows we want to update, and the SKIP LOCKED clause to skip rows that are already locked by other transactions. This approach ensures that only one user can claim a given seat at a time, preventing double bookings.
BEGIN;
SELECT id
FROM tickets
WHERE category_id = ?
AND orders_id IS NULL
FOR UPDATE SKIP LOCKED;
-- Some other operations
UPDATE tickets
SET orders_id = ?
WHERE id = ?;
COMMIT;
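As an illustration, here is a minimal Go sketch of running this flow with database/sql. It is not the project's actual code: the function name, the LIMIT 1 (to pick a single seat), and the PostgreSQL-style $n placeholders are assumptions.

package booking

import (
	"context"
	"database/sql"
)

// bookSeatPessimistic locks one free ticket in the category and assigns it to the order.
func bookSeatPessimistic(ctx context.Context, db *sql.DB, categoryID, orderID int64) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit has succeeded

	var ticketID int64
	// SKIP LOCKED: rows locked by concurrent transactions are skipped instead of waited on.
	err = tx.QueryRowContext(ctx, `
		SELECT id
		FROM tickets
		WHERE category_id = $1 AND orders_id IS NULL
		LIMIT 1
		FOR UPDATE SKIP LOCKED`, categoryID).Scan(&ticketID)
	if err != nil {
		return err // sql.ErrNoRows: no unlocked free seat right now
	}

	if _, err := tx.ExecContext(ctx,
		`UPDATE tickets SET orders_id = $1 WHERE id = $2`, orderID, ticketID); err != nil {
		return err
	}
	return tx.Commit()
}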
While effective, this method can lead to performance issues during high concurrency. When multiple users book seats at the same time, some may experience delays waiting for locks, resulting in longer response times and a poor user experience, especially during peak periods.
Each transaction also consumes a connection from the pool, necessitating a large pool size to manage simultaneous transactions. If the pool is inadequate, users may have to wait for a connection, causing further delays. Additionally, third-party payment services add unpredictability to response times, potentially extending transaction durations and increasing the likelihood of abandoned bookings.
Optimistic locking is a concurrency control mechanism that allows multiple users to read and write to the same record at the same time. When a user updates a record, the system checks if the record has been modified by another user since it was last read. If the record has been updated, the system rejects the changes and prompts the user to retry the operation.
BEGIN;
SELECT id
FROM tickets
WHERE category_id = ?
AND orders_id IS NULL;
-- Some other operations
UPDATE tickets
SET orders_id = ?
WHERE id = ?
AND orders_id IS NULL;
-- If no rows were updated, rollback the transaction and retry
COMMIT;
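A minimal Go sketch of the retry logic this approach requires, under the same assumptions as the previous sketch; the retry limit is arbitrary.

package booking

import (
	"context"
	"database/sql"
	"errors"
)

// bookSeatOptimistic reads a free seat without locking it, then claims it with a
// conditional UPDATE; if another user won the race, it retries with a new seat.
func bookSeatOptimistic(ctx context.Context, db *sql.DB, categoryID, orderID int64) error {
	const maxRetries = 3
	for attempt := 0; attempt < maxRetries; attempt++ {
		var ticketID int64
		err := db.QueryRowContext(ctx, `
			SELECT id
			FROM tickets
			WHERE category_id = $1 AND orders_id IS NULL
			LIMIT 1`, categoryID).Scan(&ticketID)
		if err != nil {
			return err // sql.ErrNoRows: category sold out
		}

		// The orders_id IS NULL predicate acts as the version check: if a
		// concurrent transaction already claimed the seat, zero rows match.
		res, err := db.ExecContext(ctx, `
			UPDATE tickets
			SET orders_id = $1
			WHERE id = $2 AND orders_id IS NULL`, orderID, ticketID)
		if err != nil {
			return err
		}
		if n, err := res.RowsAffected(); err == nil && n == 1 {
			return nil // seat claimed
		}
		// Conflict: another user took this seat; try again with a different one.
	}
	return errors.New("could not book a seat after retries")
}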
Optimistic locking is more scalable than pessimistic locking, as it allows multiple users to access the same data without blocking each other. However, it requires additional logic to handle conflicts when multiple users attempt to book the same seat simultaneously. This can lead to a much higher rate of failed transactions and user frustration, as users may need to reselect seats or restart the booking process.
A quantity counter approach can be used to track the number of available seats in each category. When a user selects a category, the system decrements the counter by the number of seats they wish to book. If the counter reaches zero, the system displays an error message indicating that no seats are available.
BEGIN;
UPDATE categories
SET quantity = quantity - 1
WHERE id = ?;
-- Some other operations
-- If the new quantity is negative (no seats left), rollback the transaction
COMMIT;
This approach is simple and efficient, as it does not require explicit locking or conflict resolution. However, behind the scenes, the UPDATE still takes a row-level lock on the category row, so concurrent bookings in the same category serialize on that single row, which can lead to performance issues during high concurrency. If multiple users attempt to book the last seat in a category simultaneously, some may still see an error message even though a seat becomes available again (for example, when a competing transaction rolls back).
We can move this counter from SQL to a cache to improve performance and scalability. By storing seat availability in the cache, we avoid row-level locks and reduce the likelihood of conflicts. In the background, we periodically sync the cache with the database to ensure data consistency. This approach provides a seamless, lock-free booking experience with fast response times.
DECRBY category:<id>:quantity 1
-- If the returned quantity is negative or an error occurs, roll back:
-- INCRBY category:<id>:quantity 1
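A minimal Go sketch of this counter, assuming the go-redis client and the key layout shown above; the function and error names are illustrative.

package booking

import (
	"context"
	"errors"
	"fmt"

	"github.com/redis/go-redis/v9"
)

var ErrSoldOut = errors.New("category sold out")

// reserveSeat atomically takes one seat from the cached counter and compensates
// if the category is already sold out.
func reserveSeat(ctx context.Context, rdb *redis.Client, categoryID int64) error {
	key := fmt.Sprintf("category:%d:quantity", categoryID)

	remaining, err := rdb.DecrBy(ctx, key, 1).Result()
	if err != nil {
		return err
	}
	if remaining < 0 {
		// Oversold: give the unit back, mirroring the INCRBY rollback above.
		rdb.IncrBy(ctx, key, 1)
		return ErrSoldOut
	}
	return nil
}

The same IncrBy compensation would apply if a later step, such as payment, fails.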
To list all available categories, we can use a local in-memory cache (application-level data) to store the category data. The data is periodically refreshed to keep it up to date. This approach minimizes the number of queries to the cache server (e.g. Redis), removing that network latency from the read path.
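A minimal sketch of such an in-process cache in Go; the Category fields, the loader function, and the refresh interval are assumptions rather than the project's actual types.

package booking

import (
	"context"
	"sync"
	"time"
)

type Category struct {
	ID       int64
	Name     string
	Price    int64
	Quantity int64
}

type CategoryCache struct {
	mu         sync.RWMutex
	categories []Category
}

// Start refreshes the cached list on a fixed interval until ctx is cancelled.
func (c *CategoryCache) Start(ctx context.Context, load func(context.Context) ([]Category, error), every time.Duration) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		if cats, err := load(ctx); err == nil {
			c.mu.Lock()
			c.categories = cats
			c.mu.Unlock()
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// List serves reads from memory, avoiding a round-trip to Redis or PostgreSQL.
func (c *CategoryCache) List() []Category {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.categories
}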
After a successful payment, the payment gateway sends a notification to the system, which updates the order status in the database. This step is handled asynchronously, allowing the system to process a large number of payment notifications without blocking request handling.
To synchronize the quantity of each category between the cache and the database, we can use a publish-subscribe pattern. When a user books a ticket, the system publishes a message to a message queue, which triggers a background worker to sync the cache with the database. This approach ensures that the cache remains up-to-date with the latest data, preventing inconsistencies between the two data sources.
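A minimal Go sketch of the publish side using the nats.go JetStream API; the subject name and the event payload are assumptions.

package booking

import (
	"encoding/json"

	"github.com/nats-io/nats.go"
)

// bookingEvent is a hypothetical payload describing one successful booking.
type bookingEvent struct {
	OrderID    int64 `json:"order_id"`
	CategoryID int64 `json:"category_id"`
	Quantity   int64 `json:"quantity"`
}

// publishBooking hands the event to JetStream; a background worker consumes it
// and applies the decrement to PostgreSQL, keeping cache and database in sync.
func publishBooking(js nats.JetStreamContext, ev bookingEvent) error {
	data, err := json.Marshal(ev)
	if err != nil {
		return err
	}
	_, err = js.Publish("tickets.booked", data)
	return err
}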
To prevent workers from updating the same category simultaneously, we can use batch processing: a worker consumes a set of events, groups them by category, and applies them in a single statement. This approach minimizes the number of database queries and reduces contention.
-- Before
UPDATE categories
SET quantity = quantity - 1
WHERE id = ?;
-- After
UPDATE categories
SET quantity = quantity - CASE id
WHEN ? THEN ?
WHEN ? THEN ?
END
WHERE id IN (?, ?);
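A minimal Go sketch of how a worker might group a consumed batch and build the statement above dynamically (reusing the hypothetical bookingEvent from the earlier sketch); the placeholder construction assumes a PostgreSQL driver.

package booking

import (
	"context"
	"database/sql"
	"fmt"
	"strings"
)

// applyBatch sums the booked quantity per category and applies all decrements
// in one UPDATE ... CASE statement, mirroring the "after" query above.
func applyBatch(ctx context.Context, db *sql.DB, events []bookingEvent) error {
	perCategory := map[int64]int64{}
	for _, ev := range events {
		perCategory[ev.CategoryID] += ev.Quantity
	}
	if len(perCategory) == 0 {
		return nil
	}

	var (
		cases strings.Builder
		ids   strings.Builder
		args  []any
		i     = 1
	)
	for id, qty := range perCategory {
		// WHEN $i THEN $i+1 for the CASE, plus one more placeholder for the IN list.
		fmt.Fprintf(&cases, " WHEN $%d THEN $%d", i, i+1)
		args = append(args, id, qty)
		i += 2
		if ids.Len() > 0 {
			ids.WriteString(", ")
		}
		fmt.Fprintf(&ids, "$%d", i)
		args = append(args, id)
		i++
	}

	query := fmt.Sprintf(
		"UPDATE categories SET quantity = quantity - CASE id%s END WHERE id IN (%s)",
		cases.String(), ids.String())
	_, err := db.ExecContext(ctx, query, args...)
	return err
}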
When a user exceeds the payment time limit, the system automatically cancels the order and releases the seats. To manage this process, we employ a background worker that periodically checks for and cancels orders that have surpassed the time limit, effectively returning the seats to the available pool. Additionally, we utilize Common Table Expressions (CTEs) to update the orders in a single, efficient query.
WITH selected_orders AS (
    SELECT id
    FROM orders
    WHERE status = 'pending'
      AND expired_at < NOW()
    ORDER BY expired_at
    LIMIT ?
),
updated_orders AS (
    UPDATE orders
    SET status = 'cancelled'
    WHERE id IN (SELECT id FROM selected_orders)
      AND status = 'pending'
    RETURNING id, category_id, name, email
)
SELECT id, category_id, name, email
FROM updated_orders;
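A minimal Go sketch of such a worker; the interval, the batch size, and the seat-release callback are assumptions.

package booking

import (
	"context"
	"database/sql"
	"time"
)

const cancelExpiredQuery = `
WITH selected_orders AS (
    SELECT id FROM orders
    WHERE status = 'pending' AND expired_at < NOW()
    ORDER BY expired_at
    LIMIT $1
),
updated_orders AS (
    UPDATE orders SET status = 'cancelled'
    WHERE id IN (SELECT id FROM selected_orders) AND status = 'pending'
    RETURNING id, category_id, name, email
)
SELECT id, category_id, name, email FROM updated_orders`

// runExpirationWorker cancels overdue pending orders on a fixed interval and
// hands each cancelled order to releaseSeat so the seat can be returned to the pool.
func runExpirationWorker(ctx context.Context, db *sql.DB, releaseSeat func(categoryID int64)) {
	ticker := time.NewTicker(5 * time.Second) // assumed interval
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
		rows, err := db.QueryContext(ctx, cancelExpiredQuery, 100) // assumed batch size
		if err != nil {
			continue
		}
		for rows.Next() {
			var id, categoryID int64
			var name, email string
			if err := rows.Scan(&id, &categoryID, &name, &email); err == nil {
				releaseSeat(categoryID)
			}
		}
		rows.Close()
	}
}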
Here is the final database schema from the design:
For observability, we have logging and tracing. Logs are generated with the slog library, printed to the console, then scraped by Promtail and stored in Loki. Tracing is implemented with OpenTelemetry and the traces are stored in Tempo. We use Grafana to visualize the data, and we link traces and logs to provide a better understanding of the system.
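As an illustration of that trace/log link, here is a minimal, assumed Go sketch (not the project's actual setup) that attaches the active OpenTelemetry trace ID to each slog record, so Grafana can jump from a Loki log line to the matching Tempo trace.

package observability

import (
	"context"
	"log/slog"
	"os"

	"go.opentelemetry.io/otel/trace"
)

var logger = slog.New(slog.NewJSONHandler(os.Stdout, nil))

// logWithTrace adds the current span's trace ID to a log record when a span is active.
func logWithTrace(ctx context.Context, msg string, args ...any) {
	if sc := trace.SpanContextFromContext(ctx); sc.IsValid() {
		args = append(args, slog.String("trace_id", sc.TraceID().String()))
	}
	logger.InfoContext(ctx, msg, args...)
}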
- Web Service: GoLang
- Database: PostgreSQL
- Cache: Redis
- Message Queue: NATS JetStream
- Deployment: Docker
- Observability: OpenTelemetry, Grafana, Loki, Tempo, Promtail
- Load Test: K6
- Docker >= 26
- Docker Compose >= 2
- Make >= 4
For development:
- Clone the repository
- Run docker compose up -d
- Run the SQL scripts in the infra/sql/ directory to migrate and seed the database
- Run go run main.go
- Connect to http://localhost:8000
For production:
- Run make build
- Run make up
- Connect to http://localhost
For load test:
- Run cd loadtest
- Run k6 run main.js
- App VM: AWS EC2 C7G.4xlarge (16-core CPU)
- Load Test & Monitoring VM: AWS EC2 T3.2xlarge (8-core CPU, 32 GB RAM)
Configuration:
- 8500 Concurrent Users
- Duration 45s
Result:
- 0% Error Rate
- Generated ~9,200 requests per second
- ~415ms Average Response Time
- ~762ms 90th Percentile Response Time
- ~30% of tickets failed to book (due to high competition)