System Design

Topics

Network Protocols
- Machines in distributed systems communicate with one another over a network.
- Basic understanding of low-level networking
- Protocols (IP, TCP, UDP, DNS, and HTTP(S))
Storage
- Information storage is a fundamental component of every distributed system.
- volatile vs. nonvolatile memory
- the concept of a database
Availability
- Most distributed systems need to be highly available.
- how to measure a system's availability
- how to increase a system's availability
- how to use redundancy
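
The availability items above can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes component failures are independent, which real systems often violate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Expected yearly downtime for a given availability, e.g. 99.9 ("three nines")."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def serial_availability(*components: float) -> float:
    """Availability of a chain where every component must be up (values in 0..1)."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


def redundant_availability(*replicas: float) -> float:
    """Availability of redundant replicas where one healthy replica is enough."""
    all_down = 1.0
    for availability in replicas:
        all_down *= 1 - availability
    return 1 - all_down
```

Under these assumptions, two redundant replicas at 99% each yield roughly 99.99%, while two 99% components in series drop to roughly 98%.
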
Cache
- Caches store responses to network requests or computationally expensive operation results.
- cache hit and miss terminology
- cache eviction policies
- content delivery networks
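
As one example of an eviction policy, here is a minimal least-recently-used (LRU) cache sketch that also counts hits and misses. The class name and counters are hypothetical, and LRU is only one of several policies (others include LFU and FIFO).

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        return None

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
```
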
Latency and Throughput
- These are key parameters to evaluate the performance of a distributed system.
- latencies of the most common operations
- back-of-the-envelope calculations
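
A back-of-the-envelope calculation is usually a few multiplications and divisions. The helpers below are a hypothetical sketch; the inputs used to exercise them would be assumed workloads, not measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_qps(requests_per_day: float) -> float:
    """Average queries per second implied by a daily request volume."""
    return requests_per_day / SECONDS_PER_DAY


def storage_bytes_per_year(writes_per_day: int, bytes_per_write: int) -> int:
    """Raw storage growth per year, ignoring replication and compression."""
    return writes_per_day * bytes_per_write * 365


def transfer_seconds(size_bytes: float, bandwidth_bytes_per_second: float) -> float:
    """Time to move a payload over a link, ignoring latency and protocol overhead."""
    return size_bytes / bandwidth_bytes_per_second
```

For instance, an assumed 86.4 million requests per day averages out to 1,000 requests per second; peak traffic is typically estimated as some multiple of that average.
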
Proxies and Load Balancers
- Proxies are intermediary servers that sit between clients and backend servers; most large systems use them.
- forward and reverse proxies
- use of reverse proxies as load balancers
- load balancer's server selection strategies
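
Two common server selection strategies, round robin and least connections, can be sketched as follows. The class and method names are hypothetical, and a production load balancer would also handle health checks and concurrency.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotating order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Picks the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1
```
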
Hashing
- Hashing is helpful for the elastic scaling of cache servers and data partitioning.
- consistent hashing
- rendezvous hashing
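
The following is a minimal consistent hashing sketch with virtual nodes. The class name, the choice of MD5 as the hash function, and the default of 100 virtual nodes per server are assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # MD5 is used only to spread keys around the ring, not for security.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point_hash, node_name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node owns several points on the ring to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring has no nodes")
        # First point at or after the key's hash, wrapping around at the end.
        index = bisect.bisect_left(self._ring, (_hash(key),))
        if index == len(self._ring):
            index = 0
        return self._ring[index][1]
```

The property that matters for elastic scaling is that removing a node only remaps the keys that node owned; keys assigned to other nodes stay where they were.
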
Relational Databases
- These are structured databases storing data in tabular format and supporting SQL queries.
- indexing
- ACID transactions
- strong vs. eventual consistency
Non-Relational Databases
- Key-value databases (e.g., Redis, ZooKeeper) are often used for caching and configuration.
- blob storage (S3)
- time-series databases
- graph databases (Neo4j)
- spatial databases and quadtrees
Replication and Sharding
- These are standard techniques to increase availability and performance.
- how to duplicate data on multiple servers to increase redundancy
- how to divide data across multiple servers to increase throughput
Leader Election
- This is how a cluster of servers selects a leader responsible for all the primary operations.
- what is a consensus algorithm
- how Paxos or Raft works
Polling and Streaming
- These are the most common techniques to obtain data from a server.
- how to fetch data at regular intervals (polling)
- how to get a continuous data feed (streaming)
Logging and Monitoring
- Every system needs to measure performance and troubleshoot issues.
- how to collect and log event information
- how to get visibility into the system's key metrics
- how to aggregate human-readable metrics
Publish-Subscribe
- This is a widely used messaging pattern.
- how pub-sub works
- what are idempotent operations
- popular frameworks like Apache Kafka
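
To show why idempotent operations matter in pub-sub, here is a small in-memory sketch. The `Broker` and `IdempotentCounter` classes and the message fields `id` and `amount` are hypothetical; this is not how Apache Kafka is implemented, only an illustration of the pattern.

```python
from collections import defaultdict


class Broker:
    """Delivers each published message to every subscriber of its topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers.get(topic, []):
            handler(message)


class IdempotentCounter:
    """Subscriber that ignores redelivered messages by remembering their ids."""

    def __init__(self):
        self.total = 0
        self._seen_ids = set()

    def handle(self, message):
        if message["id"] in self._seen_ids:
            return  # duplicate delivery: processing it again must have no effect
        self._seen_ids.add(message["id"])
        self.total += message["amount"]
```

Because many messaging systems may deliver a message more than once, a handler like this produces the same result whether a message arrives once or several times.
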
Peer-to-Peer Networks
- In these networks, machines (peers) split a workload among themselves to complete it faster.
- what is a gossip protocol
- use cases of such networks
Rate Limiting
- Limiting the number of requests sent to or received by a system is essential.
- how rate limiting can help mitigate DoS and DDoS attacks
- how to implement rate-limiting strategies (e.g., with Redis)
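
One widely used strategy is the token bucket. The sketch below keeps its state in process memory rather than in Redis, so it illustrates the algorithm only and would not work across multiple servers; the class name and the injectable `clock` parameter are assumptions.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained rate of `refill_rate` per second."""

    def __init__(self, capacity: float, refill_rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```
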
MapReduce
- This is a popular framework for processing large distributed datasets efficiently and in a fault-tolerant way.
- what is a distributed file system
- popular distributed file system implementations (e.g., HDFS)
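
The MapReduce programming model can be illustrated with the classic word count, run here in a single process. The function names are hypothetical; a real framework would spread the map and reduce phases across many machines and read input from a distributed file system.

```python
from collections import defaultdict


def map_phase(document: str):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    pairs = [pair for document in documents for pair in map_phase(document)]
    return reduce_phase(shuffle(pairs))
```
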
API Design
- the basics of Web API design
- the concept of CRUD operations
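
CRUD operations map naturally onto HTTP methods in a Web API. The in-memory store below is a hypothetical sketch; the `/notes` routes in the comments are illustrative and no real web framework is involved.

```python
import itertools


class NoteStore:
    """In-memory resource store with one method per CRUD operation."""

    def __init__(self):
        self._notes = {}
        self._ids = itertools.count(1)

    def create(self, body: dict) -> int:  # POST /notes
        note_id = next(self._ids)
        self._notes[note_id] = dict(body)
        return note_id

    def read(self, note_id: int):  # GET /notes/{id}
        return self._notes.get(note_id)

    def update(self, note_id: int, body: dict) -> bool:  # PUT /notes/{id}
        if note_id not in self._notes:
            return False
        self._notes[note_id] = dict(body)
        return True

    def delete(self, note_id: int) -> bool:  # DELETE /notes/{id}
        return self._notes.pop(note_id, None) is not None
```
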

Examples

China Train ticket booking system

Courses

Learning platforms

Blogs

Simplify Your Microservices Architecture With a Data API

Books

Videos

System Design for Beginners Course: This course is a detailed introduction to system design for software developers and engineers.
