Link: https://newsletter.systemdesigncodex.com/p/cap-theorem
- Essential Elements: Consistency, Availability, Partition Tolerance.
- Partition Tolerance: Must-have due to inherent unreliability in communication networks.
- Choice between Consistency and Availability (when a network partition occurs):
- Consistency: All nodes see the same data at the same time. During a partition, this means some requests must be refused or delayed, trading away availability.
- Availability: Every request gets a response and the system stays operational, but some nodes might not return the latest data.
Link: https://newsletter.systemdesigncodex.com/p/the-inevitable-law-governing-software-design
- Basic Principle (Conway’s Law): The structure of a software system mirrors the communication structure of the organization that builds it.
- Example: In a company with inventory, invoicing, and shipping departments, the software will likely have separate systems for each, mirroring these divisions.
- Implications:
- Software integration quality depends on how well these departments communicate.
- Better communication leads to more effective and integrated software modules.
- Strategies for Addressing Conway’s Law:
- Acknowledge It: Recognize its impact on software design.
- Structure Teams Effectively: Place teams working on similar systems close to each other for better communication.
- Avoid Dividing by Technology: Instead of splitting teams by tech layers (front end, back end), focus on business features for smoother collaboration.
- Use Architectural Insights: Align team structures with desired software architecture, understanding that organizational decisions influence software design.
Link: https://newsletter.systemdesigncodex.com/p/the-ingredients-to-delicious-software
- Scalability:
- Ability to handle increased workload efficiently.
- Important to identify the point where scaling becomes cost-ineffective.
- Latency & Throughput:
- Latency: Time taken to respond to a request (e.g., time to serve a cheese sandwich).
- Throughput: Number of requests handled in a given time (e.g., serving multiple customers).
- Availability and Consistency:
- Availability: Ability to operate despite issues (e.g., with one cook absent).
- Measured in 'nines' (e.g., 99.9% availability).
- Consistency: Synchronization of information across different parts of the system (e.g., order copies being in sync).
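The 'nines' map directly to a downtime budget. The hypothetical helper below converts an availability percentage into the minutes of downtime it allows, assuming a 365-day year.

```python
def allowed_downtime_minutes(availability_percent, period_days=365):
    """Downtime budget (minutes) implied by an availability target over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} minutes of downtime per year")
```

At 99.9%, the budget is about 525.6 minutes (roughly 8.8 hours) per year. Each additional nine divides that budget by ten.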
Link: https://blog.bytebytego.com/p/what-happens-when-you-type-a-url
When you type a URL into your browser:
- URL Parsing: The browser identifies the scheme (HTTP or HTTPS), domain, path, and resource.
- DNS Lookup: The browser looks up the domain's IP address, checking various caches (e.g., browser and OS caches) before querying DNS servers.
- TCP Connection: Establishes a TCP connection with the server (plus a TLS handshake for HTTPS).
- HTTP Request: Sends a request for the specific resource.
- Server Response: The server sends back the requested content.
- Rendering: The browser displays the webpage.
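Python's standard `urllib.parse` module can illustrate the URL-parsing step. This is only a sketch of the pieces a browser extracts. The URL is a hypothetical example, and the default-port table is an assumption that covers only HTTP and HTTPS.

```python
from urllib.parse import urlsplit

# Hypothetical URL used only for illustration.
url = "https://www.example.com/products/index.html?color=red"
parts = urlsplit(url)

# When the URL has no explicit port, the scheme implies one (assumed: HTTP/HTTPS only).
port = parts.port or {"http": 80, "https": 443}[parts.scheme]

print(parts.scheme)    # https -> protocol
print(parts.hostname)  # www.example.com -> domain to resolve via DNS
print(port)            # 443 -> port for the TCP connection
print(parts.path)      # /products/index.html -> path and resource
print(parts.query)     # color=red
```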
Link: https://blog.bytebytego.com/p/how-does-cdn-work
- Domain Name Lookup:
- Bob enters www.myshop.com in his browser.
- The browser checks the local DNS cache for the domain.
- DNS Resolver:
- If the domain is not in the local cache, the browser contacts the DNS resolver (usually provided by the ISP).
- Recursive Domain Resolution:
- The DNS resolver performs recursive resolution for www.myshop.com.
- CDN Integration:
- Instead of pointing directly to the London server, the authoritative name server returns a CNAME record that redirects to a CDN domain (www.myshop.cdn.com).
- Load Balancer Query:
- The DNS resolver queries the CDN load balancer domain (www.myshop.lb.com).
- Optimal Server Selection:
- The CDN load balancer selects the best CDN edge server based on factors like the user’s location and server load.
- Content Delivery:
- The browser connects to the chosen CDN edge server to load content.
- The content includes static (e.g., images, videos) and dynamic elements.
- If content is not on the edge server, it's fetched from higher-level CDN servers or the origin server in London.
- CDN Network:
- This process is part of a geographically distributed CDN network for efficient content delivery.
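The last two steps can be sketched in a few lines. This is a toy model under stated assumptions, not how any particular CDN works. `pick_edge` scores edge servers by a rough coordinate distance plus a load penalty, and the weighting is hypothetical. `CacheTier` models the edge, regional, and origin hierarchy: a miss is fetched from the next tier up and cached on the way back.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CacheTier:
    """One level of the hierarchy: edge server, regional CDN server, or origin."""
    name: str
    store: dict = field(default_factory=dict)
    parent: Optional["CacheTier"] = None

    def get(self, path):
        if path in self.store:
            return self.store[path], self.name        # hit at this tier
        if self.parent is None:
            raise KeyError(path)                      # not even the origin has it
        content, served_by = self.parent.get(path)    # miss: ask the next tier up
        self.store[path] = content                    # keep a copy for later requests
        return content, served_by


@dataclass
class EdgeServer:
    name: str
    lat: float
    lon: float
    load: float          # 0.0 (idle) to 1.0 (saturated)
    cache: CacheTier


def pick_edge(user_lat, user_lon, edges, load_weight=50.0):
    """Hypothetical selection rule: rough distance in degrees plus a penalty for busy servers."""
    def score(edge):
        return math.hypot(edge.lat - user_lat, edge.lon - user_lon) + load_weight * edge.load
    return min(edges, key=score)


# Hypothetical topology: origin in London, one regional tier, two edge servers.
origin = CacheTier("origin-london", {"/logo.png": b"...image bytes..."})
regional = CacheTier("regional-eu", parent=origin)
paris = EdgeServer("edge-paris", 48.9, 2.4, 0.2, CacheTier("edge-paris", parent=regional))
nyc = EdgeServer("edge-nyc", 40.7, -74.0, 0.1, CacheTier("edge-nyc", parent=regional))

edge = pick_edge(51.5, -0.1, [paris, nyc])            # a user near London
print(edge.name, edge.cache.get("/logo.png")[1])      # first request is served from the origin
print(edge.name, edge.cache.get("/logo.png")[1])      # repeat request is served from the edge
```

After the first request, the regional tier also holds a copy. A later miss at another edge server therefore stops there instead of going back to London.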
Link: https://newsletter.francofernando.com/p/time-complexity
- Purpose: Time complexity evaluates how an algorithm's performance scales with the size of the input data.
- Types of Complexity:
- Worst-Case Complexity: Maximum number of steps for any input of size n. Most commonly used, as it provides guarantees about the algorithm's upper limit.
- Best-Case Complexity: Minimum number of steps for any input of size n.
- Average-Case Complexity: Average number of steps over all possible instances of input size n.
- Big Oh Notation: Simplifies the expression of an algorithm's worst-case complexity by focusing on growth rates rather than precise step counts.
- Common Complexity Classes:
- Constant - O(1): Time is independent of input size (e.g., adding two numbers).
- Logarithmic - O(log n): Each step cuts the problem size in half (e.g., binary search).
- Linear - O(n): Time grows linearly with input size (e.g., finding max in an array).
- Superlinear - O(n log n): Combines linear and logarithmic growth (e.g., Mergesort, and Quicksort on average; Quicksort's worst case is O(n^2)).
- Quadratic - O(n^2): Time grows with the square of input size (e.g., insertion sort).
- Cubic - O(n^3): Involves triple nested loops (e.g., certain dynamic programming algorithms).
- Exponential - O(c^n): Time multiplies by a constant factor c with each element added to the input; for O(2^n), it doubles (e.g., enumerating subsets).
- Factorial - O(n!): Time grows with the factorial of input size (e.g., generating permutations).
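To make the gap between O(n) and O(log n) concrete, the sketch below counts the steps taken by a linear scan and by binary search, the logarithmic example above, on sorted lists of integers. The step-counting functions are illustrative, not a benchmark.

```python
def linear_search_steps(items, target):
    """O(n): may have to inspect every element."""
    for steps, value in enumerate(items, start=1):
        if value == target:
            return steps
    return len(items)


def binary_search_steps(items, target):
    """O(log n): each step probes the middle element and halves the range (items must be sorted)."""
    lo, hi, steps = 0, len(items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return steps
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


for n in (1_000, 1_000_000):
    data = list(range(n))
    print(n, linear_search_steps(data, n - 1), binary_search_steps(data, n - 1))
```

To find the last of a million sorted items, the linear scan inspects all 1,000,000 elements. Binary search needs at most 20 steps, because 2^20 exceeds one million.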
Link: https://newsletter.systemdesign.one/p/whatsapp-engineering
- Single Responsibility Principle:
- Focus on core feature: Messaging.
- Avoided feature creep and unnecessary functionalities.
- Prioritized reliability above all.
- Technology Stack:
- Chose Erlang for server functionality due to its scalability and support for hot code loading (deploying new code without restarting the server).
- Erlang's lightweight processes and cheap context switching contributed to performance.
- Utilizing Existing Solutions:
- Leveraged open-source solutions like Ejabberd, an Erlang-based messaging server.
- Customized existing solutions to fit specific needs.
- Integrated third-party services for functionalities like push notifications.
- Cross-Cutting Concerns:
- Emphasized aspects like monitoring and alerting for service health.
- Implemented Continuous Integration and Continuous Delivery for software development.
- Scalability Strategies:
- Adopted diagonal scaling, combining horizontal and vertical scaling methods.
- Ran servers on FreeBSD, optimized for handling millions of connections.
- Overprovisioned servers for handling traffic spikes and potential failures.
- Continuous Improvement (Flywheel Effect):
- Regularly measured performance metrics to identify and eliminate bottlenecks.
- Maintained a cycle of continuous feedback and improvement.
- Focus on Quality:
- Conducted load testing to identify and address single points of failure.
- Used simulated production traffic for realistic testing.
- Small Team Size:
- Kept the engineering team small (32 engineers) to maintain efficiency and reduce communication overhead.
Link: https://newsletter.systemdesign.one/p/mysql-sharding
- Vertical Sharding:
- Implementation: Separating tables into different servers (leader-follower model).
- Purpose: Enhances write scalability.
- Challenges: Replication lag, transactional limitations, and potential performance issues for large tables.
- Horizontal Sharding:
- Reasons for Adoption: Addressing challenges with large tables such as schema changes and error risks.
- Approach: Splitting a logical table into multiple physical tables.
- Key Decisions in Horizontal Sharding:
- Build vs. Buy: Opted to build their own sharding solution, reusing vertical sharding logic.
- Shard Level: Focused on sharding at the table level due to extensive use of secondary indexes.
- Sharding Method: Chose range-based partitioning, favoring common range queries.
- Metadata Management: Stored shard metadata in Apache ZooKeeper.
- Database API: Modified to handle sharding columns and keys, enhancing security against SQL injection.
- Sharding Column Selection: Based on latency sensitivity and queries-per-second (QPS) considerations.
- Cross-Shard Indexes: Used to optimize non-sharding column queries, though with potential performance and consistency trade-offs.
- Number of Shards: Maintained a lower count to reduce latency in non-sharding column queries.
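With range-based partitioning, a shard lookup is a binary search over sorted range boundaries. The sketch below is hypothetical: the boundaries and shard names are invented, and the in-memory lists stand in for the shard metadata that, per the notes above, was stored in ZooKeeper. It also shows why range queries on the sharding column are cheap: they touch only the shards whose ranges overlap the query.

```python
import bisect

# Hypothetical shard map: inclusive lower bound of each range on the sharding column.
SHARD_LOWER_BOUNDS = [0, 1_000_000, 2_000_000, 5_000_000]
SHARD_NAMES = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(key):
    """Point lookup: the shard whose range contains `key`."""
    if key < SHARD_LOWER_BOUNDS[0]:
        raise ValueError("key is below the first range")
    return SHARD_NAMES[bisect.bisect_right(SHARD_LOWER_BOUNDS, key) - 1]


def shards_for_range(start, end):
    """Range query [start, end]: only the shards whose ranges overlap it."""
    if start < SHARD_LOWER_BOUNDS[0] or end < start:
        raise ValueError("invalid range")
    first = bisect.bisect_right(SHARD_LOWER_BOUNDS, start) - 1
    last = bisect.bisect_right(SHARD_LOWER_BOUNDS, end) - 1
    return SHARD_NAMES[first:last + 1]


print(shard_for(1_500_000))                    # shard-1
print(shards_for_range(900_000, 2_100_000))    # ['shard-0', 'shard-1', 'shard-2']
```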
Link: https://newsletter.systemdesigncodex.com/p/making-your-database-highly-available
- Purpose: Ensure continuous database operation even if one server fails.
- Not Backup: Unlike backups, redundancy involves running multiple active database instances.
- Cost of Outage: Can be significantly high, averaging $7,900 per minute.
- Redundancy Patterns:
- Active-Passive: One active server handles requests while others stand by.
- Active-Active: Multiple servers handle requests simultaneously.
- Multi-Active: An extension of Active-Active with more complex setups.
- Goal: Minimize disaster impact by physically separating database components.
- Degrees of Separation:
- Server: Different servers in the same data center.
- Rack: Separate racks within a data center.
- Data-Center: Multiple data centers.
- Availability Zone: Distinct zones within a cloud provider's network.
- Region: Geographically dispersed locations.
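A standard back-of-the-envelope formula shows why separation matters. If each of n replicas is up with probability a, and failures are independent, the chance that at least one replica is up is 1 - (1 - a)^n. The sketch below assumes that independence, which is exactly what placing replicas on separate racks, data centers, zones, or regions tries to approximate. Replicas that share a power supply or network tend to fail together, and in that case the formula overstates availability.

```python
def combined_availability(per_replica, replicas):
    """Probability that at least one replica is up, ASSUMING replicas fail independently."""
    return 1 - (1 - per_replica) ** replicas


for n in (1, 2, 3):
    print(n, f"{combined_availability(0.99, n):.6f}")
```

With 99% availability per replica, two replicas give about 99.99% and three give about 99.9999%, provided the independence assumption holds.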
Link: https://newsletter.systemdesigncodex.com/p/how-rate-limiting-works
- Concept: Limits the number of requests sent to a server.
- Implementation: A rate limiter is used to control traffic to servers or APIs.
- Limit: Maximum number of requests allowed in a set time frame (e.g., 600 requests per day).
- Window: The duration for the limit, varying from seconds to days.
- Identifier: A unique attribute (like User ID or IP address) to identify request senders.
- Process:
- Count Requests: Track the number of requests from a user or IP.
- Limit Exceeded: If count exceeds the limit, block or restrict further requests.
- Considerations:
- Storage of request counters.
- Rate limiting rules.
- Response strategy for blocked requests.
- Rule change implementation.
- Maintaining application performance.
- Rate Limiter Component: Checks incoming requests against the rules and stored data (number of requests made).
- Rules Engine: Defines the rate limiting rules.
- Cache: Stores rate-limiting data for high throughput and low latency.
- Response Handling:
- Allow request if within limit.
- Block request if over limit, typically with HTTP status code 429.
- Silent Drop: Silently drop excess requests so attackers cannot tell they are being rate limited.
- Cached Rules: Enhance performance with a cache for the rules engine and background updates for rule changes.
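The notes do not name a specific counting algorithm. One of the simplest is a fixed-window counter, which the sketch below assumes. A plain dict stands in for the cache, and the limiter returns 429 when an identifier (for example, a user ID or IP address) exceeds the limit within the current window. The `clock` parameter exists only so the window can be tested deterministically.

```python
import time


class FixedWindowRateLimiter:
    """Minimal in-memory sketch: at most `limit` requests per identifier per window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # identifier -> (window_start, count); a real deployment would keep this in a shared cache
        self.counters = {}

    def check(self, identifier):
        """Return an HTTP-style status code: 200 if allowed, 429 if over the limit."""
        now = self.clock()
        window_start, count = self.counters.get(identifier, (None, 0))
        if window_start is None or now - window_start >= self.window:
            window_start, count = now, 0              # a new window begins
        if count >= self.limit:
            return 429                                # Too Many Requests
        self.counters[identifier] = (window_start, count + 1)
        return 200


fake_time = [0.0]
limiter = FixedWindowRateLimiter(limit=3, window_seconds=60, clock=lambda: fake_time[0])
print([limiter.check("user-42") for _ in range(4)])   # [200, 200, 200, 429]
fake_time[0] = 61.0                                   # the next window starts
print(limiter.check("user-42"))                       # 200
```

Fixed windows have a known weakness: a client can send up to twice the limit in a short burst that straddles a window boundary. Sliding-window or token-bucket variants smooth this out.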
Link: https://newsletter.francofernando.com/p/caching
- Purpose: Speeds up data access by storing data temporarily in a fast-access hardware or software layer.
- Cache Hit: Data is found in the cache.
- Cache Miss: Data is not in the cache and must be fetched from its original location.
- Levels: Hardware, OS, front-end, web apps, databases, etc.
- Roles:
- Reducing latency.
- Saving network requests.
- Storing results of resource-intensive operations.
- Avoiding repetitive operations.
- Application Caching: Integrated into app code, checks cache before database access. Examples: Memcached, Redis.
- Database Caching: Built into databases, requires no code changes, optimizes data retrieval.
- Cache Miss Rate: High miss rates add latency, since every miss pays for the cache lookup plus the fetch from the original location.
- Stale Data: Ensuring cache data is up-to-date and relevant.
- Cache Aside (Lazy Loading):
- The application reads from the cache first; on a miss, it reads from the DB and updates the cache.
- Advantages: Good for read-heavy workloads. Cache only stores necessary data.
- Disadvantages: Can serve stale data. Initial cache misses.
- Read Through:
- The application interacts only with the cache, which fetches data from the DB on a miss.
- Simplifies app code but complicates cache implementation.
- Write Through:
- Writes data to cache and DB simultaneously.
- Ensures data consistency. Higher write latency.
- Write Back (Asynchronous Writing):
- Writes data to cache, then asynchronously to DB.
- Lower write latency. Good for write-heavy workloads.
- Write Around:
- Writes directly to DB, cache only stores read data.
- Good for infrequently read data. Higher read latency for new data.
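The strategies above differ mainly in which component is updated and when. Below is a minimal sketch of cache aside, write through, and write around. It uses a plain dict as the cache and a hypothetical `FakeDB` class that counts reads, so hits and misses are visible. Write back is omitted because it needs a background flusher. In `write_around_put`, invalidating an existing cached copy is an added assumption that keeps the cache from serving stale data.

```python
class FakeDB:
    """Hypothetical stand-in for a database; counts reads so cache behaviour is visible."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.rows.get(key)

    def write(self, key, value):
        self.rows[key] = value


def cache_aside_get(cache, db, key):
    """Cache aside: check the cache, fall back to the DB on a miss, then populate the cache."""
    if key in cache:
        return cache[key]            # cache hit
    value = db.read(key)             # cache miss
    if value is not None:
        cache[key] = value
    return value


def write_through_put(cache, db, key, value):
    """Write through: update the DB and the cache together so they stay consistent."""
    db.write(key, value)
    cache[key] = value


def write_around_put(cache, db, key, value):
    """Write around: write only to the DB; drop any cached copy (assumption) to avoid staleness."""
    db.write(key, value)
    cache.pop(key, None)


db = FakeDB({"user:1": "Alice"})
cache = {}
cache_aside_get(cache, db, "user:1")      # miss: reads the DB and fills the cache
cache_aside_get(cache, db, "user:1")      # hit: no DB read
print(db.reads)                           # 1
```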
- Choosing a Strategy: Depends on data access patterns.
- Cache Aside: Good for general-purpose, read-intensive applications.
- Write-Heavy Workloads: Write-back approaches are beneficial.
- Infrequent Reads: Write-around strategy.
- Manage Limited Cache Space:
- FIFO: First in, first out.
- LIFO: Last in, first out.
- LRU: Least recently used.
- MRU: Most recently used.
- LFU: Least frequently used.
- RR: Random replacement.
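LRU is a common default among the eviction policies listed. The minimal sketch below uses `collections.OrderedDict`, which keeps keys in order and can move a key to the end in O(1), so the least recently used entry is always at the front. The class name and capacity handling are illustrative.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # ordered from least to most recently used

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")               # "a" becomes most recently used
cache.put("c", 3)            # evicts "b", the least recently used
print(list(cache.entries))   # ['a', 'c']
```

For memoizing pure functions, Python's built-in `functools.lru_cache` decorator applies the same policy without a custom class.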
Link: https://newsletter.systemdesigncodex.com/p/database-replication-under-the-hood
- Statement-based Replication:
- How It Works: The leader logs every SQL write statement (INSERT, UPDATE, DELETE) and forwards these statements to follower nodes, which execute them.
- Advantages:
- Efficient in network bandwidth, since only SQL statements are transferred.
- Portable across different database versions.
- Simpler to implement.
- Limitations:
- Non-deterministic functions (e.g., NOW(), UUID()) yield different values on replicas.
- Transactions involving auto-incrementing columns must be executed in exactly the same order on every replica.
- Potential unforeseen effects due to triggers or stored procedures.
- WAL Shipping:
- Concept: The write-ahead log (WAL), an append-only sequence of all writes, is shared with follower nodes.
- Usage: Common in databases like PostgreSQL.
- Advantage: Creates an exact replica of the leader’s data structures.
- Disadvantage: Tightly coupled to the storage engine, making it less flexible with database version changes and hindering zero-downtime upgrades.
- Row-Based Replication:
- Functionality: Uses a logical log that records writes at the granularity of rows.
- Operation Details:
- Inserts log new values for all columns.
- Deletes log identifiers for deleted rows.
- Updates log identifiers and new values for modified columns.
- Advantage: Decouples from the storage engine, allowing backward compatibility and version flexibility between leader and follower databases.
- The choice depends on the specific requirements of the system, such as:
- Network efficiency.
- Consistency requirements.
- Database version compatibility.
- Statement-based Replication: Best for simple, less concurrent environments.
- WAL Shipping: Suitable for systems where exact replica and data integrity are critical.
- Row-Based Replication: Ideal for environments requiring flexibility and compatibility across different database versions.
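The NOW() pitfall of statement-based replication, and the reason row-based replication avoids it, can be simulated in a few lines. This is a toy model: dicts stand in for tables, and a hypothetical `now()` returns a new value on every call, much as a real clock would when two machines execute the same statement at different moments.

```python
import itertools

_clock = itertools.count(1000)   # hypothetical NOW(): yields a new value on every call


def now():
    return next(_clock)


def replay_statement(table, key):
    """Statement-based: the node itself executes 'UPDATE ... SET updated_at = NOW()'."""
    table[key] = now()


def apply_row_change(table, key, value):
    """Row-based: the follower receives the concrete value the leader wrote."""
    table[key] = value


leader, statement_follower, row_follower = {}, {}, {}

replay_statement(leader, "order:1")                              # the leader executes the statement
replay_statement(statement_follower, "order:1")                  # the follower re-executes it later
apply_row_change(row_follower, "order:1", leader["order:1"])     # the follower copies the row

print(leader, statement_follower, row_follower)
# {'order:1': 1000} {'order:1': 1001} {'order:1': 1000}
```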
Link: https://newsletter.francofernando.com/p/consistent-hashing
- Caching:
- Use Case: Store frequently accessed data in fast, in-memory caches.
- Hashing Role: Ensures identical requests are sent to the same server by hashing request attributes (IP, username, etc.).
- Challenge: Maintaining effective caching when servers are added or removed.
- Sharding:
- Purpose: Distribute data across multiple database servers.
- Hashing Function: Data keys are hashed to determine the server where data will be stored.
- Limitation: Similar to caching, adding or removing servers complicates data distribution.
- Goal: Map keys (data identifiers or workload requests) to servers efficiently.
- Desired Properties:
- Balancing: Equal distribution of keys among servers.
- Scalability: Easily adding or removing servers with minimal reconfiguration.
- Lookup Speed: Quickly finding the server for a given key.
- Simple Hashing:
- Method: Number the servers and use hash(key) % N to assign keys to servers.
- Drawback: Not scalable. Changing the server count (N) remaps most keys, not just the ones on the added or removed server.
- Consistent Hashing:
- Concept: Treat hash values as a circular space. Map keys and servers onto this circle.
- Operation: Assign each key to the nearest server on the circle in a clockwise direction.
- Advantages:
- Only a fraction of keys need remapping when adding/removing servers.
- Better scalability.
- Issue: Does not guarantee even key distribution (balancing).
- Virtual Nodes:
- Strategy: Introduce replicas or virtual nodes for each server on the hash circle.
- Benefits:
- Better balancing due to smaller ranges and more uniform key distribution.
- Faster rebalancing when servers are added or removed.
- Support for server fault tolerance and heterogeneity.
- Implementation: Assign more virtual nodes to more powerful servers for load balancing.
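Below is a compact sketch of a consistent hash ring with virtual nodes, using only the standard library. It rests on a few assumptions: MD5 is used purely to spread positions evenly (not for security), 100 virtual nodes per server is an arbitrary choice, and the server names are hypothetical. The demo adds a fourth server and counts how many of 10,000 keys move, compared with how many `hash(key) % N` would move.

```python
import bisect
import hashlib


def ring_position(value):
    """Stable 64-bit position on the ring (built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, servers=(), vnodes=100):
        self.vnodes = vnodes          # virtual nodes per server
        self._positions = []          # sorted positions of all virtual nodes
        self._owners = {}             # position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.vnodes):
            pos = ring_position(f"{server}#{i}")
            bisect.insort(self._positions, pos)
            self._owners[pos] = server

    def remove(self, server):
        for i in range(self.vnodes):
            pos = ring_position(f"{server}#{i}")
            self._positions.remove(pos)
            del self._owners[pos]

    def server_for(self, key):
        # Walk clockwise to the first virtual node at or after the key, wrapping past the top.
        idx = bisect.bisect_left(self._positions, ring_position(key)) % len(self._positions)
        return self._owners[self._positions[idx]]


keys = [f"user:{i}" for i in range(10_000)]
ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
before = {k: ring.server_for(k) for k in keys}

ring.add("cache-d")
moved = sum(before[k] != ring.server_for(k) for k in keys)
modulo_moved = sum(ring_position(k) % 3 != ring_position(k) % 4 for k in keys)
print(f"ring moved {moved} keys; hash % N would move {modulo_moved}")
```

With modulo hashing, going from 3 to 4 servers keeps a key in place only when hash % 3 equals hash % 4. That holds for about a quarter of keys, so roughly 75% move. The ring moves only the keys that now land on the new server's virtual nodes, roughly a quarter of them.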
Link: https://newsletter.systemdesigncodex.com/p/why-replication-lag-occurs-in-databases
- Concept: Replication Lag is the delay between a write operation on the leader node and its replication on follower nodes in a database system.
- Leader-based Replication Setup:
- Writes are processed by a single node (leader).
- Read queries can be served by any replica (follower).
- Common in systems with more reads than writes.
- Asynchronous vs. Synchronous Replication:
- Synchronous: The leader waits for all replicas to confirm each write, so a single unavailable replica can block writes.
- Asynchronous: The leader does not wait for confirmation, which keeps writes flowing and lets reads be spread across followers, but a lagging follower can return outdated data.
- How Replication Lag Occurs:
- User A updates data on the leader node.
- Leader sends replication data to followers.
- User B reads from a follower (replica 2) before it's updated, receiving outdated information.
- Replica 2 eventually gets updated.
- Implications:
- Lag duration varies from fractions of a second to minutes.
- Causes temporary data inconsistencies (eventual consistency).
- Large lags can significantly impact application performance.
- Challenge: Managing replication lag to minimize data inconsistencies and ensure efficient operation.
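The sequence above can be simulated with a toy model of asynchronous, log-based replication. The leader appends each write to an ordered log and returns immediately, and each follower applies that log at its own pace. Lag is counted here in unapplied log entries rather than seconds, and all class names are hypothetical.

```python
class Leader:
    """Accepts writes and appends them to an ordered replication log."""

    def __init__(self):
        self.data = {}
        self.log = []

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))   # asynchronous: acknowledged without waiting for followers


class Follower:
    """Applies the leader's log at its own pace; `position` marks how far it has got."""

    def __init__(self, leader):
        self.leader = leader
        self.data = {}
        self.position = 0

    def catch_up(self, max_entries=None):
        end = len(self.leader.log)
        if max_entries is not None:
            end = min(end, self.position + max_entries)
        for key, value in self.leader.log[self.position:end]:
            self.data[key] = value
        self.position = end

    def lag(self):
        """Number of log entries written on the leader but not yet applied here."""
        return len(self.leader.log) - self.position


leader = Leader()
replica_2 = Follower(leader)
leader.write("profile:42", "old bio")
replica_2.catch_up()

leader.write("profile:42", "new bio")                   # User A updates data on the leader
print(replica_2.data["profile:42"], replica_2.lag())    # old bio 1  (User B reads stale data)
replica_2.catch_up()                                     # replica 2 eventually catches up
print(replica_2.data["profile:42"], replica_2.lag())    # new bio 0
```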
Link: https://newsletter.systemdesigncodex.com/p/problems-caused-by-db-replication
- Vanishing Updates:
- Scenario: User updates data on the leader node, but a subsequent read request to a lagging replica shows outdated data.
- Problem: User experiences frustration as their updates appear to vanish.
- Solution: Implement read-after-write consistency. Methods include:
- Reading user-modified data from the leader.
- Tracking recent writes with timestamps.
- Monitoring and limiting queries on lagging replicas.
- Going Backward in Time:
- Issue: User sees an update (e.g., a new comment) and then it disappears upon refreshing, due to a lagging replica.
- User Experience: Confusion and inconsistency.
- Solution: Ensure Monotonic Reads.
- Users always read from the same replica.
- Use hashing based on User ID for replica selection.
- Violation of Causality:
- Problem: In sharded databases, partitions replicate independently, so replication lag can make causally related writes appear out of order (e.g., a reply appears before the original message).
- Result: Appears as if cause and effect are reversed.
- Solution: Provide consistent prefix reads.
- Ensures writes are read in the order they were made.
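Two of the fixes above, read-after-write consistency and monotonic reads, can be combined in a read router. The sketch is hypothetical: node names are plain strings, and the 10-second window during which a user's reads go to the leader after their own write is an assumption. That window should exceed the replication lag actually observed. Consistent prefix reads are not covered.

```python
import hashlib
import time


class ReplicaRouter:
    """Hypothetical read router combining read-after-write and monotonic reads."""

    def __init__(self, leader, replicas, read_your_writes_window=10.0, clock=time.monotonic):
        self.leader = leader
        self.replicas = replicas
        self.window = read_your_writes_window   # assumption: longer than typical replication lag
        self.clock = clock
        self.last_write = {}                    # user_id -> time of that user's latest write

    def record_write(self, user_id):
        self.last_write[user_id] = self.clock()

    def node_for_read(self, user_id):
        # Read-after-write: shortly after a user's own write, read from the leader.
        written = self.last_write.get(user_id)
        if written is not None and self.clock() - written < self.window:
            return self.leader
        # Monotonic reads: hash the user ID so the same user always hits the same replica.
        digest = hashlib.sha256(str(user_id).encode()).digest()
        return self.replicas[int.from_bytes(digest[:8], "big") % len(self.replicas)]


fake_time = [0.0]
router = ReplicaRouter("leader", ["replica-1", "replica-2", "replica-3"],
                       clock=lambda: fake_time[0])
print(router.node_for_read("alice"))   # always the same replica for "alice"
router.record_write("alice")
print(router.node_for_read("alice"))   # leader, right after alice's own write
fake_time[0] = 11.0
print(router.node_for_read("alice"))   # back to alice's replica
```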
Link: https://newsletter.systemdesigncodex.com/p/how-request-coalescing-works
Concept: Request Coalescing is a technique for optimizing database queries by reducing redundant requests for the same data.
Application: Successfully used by Discord to manage trillions of messages efficiently.
Functionality:
- Setup: Involves intermediary data services between the API layer and the database.
- Process:
- When the first request is made, a worker task is initiated in the data service.
- Subsequent requests for the same data subscribe to this existing task.
- The worker task queries the database once and returns the result to all subscribers simultaneously.
Differences from Caching:
- Request Initiation: In request coalescing, only the first request triggers a database query. Subsequent ones wait for its result. In caching, all requests would hit the cache.
- Use with Caching: Request coalescing can complement caching by reducing the number of hits to the cache.
Internal Working (Based on Discord's Implementation):
- Each worker task maintains a local state with requests and a list of requesters.
- Responses are propagated to all waiting requesters upon arrival.
Applicability:
- Request Coalescing is particularly useful for systems with high concurrency and redundant requests.
- The necessity of this technique depends on the scale and specific challenges of the system.
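Below is a minimal single-process sketch of request coalescing with Python's `asyncio`. It is not Discord's implementation; it only mirrors the steps above. The first request for a key starts a worker task, later requests await the same task, and the result reaches all of them at once. `slow_db_query` is a hypothetical stand-in for the database.

```python
import asyncio


class RequestCoalescer:
    """Share one in-flight query among all concurrent requests for the same key."""

    def __init__(self, fetch):
        self.fetch = fetch      # async function that actually queries the database
        self.in_flight = {}     # key -> asyncio.Task running the query ("worker task")

    async def get(self, key):
        task = self.in_flight.get(key)
        if task is None:
            # First request for this key: start a worker task that queries the database once.
            task = asyncio.create_task(self.fetch(key))
            self.in_flight[key] = task
            task.add_done_callback(lambda t, k=key: self._forget(k, t))
        # Every caller (first or later) "subscribes" by awaiting the same task.
        # shield() keeps one cancelled caller from cancelling the query for everyone else.
        return await asyncio.shield(task)

    def _forget(self, key, task):
        # Once the query finishes, later requests start a fresh query instead of reusing it.
        if self.in_flight.get(key) is task:
            del self.in_flight[key]


async def demo():
    calls = []

    async def slow_db_query(key):       # hypothetical stand-in for a database read
        calls.append(key)
        await asyncio.sleep(0.05)
        return f"messages for {key}"

    coalescer = RequestCoalescer(slow_db_query)
    results = await asyncio.gather(*(coalescer.get("channel:1") for _ in range(100)))
    return results, calls


results, calls = asyncio.run(demo())
print(len(results), "responses from", len(calls), "database query")   # 100 responses from 1 database query
```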
Link: https://newsletter.systemdesign.one/p/how-to-migrate-a-mysql-database
Context: Tumblr's MySQL database, spanning 21 terabytes and 60+ billion rows across 200+ servers, necessitated a migration strategy that minimizes user impact.
Challenges:
- Maintaining high availability and scalability.
- Minimizing downtime and user impact during migration.
Strategies Used:
- CQRS Pattern (Command and Query Responsibility Segregation):
- Separated read and write operations for the database.
- Ensured continuous read availability during migration.
- Leader-Follower Replication:
- Leader in a remote data center handled read-write operations.
- Local data center had followers for handling read requests.
- Used persistent connections to reduce latency issues.
- Database Proxy (ProxySQL):
- Positioned in the local data center.
- Maintained persistent connections to the remote leader.
- Enabled connection pooling, improving performance and reducing disconnections.
Migration Process:
- Preparation:
- Stored metadata of leaders, followers, and proxies in each data center.
- Migration Execution:
- Shifted the database leader from Data Center A to B.
- Automated tools redirected followers and proxies to the new leader.
- Outcome:
- Followers continued serving read requests.
- Write requests were briefly halted or buffered, resulting in minimal user impact.
Consideration for Further Improvement:
- Leader-Leader Replication: Could enhance write availability but poses a risk of data conflicts.
- Reason for Non-Use: Potential conflicts might be why Tumblr opted against this approach.
Link: https://newsletter.francofernando.com/p/durability
Core Objective: Durability, ensuring data is not lost despite failures like power outages, system crashes, or hardware issues.
- Durability Method: Data is written to nonvolatile storage (hard drive, SSD).
- Transaction Processing:
- Log Writing: Data first written to a log file before making actual data updates.
- Update Execution: After log entry, the database updates the actual data.
- Role of Log: Enables reprocessing of transactions to restore consistent state post-failure.
- Efficiency: Log writing is fast due to its append-only nature, minimizing seek time.
- Distributed Durability: More complex, because writes must be coordinated across multiple servers.
- Two-Phase Commit Protocol:
- Coordinator Role: A designated server coordinates the commit process.
- Process:
- Phase 1 (Prepare): The coordinator asks all participant servers to prepare the transaction, and each participant votes yes or no.
- Phase 2 (Commit or Rollback): If every participant votes yes, the coordinator tells all of them to commit; otherwise, it tells them to roll back.
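The log-first write path and the two-phase commit flow fit together in one small sketch. Each hypothetical `Participant` appends a record to its own append-only log file, flushed and fsynced, before changing state. It can rebuild its committed state by replaying that log. `two_phase_commit` collects prepare votes and commits only if all are yes. The sketch deliberately ignores coordinator crashes, timeouts, and in-doubt transactions, which real implementations must handle.

```python
import json
import os
import tempfile


class Participant:
    """Hypothetical participant that writes to an append-only log before changing state."""

    def __init__(self, name, log_path, will_vote_yes=True):
        self.name = name
        self.log_path = log_path
        self.will_vote_yes = will_vote_yes   # lets the demo simulate a "no" vote
        self.data = {}                       # committed state
        self.pending = {}                    # txid -> (key, value) awaiting the decision

    def _append_log(self, record):
        # Append-only write, flushed and fsynced so the record survives a crash.
        with open(self.log_path, "a") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def prepare(self, txid, key, value):
        if not self.will_vote_yes:
            return False
        self._append_log({"tx": txid, "op": "prepare", "key": key, "value": value})
        self.pending[txid] = (key, value)
        return True

    def commit(self, txid):
        self._append_log({"tx": txid, "op": "commit"})   # log first ...
        key, value = self.pending.pop(txid)
        self.data[key] = value                          # ... then update the data

    def rollback(self, txid):
        self._append_log({"tx": txid, "op": "rollback"})
        self.pending.pop(txid, None)

    def recover(self):
        """Rebuild committed state by replaying the log after a crash."""
        prepared, self.data = {}, {}
        with open(self.log_path) as f:
            for line in f:
                rec = json.loads(line)
                if rec["op"] == "prepare":
                    prepared[rec["tx"]] = (rec["key"], rec["value"])
                elif rec["op"] == "commit":
                    key, value = prepared.pop(rec["tx"])
                    self.data[key] = value
                else:                                   # rollback
                    prepared.pop(rec["tx"], None)
        # Anything left in `prepared` is in doubt; a real system would ask the coordinator.
        return self.data


def two_phase_commit(txid, participants, key, value):
    # Phase 1: every participant prepares (logging its intent) and votes.
    votes = [p.prepare(txid, key, value) for p in participants]
    # Phase 2: commit only if all voted yes; otherwise roll everyone back.
    if all(votes):
        for p in participants:
            p.commit(txid)
        return "committed"
    for p in participants:
        p.rollback(txid)
    return "rolled back"


log_dir = tempfile.mkdtemp()
a = Participant("a", os.path.join(log_dir, "a.log"))
b = Participant("b", os.path.join(log_dir, "b.log"))
print(two_phase_commit("t1", [a, b], "balance", 100))   # committed
```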
Link: https://newsletter.francofernando.com/p/redis
Redis Overview:
- Redis stands for REmote DIctionary Server.
- It is an open-source, in-memory key-value store.
- Functions as a data structure server, supporting various data structures like Strings, Lists, Sets, Hashes, Sorted Sets, and HyperLogLogs.
History:
- Created by Salvatore Sanfilippo in the late 2000s.
- Developed to address scaling issues with MySQL in real-time analytics.
- Gained popularity and wide adoption due to its efficiency and flexibility.
Operations and Data Types:
- Basic operations include GET and SET.
- Supports diverse data structures, each with specific use cases and operations.
Redis Architectures:
- Single Instance: Simplest form, running either on the same server as the application or on a separate one.
- Replicated Instances: Primary instance replicated across secondary instances for parallel read requests and backup.
- Sentinel: Manages high availability, monitoring, and failure handling.
- Cluster: Distributes data across multiple machines using sharding.
Data Persistency:
- Offers two methods:
- RDB (Redis Database Backup): Snapshot-based backups.
- AOF (Append Only File): Logs every write operation, so recovery loses less data than restoring from the latest snapshot.
- Choice between RDB and AOF depends on the need for speed vs. data recency.
Single-thread Model:
- Utilizes a single-threaded model for operations, avoiding multi-threading overhead.
- Performance typically limited by memory and network, not CPU.
Use Cases:
- Database: As a primary key-value store.
- Cache: For storing frequent queries or caching API requests.
- Pub/Sub: For scalable and fast messaging systems.
Link: https://newsletter.francofernando.com/p/salt-and-pepper
- Hashing:
- Method: Converts a plain-text password into a fixed-length string that looks random but is deterministic (the same input always produces the same hash).
- Process: User's password is hashed and compared with the stored hash during login.
- Common Algorithms: MD5, SHA family. Used without a salt, these are vulnerable to rainbow table attacks.
- Salting:
- Purpose: Enhances hashing by defending against pre-computation attacks like rainbow tables.
- Implementation:
- Generate a unique salt for each password.
- Combine salt with the password and hash the result.
- Store the salt in plain text and the hashed password in the database.
- Validation Process:
- Retrieve the salt from the database.
- Combine entered password with salt and hash.
- Compare with stored hash for validation.
- Uniqueness: Ensures each stored hash is unique, even for identical passwords.
- Peppering:
- Function: Adds an extra layer of security on top of salting.
- Mechanism:
- Add a pepper value to the password before hashing.
- The pepper is not stored in the database.
- Login Process:
- Attempt combinations of password and pepper until a match is found.
- Benefit: Significantly increases the effort required for brute force attacks.
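Below is a sketch of salting plus the peppering scheme described above, using only Python's standard library. It makes two assumptions beyond the notes. First, it uses PBKDF2-HMAC-SHA256 (`hashlib.pbkdf2_hmac`), a deliberately slow key-derivation function, rather than a bare MD5/SHA call. Second, the pepper is one random lowercase letter, a hypothetical choice that keeps the login search to 26 attempts. The record layout is also hypothetical.

```python
import hashlib
import hmac
import secrets
import string

ITERATIONS = 100_000                        # assumption: tune to your hardware
PEPPER_ALPHABET = string.ascii_lowercase    # hypothetical: the pepper is one random letter


def _hash(password, salt, pepper):
    # Slow, salted key derivation instead of a bare MD5/SHA call.
    return hashlib.pbkdf2_hmac("sha256", (password + pepper).encode(), salt, ITERATIONS)


def register(password):
    salt = secrets.token_bytes(16)               # unique per password, stored in plain text
    pepper = secrets.choice(PEPPER_ALPHABET)     # NOT stored anywhere
    return {"salt": salt, "hash": _hash(password, salt, pepper)}


def verify(password, record):
    # The pepper is unknown, so try every possible value until one matches.
    for pepper in PEPPER_ALPHABET:
        candidate = _hash(password, record["salt"], pepper)
        if hmac.compare_digest(candidate, record["hash"]):
            return True
    return False


record = register("correct horse battery staple")
print(verify("correct horse battery staple", record))   # True
print(verify("wrong password", record))                  # False
```

Because every registration draws a fresh salt, two users with the same password end up with different stored hashes.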
Key Takeaways:
- Combining Techniques: Using both salting and peppering provides robust protection.
- Importance of Uniqueness: Unique salts and peppers make each hash distinct.
- Updating Practices: Continuously update and improve password storage methods to counteract new hacking techniques.
Link: https://newsletter.systemdesigncodex.com/p/from-monolithic-to-microservices
- Concept: Incorporates modular design within a monolithic architecture.
- Characteristics:
- Loosely-coupled modules.
- Well-defined boundaries.
- Explicit dependencies.
- Structure: Application divided into independent modules.
- Deployment: Still maintains single application deployment.
- Advantages:
- Streamlines development and maintenance.
- Offers microservices-like benefits without associated complexities.
- Design Shift: From horizontal layers to vertical slices of business functionality.
- Benefits:
- Scoped changes to specific business areas.
- Easier feature addition and modification.
- Microservices Potential: Vertical modules can gradually evolve into independent microservices.
- Learning Opportunity: Provides insights into domain and functional splits.
- Balance: No inherent superiority of microservices over monoliths or vice versa.
- Evolutionary Approach: Adapt the architecture to evolving application needs.
- Pragmatism: Choose the architecture that best suits the project's requirements and context.
Link: https://newsletter.systemdesigncodex.com/p/the-secret-trick-to-high-availability
- Active-Active High Availability:
- Implementation: Distribute traffic across instances in multiple Availability Zones (AZs).
- Example: If two instances are needed, create three (50% over-provisioning).
- Benefit: Maintains full capacity even if an entire AZ fails.
- Active-Passive High Availability:
- Use Case: For stateful services like databases.
- Setup: Primary instance in one AZ and a standby in another.
- Function: Standby becomes primary if the original primary AZ goes down.
- Criticism: Viewed as resource wasteful due to over-provisioning.
- Justification:
- Essential for mission-critical applications where downtime is unacceptable.
- Used by major cloud services like AWS (EC2, S3, RDS) to prevent outages.
- Outages as a Norm: Disruptions are inevitable; planning for them is crucial.
- Risk Management: Over-provisioning is a strategic choice to mitigate downtime risks.
- Context-Dependent: The level of static stability required varies based on the system's criticality.
Static stability, while resource-intensive, is a fundamental approach for ensuring continuous operation in high-stake environments where reliability and uptime are non-negotiable.
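The over-provisioning arithmetic behind the Active-Active example generalizes. Suppose capacity is spread evenly across k Availability Zones and the system must survive the loss of any one of them. Then the remaining k - 1 zones must carry the full load, so each zone needs ceil(required / (k - 1)) instances. The helper below is a hypothetical illustration of that rule.

```python
import math


def instances_for_static_stability(required, zones):
    """Total instances so that losing any one AZ still leaves `required` running.

    Assumes instances are spread evenly and at most one AZ fails at a time.
    """
    if zones < 2:
        raise ValueError("need at least two Availability Zones")
    per_zone = math.ceil(required / (zones - 1))
    return per_zone * zones


for zones in (2, 3, 4):
    total = instances_for_static_stability(6, zones)
    print(f"{zones} AZs: {total} instances, {(total - 6) / 6:.0%} over-provisioned")
```

For the example in the notes (two instances needed across three AZs), this yields three instances, the 50% over-provisioning mentioned above. Spreading across more zones shrinks the overhead.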
Link: https://newsletter.systemdesigncodex.com/p/4-types-of-nosql-databases
- Document Databases:
- Examples: MongoDB, Couchbase, RavenDB.
- Data Storage: In the form of JSON, BSON, or XML documents.
- Advantages: Align closely with domain-level data objects in applications.
- Use Case: Ideal for projects requiring a structure close to application data.
- Key-Value Stores:
- Examples: Redis, etcd, DynamoDB.
- Structure: Data stored as key-value pairs.
- Simplicity: Resembles a two-column table (key and value).
- Use Cases: Caching, shopping carts, user profiles.
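- Column-Oriented Databases:
- Examples: Apache Cassandra, Apache HBase.
- Storage Method: Data stored in columns rather than rows.
- Advantages: Efficient for analytics and aggregations on specific columns.
- Considerations: Consistency varies by product (Cassandra is eventually consistent by default, with consistency tunable per query, while HBase offers strong consistency for single-row operations); write operations can be complex.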
- Graph Databases:
- Examples: Neo4j, Amazon Neptune.
- Concept: Focuses on relationships between data elements (nodes and links).
- Strengths: Eliminates the need for multiple table joins as in SQL databases.
- Use Cases: Knowledge graphs, social networks, map-like applications.
- When to Use Each:
- Document DBs: Versatile, suitable for most applications traditionally using SQL.
- Key-Value Stores: For applications requiring fast read/write access to data items.
- Column-Oriented: Analytics and operations on large datasets.
- Graph Databases: Applications where relationships are central to the data model.