Skip to content

JaimePolidura/SimpleDB

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SimpleDB

SQL database built over a LSM storage engine. This project consist of three layers:

  • Storage engine
  • DB
  • Server

Features

  • Tables
  • SQL-Like queries
  • MVCC Transactions
  • Secondary indexes

Missing features

  • SSL/TLS encryption is not supported.
  • Triggers.
  • Multiple index types.
  • Serializable transaction support.
  • Joins and inner queries.

Storage engine (/storage)

The storage engine exposes an API which is used by the upper layer (DB).

  • LSM Based The engine is based on a Log-Structured Merge (LSM) tree, following the LSM in a week guide.
  • Simple API The engine exposes simple API operations like: get(key), set(key, value), delete(key), scan_all() and scan_from(key).
  • MVCC Transaction support The storage engine exposes an API to support transactions: start_transaction(), commit() and rollback().
  • Consistency and durability It has a transaction log and a memtable WAL to ensure durability and consistency during crashes.
  • Compaction The storage engine provides two compaction algorithms: SimpleLeveled and SizeTiered.
  • Structure An instance of a storage engine, consists of multiple keyspaces (like SQL tables, where keys can be written or read) and a transaction log.

DB (/db)

The DB layer uses the storage engine layer to create tables, rows and databases. And it exposes an API to run SQL-like queries.

  • Database and Tables (database.rs table.rs) For each database created, an instance of the storage engine is created. Each table in the database corresponds to a keyspace in the database's storage engine instance.
  • Table schema (table_descriptor.rs) As SQL-like databases, every table will have columns with its name and type. Every table will have a file, which stores their schema.
  • Row mapping (row.rs record.rs) The key used in the storage engine will be the primary key of the row, and the value used in the storage engine will represent the column values.
    • For example to insert a value in a table, the storage engine operation would be: set(key = Primary key, value = |Column ID #1 | Column Value #1 |Column ID #2 | Column Value #2 |...(binary))
  • Append only updates Update operations (like SQL updates, deletes or inserts) in the storage engine are append only. For example to update a row, the operation would be: set(key = Primary key of the row being updated, value = |Column ID being updated | New Column value |)
  • Read operations (table_iterator.rs) Scans are performed using iterators. Since row data may be scattered across SSTables and Memtables due to the append-only update mechanism, the iterators must reassemble the full row before returning it to the user.
  • Secondary indexes (secondary_index.rs). Secondary indexes map the indexed column value to a list of primary keys that contain that value. A separate storage engine keyspace will be created for each secondary index.
    • For example to update a secondary index value, the storage engine operation would be: set(key = Indexed value, value = |Primary key #1 | Primary key #2|...)
  • Queries (statement.rs) Once a table interface is established for updating values, inserting records, and reading rows, queries can be parsed and executed. Each query undergoes several steps:
    • Tokenization (tokenizer.rs). The query is transformed into a stream of tokens.
    • Parsing (statement.rs). Given the list of tokens is converted into an Abstract Syntax Tree (AST).
    • Validation (validator.rs). The AST is validated by checking the types, column names, and table names to ensure they are correct and consistent.
    • Scan type analysis (scan_type_analyzer.rs). For expressions that require scanning a table, an analysis is performed to determine the appropriate scan method: full scan, range scan, secondary index scan, or merge index scan.
    • Plan creation (planner.rs). Given a scan type and a statement, a plan is created. A plan is just a series of steps to execute a query.
    • Execution (executor.rs). Finally, the query is executed according to the generated plan.

Server (/server)

  • Exposes simple TCP server to execute client requests. The default port is 8888
  • Includes custom binary format.
  • Includes authentication passwords. The default password is 123456

Client (/client-cli)

  • Simple CLI client like mysql.

About

SQL database built over a custom LSM storage engine following https://skyzh.github.io/mini-lsm/00-preface.html guide

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages