Log structured merge trees #2

Open
3 of 19 tasks
at15 opened this issue Jan 20, 2017 · 4 comments

at15 commented Jan 20, 2017

Originally from https://github.com/at15/papers-i-read/issues/21

I think I will first write notes in gitbook and then summarize them, in a better format, in the tex

DB

Extra

@at15 at15 self-assigned this Jan 20, 2017

at15 commented Jan 20, 2017

Blog

  • memtable + ss index + sstable
    • are keys sorted in the memtable?
    • is the memtable a hash table or something else? e.g. I could use two arrays, one for keys and one for values, and just loop over the key array to find the index of a given key
    • does the sstable store the key itself when an ss index is used?
    • how is collapse achieved?
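
On the "two arrays" idea above: a minimal sketch, assuming the two arrays are kept sorted by key so that lookup is a binary search instead of a loop, and so that a flush can write keys out in order. The class name `Memtable` and its methods are made up for illustration; LevelDB itself uses a skip list for its memtable, not this layout.

```python
import bisect


class Memtable:
    """Hypothetical in-memory write buffer: two parallel arrays kept sorted by key."""

    def __init__(self):
        self._keys = []    # keys, always in sorted order
        self._values = []  # values, parallel to _keys

    def put(self, key, value):
        # Binary search for the slot where the key is or should be.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value          # overwrite an existing key
        else:
            self._keys.insert(i, key)        # O(n) shift, fine for a sketch
            self._values.insert(i, value)

    def get(self, key, default=None):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return default

    def items(self):
        # Already sorted, so a flush can stream this straight to disk.
        return list(zip(self._keys, self._values))
```

A plain hash table would also work for point lookups, but it would have to be sorted at flush time; keeping the keys sorted up front is the trade-off this sketch makes.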

leveldb code

Take away

  • sstable indexes (key + offset) are loaded into memory
  • writes go to the memtable
  • the memtable gets flushed to disk as an sstable
  • sstables are merged (collapsed together?)
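
The flush and the (key + offset) index from the take-aways can be sketched as follows. This is only an illustration under my own assumptions: the record layout (two little-endian 4-byte lengths, then key bytes, then value bytes) and the function names `flush` and `lookup` are invented here and are not LevelDB's actual table format.

```python
import bisect
import io
import struct


def flush(sorted_items, out):
    """Write sorted (key, value) byte pairs to `out`; return the index [(key, offset), ...]."""
    index = []
    for key, value in sorted_items:
        index.append((key, out.tell()))                       # where this record starts
        out.write(struct.pack("<II", len(key), len(value)))   # 8-byte length header
        out.write(key)
        out.write(value)
    return index


def lookup(index, f, key):
    """Find `key` via the in-memory index, then do one seek and read in the file."""
    keys = [k for k, _ in index]
    i = bisect.bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return None                                  # not in this sstable
    f.seek(index[i][1])
    klen, vlen = struct.unpack("<II", f.read(8))
    f.seek(klen, io.SEEK_CUR)                        # skip over the stored key
    return f.read(vlen)


buf = io.BytesIO()                                   # stands in for a file on disk
idx = flush([(b"a", b"1"), (b"b", b"22")], buf)
value = lookup(idx, buf, b"b")
```

In this sketch the file stores the key next to each value even though the index also holds it. With a full index that is redundant, but a sparse index (one entry every few records) would need the keys in the file to scan forward from the nearest indexed offset.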


at15 commented Jan 21, 2017

Quora

Take away

  • how to efficiently manage merging only sub-portions of the key-space
    • LevelDB/RocksDB tackle it by liberally relying on a b-tree based intermediate layer, the filesystem.
    • The LSMs in Cassandra, HBase, and Hypertable are very close to the LevelDB "filesystem layered" approach

C_0, C_1 (I guess this is not the case for the sstable-based systems)

  • C_0 is an AVL tree (in memory)
  • C_1 is a B-tree (on disk)

About merging process

Once written, older generations are never modified. You do row modifications by rewriting the rows/records into a newer generation where it is found first.

The nice thing is that the old data is still being used by the system while this merge happens "in the background". During the merge process, rows/records modified by newer generations or marked deleted are removed by simply not writing them. Once the merge is complete, this combined generation replaces the two it merged.
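
The merge described in the quote can be sketched like this, assuming each generation is a sorted list of (key, value) pairs and deletes are represented by a sentinel. `TOMBSTONE` and `merge_generations` are names I made up for the example.

```python
TOMBSTONE = object()  # hypothetical marker for "this key was deleted"


def merge_generations(newer, older):
    """Merge two sorted [(key, value), ...] lists into a new sorted list.

    The inputs are never modified. On a key collision the newer record wins,
    and records marked deleted are removed by simply not writing them.
    """
    out = []
    i = j = 0
    while i < len(newer) or j < len(older):
        take_newer = j == len(older) or (
            i < len(newer) and newer[i][0] <= older[j][0]
        )
        if take_newer:
            key, value = newer[i]
            if j < len(older) and older[j][0] == key:
                j += 1            # older version is shadowed, skip it
            i += 1
        else:
            key, value = older[j]
            j += 1
        if value is not TOMBSTONE:
            out.append((key, value))
    return out


merged = merge_generations(
    [("a", 1), ("c", TOMBSTONE)],
    [("a", 0), ("b", 2), ("c", 3)],
)
```

One caveat with this sketch: dropping the tombstone is only safe if no even older generation still holds that key, otherwise the old value would become visible again. When more generations exist below, the tombstone would have to be written through instead.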


at15 commented Jan 22, 2017

Original paper

The most important part of the original paper should be the rolling merge, and especially how it handles recovery, but, as I assumed, its main focus is on indexes

  • TODO: does the original paper focus on indexes, while later applications use this method for storing the actual data?


at15 commented Jan 28, 2017

Some questions asked by other students

  • what happens when I delete something in a lower level? If the item is in memory, you can add a tombstone for it; but if it is on disk, do you load it into the buffer and add a tombstone there, or do you read from multiple levels and apply the filter when merging?
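
One possible answer to the question above, as far as I understand the sstable-style systems: the on-disk data is not loaded or touched at all. The delete is just another write (a tombstone) into the memtable, reads check tables from newest to oldest and stop at the first hit, and the old value only disappears physically at merge time. A toy sketch, where `TinyLSM` and the use of plain dicts as stand-ins for sstables are my own simplifications:

```python
TOMBSTONE = object()  # hypothetical delete marker


class TinyLSM:
    """Toy model: one memtable plus a list of immutable tables, newest first."""

    def __init__(self):
        self.memtable = {}
        self.levels = []  # each dict stands in for an on-disk sstable

    def put(self, key, value):
        self.memtable[key] = value

    def delete(self, key):
        # No disk read: the tombstone goes to memory like any other write.
        self.memtable[key] = TOMBSTONE

    def flush(self):
        # The memtable becomes the newest immutable table.
        self.levels.insert(0, dict(self.memtable))
        self.memtable = {}

    def get(self, key):
        # Newest to oldest; the first table that knows the key decides.
        for table in [self.memtable, *self.levels]:
            if key in table:
                value = table[key]
                return None if value is TOMBSTONE else value
        return None


db = TinyLSM()
db.put("k", "v")
db.flush()        # "k" now lives only in an on-disk table
db.delete("k")    # tombstone in the memtable shadows it
```

After the delete, `db.get("k")` returns `None` even though the old table still contains the value; a later merge that applies the tombstone is what would actually remove it.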
