Redesign the IAVL tree with pruning in mind #144

zmanian · 2019-06-15T17:22:03Z

When we originally designed the IAVL tree, the application state did not prune old state and all versions were kept in memory.

The current default behavior is that we keep the last 100 versions and prune the 101st version. If the application has "hot" keys that are updated on every block, this effectively doubles the disk I/O.

At the moment, I think the biggest improvement in IAVL I/O performance would come from introducing awareness of the pruning strategy into the IAVL tree, so that we generally don't write values to the database except at snapshot heights.

If pruning is disabled, all values are persisted to the database.

yutianwu · 2019-07-03T07:47:46Z

If we do not write values to the database every block, that will improve throughput dramatically. But we will need to replay all blocks from the latest saved state when we restart the node.

Do we have a plan to do this?

zmanian · 2019-07-03T16:10:43Z

Yeah, this would be needed.

If there is a graceful shutdown, we just flush to disk before we shut down, but block replay would be great for recovery after a panic.

zmanian · 2019-07-03T16:11:28Z

See #150

yutianwu · 2019-07-04T02:19:57Z

Actually, if we do save LastCommit when we do not save IAVL state, then blocks will be replayed automatically for the difference between state height and block height. A graceful shutdown will surely help to save the replay work.

So besides the IAVL change you mentioned, we also need to make some changes to the cosmos Commit stage.

AdityaSripal · 2019-07-09T19:24:38Z

Currently working on this by building on top of loom PR (#150).

Current design

MutableTree has extra fields:

// Pruning fields
keepEvery  int64  // Saves version to disk periodically
keepRecent int64  // Saves recent versions in memory

NodeDB has extra fields:

memDb    dbm.DB     // Memory node storage.
memBatch dbm.Batch  // Batched writing buffer for memDB.

On SaveRoot, the IAVL checks if the version should be persisted to disk (version % keepEvery == 0)

If version is not going to be persisted to disk, the version is simply saved in memDB
If version is persisted to disk, the version is written to memDB and levelDB

When version n is saved, version n - keepRecent is deleted from memDB. Thus, memDB always contains keepRecent versions of the tree.
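The save rule above can be sketched roughly as follows. This is only an illustration of the arithmetic, not the IAVL implementation: the `pruningPolicy` type and its method names are hypothetical, and treating `keepEvery <= 0` as "never persist" is an assumption made here to avoid a modulo by zero.

```go
package main

import "fmt"

// pruningPolicy mirrors the two fields described above.
// The type and its methods are hypothetical names used for illustration only.
type pruningPolicy struct {
	keepEvery  int64 // persist every keepEvery-th version to disk
	keepRecent int64 // number of most recent versions kept in memory
}

// persistToDisk reports whether version is a snapshot version
// (version % keepEvery == 0).
// Assumption: keepEvery <= 0 means "never persist".
func (p pruningPolicy) persistToDisk(version int64) bool {
	return p.keepEvery > 0 && version%p.keepEvery == 0
}

// evictFromMemory returns the version that leaves the in-memory store when
// version is saved (version - keepRecent). The boolean is false while fewer
// than keepRecent versions have been saved, so nothing needs to be evicted.
func (p pruningPolicy) evictFromMemory(version int64) (int64, bool) {
	old := version - p.keepRecent
	return old, old > 0
}

func main() {
	p := pruningPolicy{keepEvery: 5, keepRecent: 3}
	for v := int64(1); v <= 10; v++ {
		fmt.Printf("version %d: disk=%t", v, p.persistToDisk(v))
		if old, ok := p.evictFromMemory(v); ok {
			fmt.Printf(", evict version %d from memory", old)
		}
		fmt.Println()
	}
}
```

With `keepEvery = 1` every version is a snapshot version, which matches the "pruning disabled" case where all values are persisted.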

Orphans:

Save orphan to memDB under o|toVersion|fromVersion.

If there exists a snapshot version snapVersion s.t. fromVersion < snapVersion < toVersion, save the orphan to disk as well under o|snapVersion|fromVersion.
NOTE: In the unlikely event that two snapshot versions exist between fromVersion and toVersion, we use the closest snapshot version that is less than toVersion.

Can then simply use the old delete algorithm with some minor simplifications/optimizations
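A minimal sketch of the orphan rule above, assuming snapshot versions are exactly the positive multiples of keepEvery (the thread only states the `version % keepEvery == 0` check). `snapshotForOrphan` and `orphanKey` are hypothetical helpers, and the string key only imitates the `o|toVersion|fromVersion` layout; the real on-disk key encoding is not shown in this thread.

```go
package main

import "fmt"

// snapshotForOrphan returns the snapshot version under which an orphan would
// also be written to disk: the closest snapshot version s below toVersion
// with fromVersion < s < toVersion. The boolean is false if no such
// snapshot version exists.
// Assumption: snapshot versions are the positive multiples of keepEvery.
func snapshotForOrphan(fromVersion, toVersion, keepEvery int64) (int64, bool) {
	if keepEvery <= 0 || toVersion <= 1 {
		return 0, false
	}
	// Largest multiple of keepEvery strictly below toVersion.
	snap := (toVersion - 1) / keepEvery * keepEvery
	if snap <= 0 || snap <= fromVersion {
		return 0, false
	}
	return snap, true
}

// orphanKey renders a key in the o|toVersion|fromVersion layout.
// The textual format is illustrative only.
func orphanKey(toVersion, fromVersion int64) string {
	return fmt.Sprintf("o|%d|%d", toVersion, fromVersion)
}

func main() {
	const keepEvery = 10
	from, to := int64(3), int64(27)

	fmt.Println("memDB key:", orphanKey(to, from))
	if snap, ok := snapshotForOrphan(from, to, keepEvery); ok {
		// Snapshots 10 and 20 both lie between 3 and 27; the closest one
		// below toVersion is chosen.
		fmt.Println("disk key: ", orphanKey(snap, from))
	}
}
```

Picking the largest multiple below toVersion handles the "two snapshot versions in range" case from the NOTE without a loop.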

Open Questions:

1. Currently, recently persisted versions exist in both memDB and levelDB. This is so that retrieving a recently persisted version is fast. However, it introduces minor duplication (not a problem in any sane pruning strategy).
2. Currently, recent versions are saved to memDB. Is this better than simply storing them in a map key => *Node like loom currently does? I'm not sure what the tradeoffs are.
   Decision: Decided to go with memDB since it already implements the DB interface (iterating, etc.). Also, we could switch out memDB for something else later so long as it respects the DB interface.
3. Now that memDB acts as a recent version "cache" for levelDB, we need to specify the use (if any) for the LRU cache. My thinking is that this will be used to cache old nodes (version < latest - keepRecent) that are frequently requested through GetNode. But we have to make sure that the LRU cache's purpose is strictly defined (when does a node get added to the cache?) and enforced.
4. Currently, all traverse functions traverse over a single levelDB. This will have to be refactored to allow traversing over levelDB, memDB, or both. We should replace all current traversal calls with the appropriate new traversal function.
Currently implementing:
We can flush any versions in memDB to disk in the event of a graceful shutdown, but how do we restart the node correctly?
Current thinking: Regardless of whether there is a graceful shutdown or not, on recovery we reverse-iterate to find the latestVersion stored on disk (and refill memDB if necessary).

from conversation with @jackzampolin
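The recovery idea can be sketched as below, combining the reverse scan for the latest version on disk with the block replay mentioned earlier in the thread. Everything here is an assumption for illustration: the persisted versions are modeled as a plain slice rather than a database iterator, and `latestOnDisk` and `replayRange` are hypothetical names, not IAVL or Tendermint functions.

```go
package main

import (
	"fmt"
	"sort"
)

// latestOnDisk scans the persisted versions from newest to oldest and returns
// the first one that is not above maxHeight (the block store height).
// A real implementation would use a reverse iterator over the database;
// a slice stands in for it here.
func latestOnDisk(persisted []int64, maxHeight int64) (int64, bool) {
	sorted := append([]int64(nil), persisted...) // copy, leave the input untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	for i := len(sorted) - 1; i >= 0; i-- {
		if sorted[i] <= maxHeight {
			return sorted[i], true
		}
	}
	return 0, false
}

// replayRange returns the inclusive range of block heights that must be
// replayed to bring state at stateHeight up to blockHeight.
func replayRange(stateHeight, blockHeight int64) (from, to int64, needed bool) {
	if blockHeight <= stateHeight {
		return 0, 0, false
	}
	return stateHeight + 1, blockHeight, true
}

func main() {
	persisted := []int64{100, 200, 300} // snapshot versions found on disk
	blockHeight := int64(342)           // latest block known to the block store

	stateHeight, ok := latestOnDisk(persisted, blockHeight)
	if !ok {
		fmt.Println("no persisted version found; replay from genesis")
		return
	}
	if from, to, needed := replayRange(stateHeight, blockHeight); needed {
		fmt.Printf("load version %d, replay blocks %d..%d\n", stateHeight, from, to)
	} else {
		fmt.Printf("load version %d, nothing to replay\n", stateHeight)
	}
}
```

After a graceful shutdown that flushed memDB, the latest version on disk would equal the block height and the replay range would be empty.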

zmanian · 2019-07-09T19:30:37Z

For recovery, Tendermint stores metadata on the LastCommit and then replays all past blocks. Tendermint will need to know how to back up to the last saved commit and then replay.

Current thinking is that this isn't necessarily hard.

jackzampolin · 2019-07-12T15:25:41Z

Sounds like you should write up an approach for points 3 and 4 above and we can get some feedback on those.

tac0turtle · 2020-01-16T11:42:51Z

This is closed with the PR, correct @AdityaSripal?

tac0turtle · 2022-09-24T18:59:44Z

Reopening this as a potential work scope for future iavl work.

tac0turtle · 2023-05-17T07:14:44Z

closing this as the goal is referenced in #140

AdityaSripal mentioned this issue Jul 10, 2019

WIP: Introduce Pruning to IAVL #151

Closed

tac0turtle added this to the 0.13.0 milestone Jul 15, 2019

AdityaSripal mentioned this issue Jul 23, 2019

Introduce Pruning to IAVL #158

Merged

blackpainter mentioned this issue Sep 10, 2019

Provide a solution for excessive storage consumption QOSGroup/qos#269

Open

tac0turtle closed this as completed Jan 16, 2020

Lbird mentioned this issue Jun 4, 2020

Poor SaveVersion() performance when using PruningOptions #256

Closed

tac0turtle reopened this Sep 24, 2022

tac0turtle closed this as completed May 17, 2023

tac0turtle mentioned this issue May 17, 2023

An optimal backend for the IAVL #140

Open
