Skip to content
This repository has been archived by the owner on Feb 27, 2023. It is now read-only.

[Exploration] Using Celestia's SMT for Pocket Network V1? #74

Draft
wants to merge 26 commits into
base: master
Choose a base branch
from

Conversation

Olshansk
Copy link

@musalbas This PR isn't ready for review (yet) but I was hoping to use it as a starting point for a conversation.

After reading through cosmos/cosmos-sdk#7100 as well as many of the downstream threads, it seems that using a variation of Libra's JMT is a good alternative to IAVL. I summarized a few of it's highlights here.

We're hoping to use a go native implementation and I very much want to avoid building out own from scratch. I came by this library and would prefer to help build on top of it together. It's easy to use, but I'm still trying to make my way through the code and understand the inner workings.

Specifically, my main questions at the moment are:

  1. The JMT paper describes a general purpose Addressable Radix Merkle Tree with a variable r, but it seems this library only support binary trees.
    1.1. Is that correct?
    1.2 Is there a plan to extend it?

  2. Is this library being used anywhere?
    2.1. I'm primarily asking to in relation to it's performance and security "in the wild".
    2.2. Would you recommend an alternative library given that it's been a couple of years?

  3. I personally found it hard to understand what "side nodes" and "path nodes" represent, so I tried to build a little visualizer. After running go test -v ./... -run TestVisualizationHelper, I created the output below, but also used a small python notebook to visualize the tree as well.
    3.1 The textual output says that the side nodes of C are SIDE NODES: [ D, J, A, ZERO, ZERO, NODE, NODE ], but the tree visualization shows it at a lower height. I believe there's simply a gap in my understanding here so the main question I have is what "side nodes" are really meant to represent?

Happy to jump on a call if you think it'll be more productive than a text back & forth!

output

Leaf: F (3 side nodes, 4 path nodes)
	Largest common prefix: 2
	Common Bits : 1 0
	SIDE NODES: [ H, NODE, NODE ]
	PATH NODES: [ F, NODE, NODE, NODE ]
Leaf: G (2 side nodes, 3 path nodes)
	Largest common prefix: 1
	Common Bits : 0
	SIDE NODES: [ NODE, NODE ]
	PATH NODES: [ G, NODE, NODE ]
Leaf: J (6 side nodes, 7 path nodes)
	Largest common prefix: 5
	Common Bits : 1 1 0 1 0
	SIDE NODES: [ NODE, A, ZERO, ZERO, NODE, NODE ]
	PATH NODES: [ J, NODE, NODE, NODE, NODE, NODE, NODE ]
Leaf: A (5 side nodes, 6 path nodes)
	Largest common prefix: 4
	Common Bits : 1 1 0 1
	SIDE NODES: [ NODE, ZERO, ZERO, NODE, NODE ]
	PATH NODES: [ A, NODE, NODE, NODE, NODE, NODE ]
Leaf: B (5 side nodes, 6 path nodes)
	Largest common prefix: 4
	Common Bits : 0 1 0 0
	SIDE NODES: [ E, ZERO, I, G, NODE ]
	PATH NODES: [ B, NODE, NODE, NODE, NODE, NODE ]
Leaf: E (5 side nodes, 6 path nodes)
	Largest common prefix: 4
	Common Bits : 0 1 0 0
	SIDE NODES: [ B, ZERO, I, G, NODE ]
	PATH NODES: [ E, NODE, NODE, NODE, NODE, NODE ]
Leaf: H (3 side nodes, 4 path nodes)
	Largest common prefix: 2
	Common Bits : 1 0
	SIDE NODES: [ F, NODE, NODE ]
	PATH NODES: [ H, NODE, NODE, NODE ]
Leaf: I (3 side nodes, 4 path nodes)
	Largest common prefix: 2
	Common Bits : 0 1
	SIDE NODES: [ NODE, G, NODE ]
	PATH NODES: [ I, NODE, NODE, NODE ]
Leaf: C (7 side nodes, 8 path nodes)
	Largest common prefix: 6
	Common Bits : 1 1 0 1 0 0
	SIDE NODES: [ D, J, A, ZERO, ZERO, NODE, NODE ]
	PATH NODES: [ C, NODE, NODE, NODE, NODE, NODE, NODE, NODE ]
Leaf: D (7 side nodes, 8 path nodes)
	Largest common prefix: 6
	Common Bits : 1 1 0 1 0 0
	SIDE NODES: [ C, J, A, ZERO, ZERO, NODE, NODE ]
	PATH NODES: [ D, NODE, NODE, NODE, NODE, NODE, NODE, NODE ]

@adlerjohn
Copy link
Member

adlerjohn commented Oct 31, 2022

1.1. Is that correct?

Yes.

1.2 Is there a plan to extend it?

Yes and no. No: if the canonical commitment involves hashing >2 children at once, then proofs will be larger. The optimal proof sizes are with exactly 2 leaves. Yes: you can implement larger number of children for hypothetically better performance (this really needs to be tested empirically instead of being believed blindly) at the implementation level. That's an optimization that we want to make, modulo manpower and time.

  1. Is this library being used anywhere?

Yes, it's used by Trustix for example: nix-community/trustix@da5ca07 (tweet). You can see dependents here and here.

2.2. Would you recommend an alternative library given that it's been a couple of years?

I'm honestly not aware of any other compact SMT library in Go. If you want a Rust implementation of the same specification, see https://github.com/FuelLabs/fuel-merkle.

@adlerjohn
Copy link
Member

The textual output says that the side nodes of C are SIDE NODES: [ D, J, A, ZERO, ZERO, NODE, NODE ], but the tree visualization shows it at a lower height.

Side nodes mean the neighboring nodes for a proof. Your text output is correct: you can see the neighboring nodes as you go up are indeed D, J, A, followed by two no-neighbors (ZERO) followed by one internal node (which I guess your script doesn't label, but it's 0). The root doesn't need a neighbor, so your script runs one too many times.

@musalbas
Copy link
Member

musalbas commented Oct 31, 2022

No: if the canonical commitment involves hashing >2 children at once, then proofs will be larger.

I think this is not true. To my knowledge, JMT implements a 16-ary tree only for physical storage, but the underlying logical tree is still 2-ary, so proofs are still the same as a normal SMT. In theory, a similar optimization should be possible for this library.

Would you recommend an alternative library given that it's been a couple of years?

There is a pending PR to this library (#73) that makes the performance much better by @roysc, which further changes should probably be re-based on.

By the way, we're not using this library anymore internally at Celestia, because replacing IAVL in the Cosmos store is not trivial. However, there's nothing wrong with using this library independently for other use cases, and if you want to help maintain and improve it, be our guest.


A Go library that implements a Sparse Merkle tree for a key-value map. The tree implements the same optimisations specified in the [Libra whitepaper][libra whitepaper], to reduce the number of hash operations required per tree operation to O(k) where k is the number of non-empty elements in the tree.
A Go library that implements a Sparse Merkle Tree for a key-value map. The tree implements the same optimisations specified in the [Jellyfish Merkle Tree whitepaper][jmt whitepaper] originally designed for the [Libra blockchain][libra whitepaper]. It reduces the number of hash operations required per tree operation to `O(k)` where `k` is the number of non-empty elements in the tree.
Copy link
Member

@musalbas musalbas Oct 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This library is not based on JMT, just an optimized SMT. The JMT was introduced after the original Libra whitepaper, which is basically an optimized SMT but with further optimizations on the storage layer (storing it as a 16-ary tree, etc), which we don't make here yet. The original Libra whitepaper mentioned an optimized SMT, but did not introduce the JMT (with extra storage optimizations).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a JMT is just an SMT with r=16 + RocksDB + versioned keys.

Screen Shot 2022-10-31 at 6 39 04 PM

@Olshansk
Copy link
Author

Olshansk commented Nov 1, 2022

Appreciate the responses everyone!

By the way, we're not using this library anymore internally at Celestia, because replacing IAVL in the Cosmos store is not trivial. However, there's nothing wrong with using this library independently for other use cases, and if you want to help maintain and improve it, be our guest.

Got it. At the very least I'll update the visualization scripts after the rebase since I think it's valuable.

I'm more concerned about being the sole user/maintainer of the library long-term so as to benefit and contribute to security & performance optimizations others are making as well. Do you have a rec as to whether we should focus on celestiaorg/iavl or this library?


@adlerjohn

Yes: you can implement larger number of children for hypothetically (this really needs to be tested empirically instead of being believed blindly) at the implementation level. That's an optimization that we want to make, modulo manpower and time.

Understood and makes sense.

If you want a Rust implementation of the same specification, see FuelLabs/fuel-merkle.

We're building in Go, so it'd be nice to have everything be native (if possible). When the time comes, if it's necessary, we might build a wrapper similar to what harmony did with their VRF bounty (https://gitcoin.co/issue/25313). Premature optimization for the time being though.

Side nodes mean the neighboring nodes for a proof

Got it. It's basically the "classic" merkle branches we need for the proof.

The root doesn't need a neighbor, so your script runs one too many times.

Will update so it's easier to visualize the ZEROs and fix the big.

I think this is not true. To my knowledge, JMT implements a 16-ary tree only for physical storage, but the underlying logical tree is still 2-ary, so proofs are still the same as a normal SMT. In theory, a similar optimization should be possible for this library.

Though the number 16 was "arbitrarily" (probably some benchmarking) to account for the tradeoff of read & write volume for RocksDB compaction, the following diagram from the OG paper seems to imply it is non-binary at a logic level as well:

Screen Shot 2022-10-31 at 6 33 23 PM


@musalbas

There is a pending PR to this library (#73) that makes the performance much better by @roysc, which further changes should probably be re-based on.

That's great. Will def rebase on top of that once it's in!

@musalbas
Copy link
Member

musalbas commented Nov 4, 2022

I'm more concerned about being the sole user/maintainer of the library long-term so as to benefit and contribute to security & performance optimizations others are making as well. Do you have a rec as to whether we should focus on rollkit/iavl#1 or this library?

I wouldn't necessarily recommend iavl as a standalone state tree for use cases outside of the Cosmos SDK, the codebase is quite messy, and I believe the Cosmos SDK team wants to refactor or rewrite it. The main reason we're using it is for compatibility with the Cosmos SDK.

Since we're not currently using this SMT library though, Celestia Labs isn't actively adding new features etc, but we're happy to merge PRs / accept outside contributions, especially if people want to help maintain it.

@Olshansk
Copy link
Author

Olshansk commented Nov 7, 2022

I wouldn't necessarily recommend iavl as a standalone state tree for use cases outside of the Cosmos SDK, the codebase is quite messy, and I believe the Cosmos SDK team wants to refactor or rewrite it. The main reason we're using it is for compatibility with the Cosmos SDK.

Since we're not currently using this SMT library though, Celestia Labs isn't actively adding new features etc, but we're happy to merge PRs / accept outside contributions, especially if people want to help maintain it.

Makes sense 👌

We'd be really interested in helping maintain and contribute to it as it seems to fit our use case. In terms of next steps, I'll take a look at #73 and rebase this on top of it once it's merged in.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants