# docs: reword adr-013 and describe SubtreeRootThreshold (#1959)

Merged 7 commits on Jun 21, 2023.

## Status

Implemented in <https://github.com/celestiaorg/celestia-app/pull/1604>. `SubtreeRootThreshold` decreased from 128 to 64 in <https://github.com/celestiaorg/celestia-app/pull/1766>.

## Changelog

- 2023/03/01: Initial draft
- 2023/05/30: Update status
- 2023/06/17: Add a section for "Picking a `SubtreeRootThreshold`". Rephrased some sections.

## Context

When laying out blobs in the square, the block proposer adds namespace padding shares between blobs to conform to the non-interactive default rules. [ADR-009](./adr-009-non-interactive-default-rules-for-reduced-padding.md) reduced padding by placing blobs at an index that is a multiple of the `MinSquareSize(blob)`[^1]. This is a good improvement but we can reduce padding further.
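For intuition, the `MinSquareSize(blob)` computation from the footnote can be sketched as follows. This is a hypothetical helper written for illustration, not the actual celestia-app implementation:

```go
package main

import "fmt"

// minSquareSize returns the smallest power-of-two square size whose
// square (size * size shares) can hold a blob of blobLen shares.
// Illustrative sketch; the real celestia-app helper may differ.
func minSquareSize(blobLen int) int {
	size := 1
	for size*size < blobLen {
		size *= 2
	}
	return size
}

func main() {
	// A 5-share blob fits in a 4x4 square (a 2x2 square holds only 4 shares).
	fmt.Println(minSquareSize(5)) // 4
}
```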

If we analyze the worst case padding for different blob sizes, we can see that the ratio of blob size to padding is not constant. Insight: **the ratio of blob size to padding is small for small blobs and large for large blobs.**

![Worst Case Padding In Blob Size Range](./assets/adr013/worst-case-padding-in-blob-size-range.png)

This means small blobs are inefficient because they generate more potential padding for the data they provide. This is not ideal as we want to minimize padding. Since users do not pay for namespace padding shares, they may not be sufficiently incentivized to submit large blobs.

In the naive approach, if the block proposer aligned blobs one after another then there would be zero padding between blobs. Although this approach minimizes padding, it comes at the cost of large blob inclusion proofs because the hash of each share would be a subtree root of the proof. Put another way, the blob inclusion proof for a blob of size N shares would include N subtree roots. Large blob inclusion proofs may be difficult to download for resource constrained light nodes.

Small blobs have the lowest ratio of data to padding but also have small blob inclusion proofs. Since the size of blob inclusion proofs is an important constraint, we can establish a threshold, called the `SubtreeRootThreshold` from here on, that sets an upper bound on the number of subtree roots in a blob inclusion proof.

We can increase the `SubtreeRootThreshold` (and correspondingly the blob inclusion proof size) as long as we are confident light nodes can process proofs of that size. By increasing the `SubtreeRootThreshold`, we can place blobs closer together and therefore decrease the padding in the square. This is especially useful for small blobs since they have the lowest ratio of data to padding.

## Proposal

The proposed non-interactive default rules: a blob must start at an index that is a multiple of the subtree root width for the given blob. The subtree root width for a blob is the minimum of:

- `math.Ceil(blob / SubtreeRootThreshold)` rounded up to the next power of two
- `MinSquareSize(blob)`[^1]

where `blob` is the length of the blob in shares and `SubtreeRootThreshold` is some constant.

Note: `MinSquareSize(blob)` is retained in this iteration of the non-interactive default rules to prevent some blobs from having more padding under this proposal than they had under the old non-interactive default rules.
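The two-part rule above can be sketched in Go. All helper names here (`roundUpPowerOfTwo`, `minSquareSize`, `subtreeRootWidth`) are illustrative assumptions, not the exact celestia-app API:

```go
package main

import "fmt"

// roundUpPowerOfTwo returns the smallest power of two >= v.
func roundUpPowerOfTwo(v int) int {
	p := 1
	for p < v {
		p *= 2
	}
	return p
}

// minSquareSize returns the smallest power-of-two square size whose
// square can hold blobLen shares.
func minSquareSize(blobLen int) int {
	size := 1
	for size*size < blobLen {
		size *= 2
	}
	return size
}

// subtreeRootWidth applies the proposed rule: the minimum of
// ceil(blobLen / threshold) rounded up to the next power of two,
// and MinSquareSize(blob).
func subtreeRootWidth(blobLen, threshold int) int {
	width := roundUpPowerOfTwo((blobLen + threshold - 1) / threshold)
	if m := minSquareSize(blobLen); m < width {
		width = m
	}
	return width
}

func main() {
	// With threshold 64: a 129-share blob gets width 4 (ceil(129/64) = 3 -> 4).
	fmt.Println(subtreeRootWidth(129, 64)) // 4
}
```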

## Visualization

The diagram below shows the difference between the old and new non-interactive default rules in a square of size 8 with `SubtreeRootThreshold` of 8.

![Blob Alignment Comparison](./assets/adr013/blob-alignment-comparison.png)

## Picking a `SubtreeRootThreshold`

To recap, the `SubtreeRootThreshold` determines the index of where a blob must start in the square. A low `SubtreeRootThreshold` results in small blob inclusion proofs at the cost of more padding in the square.

For example, assume `SubtreeRootThreshold = 64`. Blobs smaller than `64` shares can start at an index that is a multiple of one and therefore introduce zero padding. Blobs larger than `64` but at most `64 * 2 = 128` can use an index that is a multiple of 2, for a maximum of 1 padding share. Blobs larger than `128` but at most `64 * 4 = 256` can use an index that is a multiple of 4, for a maximum of 3 padding shares, and so on.

Blob size (in number of shares) | Subtree root width[^2] | Index in square | Worst case padding
--------------------------------|------------------------|-----------------|-------------------
blob <= 64 | 1 | multiple of 1 | 0
64 < blob <= 128 | 2 | multiple of 2 | 1
128 < blob <= 256 | 4 | multiple of 4 | 3
256 < blob <= 512 | 8 | multiple of 8 | 7

Note: the threshold doesn't need to be `64`; the implementation versions this constant so that it can be modified over time.
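The index and worst case padding columns follow from a simple alignment computation. A minimal sketch, with a hypothetical helper name:

```go
package main

import "fmt"

// nextAlignedIndex returns the first index >= cursor that is a multiple of
// width; the shares skipped over become namespace padding.
func nextAlignedIndex(cursor, width int) int {
	return ((cursor + width - 1) / width) * width
}

func main() {
	// Worst case: cursor is one past a multiple of width, so width-1
	// padding shares are inserted. E.g. width 4, cursor 5 -> index 8.
	fmt.Println(nextAlignedIndex(5, 4) - 5) // 3 padding shares
}
```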

## Analysis

### Light-Nodes
### Light nodes

The proof size is determined by the number of subtree roots in the blob inclusion proof, which is bounded by the `SubtreeRootThreshold`. In the worst case, a blob inclusion proof contains `SubtreeRootThreshold` subtree roots.

If light nodes can process this proof size without a problem then we can use this bound. If not, we can use a smaller bound. The smaller the bound the more padding we will have.

In addition, we could use PFB inclusion proofs ([ADR 11](./adr-011-optimistic-blob-size-independent-inclusion-proofs-and-pfb-fraud-proofs.md)) to reduce the proof size for light nodes. Because PFB inclusion proofs are blob size independent, this change would be unnoticeable to light nodes until a fraud proof is needed for a malicious PFB inclusion.

This fraud proof would still be orders of magnitude smaller than a bad encoding fraud proof. Both cases require 2/3 of the Celestia validators to be malicious, and in both cases the chain would halt and fall back to social consensus. If a light node can process the bad encoding fraud proof then it can also easily process the PFB fraud proof.

### Partial nodes

In this context, partial nodes are celestia-node light nodes that may download all of the data in the reserved namespaces. They check that the data behind the PFB was included in the `DataRoot`, via blob inclusion proofs.

The sum of the size of all blob inclusion proofs will be larger than the sum with the previous non-interactive default rules.

Here is a diagram of the worst-case padding for a threshold of 16 for the square:

![Worst Case Padding Comparison](./assets/adr013/worst-case-padding-comparison.png)


## Consequences

### Positive
Most blocks will have close to zero padding.

### Negative

The number of subtree roots to download for partial nodes will increase in the average case.

### Neutral

The number of subtree roots to download for light nodes will increase in the average case, but it remains small enough if the threshold is chosen carefully. Furthermore, this effect can be mitigated by using PFB inclusion proofs.

[^1]: `MinSquareSize(blob)` is a function that returns the minimum square size that can contain a blob of size `blob` shares. For example, a blob that spans 5 shares can be contained in a square of size 4 x 4 but it cannot be contained in a square of size 2 x 2. Note that square sizes must be powers of two. As a result `MinSquareSize(5) = 4`.

[^2]: Subtree root width is the maximum number of leaves per subtree root.
`pkg/appconsts/versioned_consts.go` (8 additions & 6 deletions):

const (
LatestVersion = v1.Version
)

// SubtreeRootThreshold works as a target upper bound for the number of subtree
// roots in the share commitment. If a blob contains more shares than this
// number, then the height of the subtree roots will increase by one so that the
// number of subtree roots in the share commitment decreases by a factor of two.
// This step is repeated until the number of subtree roots is less than the
// SubtreeRootThreshold.
//
// The rationale for this value is described in more detail in ADR-013.
func SubtreeRootThreshold(_ uint64) int {
return v1.SubtreeRootThreshold
}
`pkg/shares/blob_share_commitment_rules.go` (8 additions & 8 deletions):

import (
"golang.org/x/exp/constraints"
)

// FitsInSquare uses the non interactive default rules to see if blobs of some
// lengths will fit in a square of squareSize starting at share index cursor.
// Returns whether the blobs fit in the square and the number of shares used by
// blobs. See ADR-013 and the blob share commitment rules.
//
// ../../specs/src/specs/data_square_layout.md#blob-share-commitment-rules
// ../../docs/architecture/adr-013-non-interactive-default-rules-for-reduced-padding.md
func FitsInSquare(cursor, squareSize, subtreeRootThreshold int, blobShareLens ...int) (bool, int) {
if len(blobShareLens) == 0 {
if cursor <= squareSize*squareSize {
}

// NextShareIndex determines the next index in a square that can be used. It
// follows the blob share commitment rules defined in ADR-013. Assumes that all
// args are non negative, and that squareSize is a power of two.
//
// https://github.com/celestiaorg/celestia-app/blob/0334749a9e9b989fa0a42b7f011f4a79af8f61aa/docs/architecture/adr-013-non-interactive-default-rules-for-zero-padding.md
func NextShareIndex(cursor, blobShareLen, squareSize, subtreeRootThreshold int) int {
// if we're starting at the beginning of the row, then return as there are
// no cases where we don't start at 0.