adr: Data Availability Module and Components #414
Conversation
`Put` in push schema takes all the shares of an extended square, encodes every share along with a `(col;row)` coordinate together with `DataHash`, and publishes them to the `/celestia` topic.
I hope you don't mind the question: doesn't this introduce a DoS attack vector? You can only validate that shares match a header on a row or column basis (if you have the DAHeader at hand; otherwise you need enough shares to reconstruct the whole block before you can be sure). @adlerjohn raised the concern that with presumably really large blocks this could cause problems.
> doesn't this introduce a DoS attack vector?
@liamsi, note that there is no difference between spamming shares and rows/cols, as you can generate dummy data and spam the network with either in any amount. The only way to mitigate that is peer scoring. I'll describe how we can use it soon.
> Note that there is no difference between spamming shares and rows/cols, as you can generate dummy data and spam the network for both with any amount
Yeah, but it makes a difference how much data you must read before you can actually go and score a peer accordingly. In the case of rows you could score after one garbage row; in the case of shares (having only the data root) you'd need to see all shares first. But good point. And that scoring system looks interesting. In Tendermint you can just mark peers as "bad", as far as I remember.
@liamsi, that is true if we use the DAHeader, but later on we want to rely on the DataHash only, and in that case you end up in the same situation as with share granularity.
The DAHeader is essentially an inner node, so the question is: do we need to broadcast inner nodes?
> DAHeader is essentially an inner node, so the question is - do we need to broadcast inner nodes?
While that statement is true, I still see the DAHeader as a special case of inner nodes. We might want to broadcast the DAHeader but not all other inner nodes. I need to think about this.
Chipping in here to note requiring 1/4 of the EDS (i.e. an ODS worth of shares) to be downloaded without verification will be a DoS vector. Not a RAM resource exhaustion, but a network bandwidth exhaustion.
@Wondertan when you say "and publishes them," are shares published individually, or an entire block? From what you wrote in this comment thread, it looks like individual shares, for which Merkle proofs are downloaded? Not sure.
> Chipping in here to note requiring 1/4 of the EDS (i.e. an ODS worth of shares) to be downloaded without verification will be a DoS vector. Not a RAM resource exhaustion, but a network bandwidth exhaustion.
@adlerjohn, it is, until you do verification, penalize the spamming peer, and stop further communication with them, thus preventing the DoS.
> when you say "and publishes them," are shares published individually, or an entire block?
@adlerjohn, individually
If we can't verify the block until we have fully received or recovered it, then there is no difference between sending it whole, by shares, or by other pieces in terms of verification. However, sending shares in batches could save bandwidth, as we can attach the DataHash per batch instead of per share.
The batch size can be capped at the max PubSub message size. The default is 1 MB, and I am not aware of any rationale behind it. There should be an optimal message size for our use case.
Currently, `DAHeader` and block's `Data` are defined in `ll-core/types`. However, with modularity in mind and a plan for DA to be used outside of `ll-core`, e.g. in `optimint`, it now makes sense to move them to a new `da` package.
Reiterating what I mentioned in our call: it's non-obvious that the lower-level APIs defined here need to be modular (e.g. the interfaces below). Optimint is a good point, but the abstractions there will be much more general.
Moving the DA logic out of the types package into a separate one (maybe even into a separate repo later) is a good point though. The networking-related methods don't belong in the types package 👌🏼
Though moving the `DAHeader` or the block's `Data` out of the types package will be inconsistent with the other core types; maybe there are good reasons for that.
+1 on moving out the DA logic. Also +1 on not moving out the `Data` especially, as it's purely a detail of LazyLedger. The `DAHeader` is a bit trickier, as it straddles both domains. Could the `DAHeader` not be present in both `ll-core` and `da`?
> Could the DAHeader not be present in both ll-core and da?
It can be removed entirely in general. But for now, I would just move it to the DA lib and make `ll-core` depend on it and use the `DAHeader` defined there.
// The goal of this interface is to become general enough to later abstract over other DA schemas, e.g. Kate comms,
// along with different networking schemas.
This is not a goal we currently pursue. While it does no harm and the interface looks simple enough, we don't need to plan ahead for a case that won't clearly become relevant in the coming months (maybe years).
I'll remove Kate comms from there.
// TODO: Add to specs as well
I would just make an issue in the specs referencing this PR and the parameter choices instead of having this TODO.
That's a self-reminder to do so.
### Package
Data Availability related logic is place in `lazyledger-core/p2p/ipld`. The name and location was good for original purpose, as that's a p2p stuff that uses IPLD in it's core. However, currently it makes sense to generalize and rename the package to `da`(short for DataAvailability) or even move it in it's own repository, what could be done later with
nit
Suggested change:
- the package to `da`(short for DataAvailability) or even move it in it's own repository, what could be done later with
+ the package to `da`(short for DataAvailability) or even move it in its own repository, what could be done later with
Lastly, before making this PR ready, I will go through all the grammar and orthography.
// Put serializes Data, computes DAHeader and makes it available over the network.
Put(context.Context, *types.Block) (*types.DataAvailabilityHeader, error)

// Get efficiently retrieves block's Data using given DAHeader from the network.
Get(context.Context, *types.DataAvailabilityHeader) (*types.Data, error)
In line with the above comment, it seems like a better abstraction to only use a DA header here and not deal with blocks or block data. Instead, leave the latter two to ll-core.
Yep, that's an early draft of the interface from a few weeks ago. Its final form is still not clear to me.
* `provideValidity=24h` - lifespan of a providing record.
* `kBucketSize=20` - configures the bucket size of the Kademlia routing table.
* `resiliency=3` - the number of peers closest to a target that must have responded in order for a given DHT query to complete.
* `alpha=32` - configures the number of concurrent requests for a given DHT query.
* `protocolPrefix="/celestia"` - protocol prefix to distinguish our DHT network.
Is there a rationale for parameter choices?
From what I can tell, these are mostly the current ipfs defaults (besides the protocolPrefix and I'm not sure about the resilience param). I'm sure Hlib just wrote them down to have something explicitly mentioned here to work with (which is good).
I also increased `alpha`.
@Wondertan how do we want to proceed with this? Most of it is no longer relevant for this repo. If you want, we can still merge it as a draft to reference it in celestia-node later. Alternatively, we can close it and reconsider and finalize some of it as part of the lower-level details in https://github.com/celestiaorg/celestia-node (I prefer the latter; feel free to reopen if you disagree).
I agree. Although I would like to finish that at some point, so let's keep the branch.
WIP