-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal to use SSZ for consensus only #129
Comments
Recapping, non-determinism (unspecified field ordering, overlong encodings allowed) and complexity of additional dependency were cited reasons for discarding it the first time around, while machine-readable schema and compactness of varint encoding were main positives. The data-type fit is also not that good, with no native types for fixed-length arrays (hashes, addresses) and unnecessary types like zigzags and floats adding surface area - this can be seen in the maximum value tests @paulhauner points to here: https://ethresear.ch/t/discussion-p2p-message-serialization-standard/2781/16
For nimbus, we'll end up writing custom decoders regardless of chosen format - generated protobuf code tends to be optimized for generic use cases and given data type mismatch will need a fair bit of "manual" processing anyway. No net simplicity gain, quite the opposite for the client itself (not considering 3rd party tooling). Wire encoding is an aspect of the protocol that's easy to change once the semantics nailed - thus we're also happy to delay a final decision for when more hard data is available. Good points on keeping the two concerns (network/consensus) separate, during the design phase at least. |
So I see two points here;
Distinct serialization schemesRegarding using distinct serialization schemes, I have the following points:
ProtobufHere are my reservations regarding using protobuf:
SummaryI can see there would be some gains in hashing speed by dropping length bytes from SSZ, but I'm uncertain as to how significant they are. It's my understanding that this spec targets simplicity over hardcore optimisation and my concern is that introducing a new p2p serialization format lies too far on the side of optimization. I'd like to see the following before I'd consider a switch to separate P2P and hashing serialization:
|
Considerations
Benefits
But introduces protobuf spec implementation (in the light of maintaining own protobuf impls).
As I understand it relates to faster hashing. Hashing algorithm mentioned above doesn't work with lengths prefixes and it, also, does not create copies of object's data.
New hashing algorithm uses paddings but they don't affect SSZ anyhow.
This is a real thing. There should be a bunch of tests for SSZ including those that check overflows which are effectively discovered by fuzz testing. But in case of custom protobug implementation we're getting to the same problem here as with SSZ.
SSZ serializes objects in a pretty deterministic way without schema definitions which IMO is enough for eth 2.0 needs. Versioning is a good thing, SSZ is not forward compatible in terms of schema update. SummaryAs a p2p serialization format SSZ lacks forward compatibility in message schemas. It could become a real problem in the future cause even after several versions of PoC there might occur a need of adding something to block, attestation or whatever other messages. A good example is SSZ spec is pretty simple and with good test coverage it should be easy implementable with high level of reliability. And in my opinion it's important that the spec is maintained by Ethereum community. |
Is this advocating that data structures like blocks should be decomposed and then re-serialized using protobuf instead of SSZ over the wire? If so, that seems like a large increase in complexity... What's wrong with the current approach of sending consensus objects around in the same byte format in which they appear in consensus? The network protocol that's used to wrap the objects could still be protobuf or whatever else if desired.
It's tiny. The main hashing cost is state recalculations, and in this case, account objects are fixed size and lists tend to be very long so the marginal contribution of length prefixes is very low. In general, sacrificing the property that every object has exactly one way to encode it as bytes and going down this path reeks of sacrificing essential simplicity for a little temporary over-optimization.... |
Thanks for the feedback everyone!
Is it important that the network sends consensus objects in the same format that they're hashed in? I was arguing that it isn't, since the clients themselves should be handling the consensus objects in some data structure, not the encoding. Its handy when monitoring the network, and there may be other important reasons that I'm missing.
I agree that this is a risk. There's always the chance that we want a feature that isn't provided, and we're forced to hack around it or fork. A big concern I'm hearing is that protobuf's surface area is too large for our use case. I kinda agree, and we would have to do a more thorough audit if we want to adopt protobuf (or another 3rd party library), but I would also argue that there's less risk from a security perspective, since its not used for consensus. The second concern I see is that the benefits I listed above aren't enough to justify switching over from ssz to protobuf. I can only speak about Go, where an implementation of ssz will be slower and harder to use/debug without significant effort. But part of the reason for this is because Golang lacks generics, which isn't relevant to other languages, and maybe I should keep this to myself :) I'm curious how protobuf would compare to ssz in Rust, so I may try that at some point. The mood that I'm feeling in this thread is that everyone's okay with using ssz for both consensus and the network. If so, I'm okay with maintaining the status quo. However, my concern is that ssz continues to be used for all encoding/decoding in Ethereum 2.0, including non-consensus objects, and at some points it stops living up to the "simple serializer" name. |
Google can't force you to update protobuf version and you are free to fork it at any time. Git based projects are as decentralized as you would like them to be and I think this a poor argument against a technology. To take this to the extreme would mean writing everything from the ground up. Go is also maintained by Google and subject to change at any time, but it doesn't mean using it goes against the virtues of decentralized technology. I agree that there needs to be a clear argument for using protobuf over SSZ, and I would be interested to see benchmarks between the two strategies at scale. Going forward, Prysm clients may support both wire protocols and prefer protobuf for |
DELETED
|
I realise it can be forked, but developing and maintaining a succinct, application specific component could be much less effort than forking and maintaining a much larger, generalised component.
I mentioned this point was philosophical and I think it’s reasonable that a philosophy becomes nonsensical when taken to the extreme. I was trying to avoid having external components baked into the spec, not avoiding using external components in the implementation at all costs. Regardless of all of this, it seems that protobuf is baked into the libp2p spec so my objections to it are futile. Paul: 0, Google: 1. I don’t want to end up bikeshedding about libp2p so I’m happy to accept protobuf as a cool piece of tech that we’re going to use. I honestly do think it’s cool and would use it without hesitation in other scenarios. I’m still open for technical discussions on why we shouldn’t push consensus-serialised objects around the network, if more information comes to light :) |
Speaking from experience, reimplementing the wire encoding part of protobuf - which is what you actually need to work with protobuf-encoded data ("minimal protobuf") - is trivial. It's comparable to ssz in terms of complexity. I wouldn't worry too much about it from an "3rd party libraries are bad" perspective. The fact that libp2p uses protobuf is not a reason to put it in the application protocol, just like you wouldn't consider the TCP packed binary fields to be a constraint on the design. Protobuf being a key-value format, and not supporting fixed-lengths arrays, are solid arguments not to go with it however - you will need to perform "encoding" and "decoding" steps to get from consensus data to protobuf-typed data ("hashes are encoded as variable-length byte buffers" is an encoding step!), and then do the protobuf wire encoding. I like that prysmatic, that is suggesting the feature, will be doing both - then we can get some solid numbers and make an informed choice. Until then, the numbers from Lighthouses simulations speak in SSZ's favor, as do other arguments. |
Awesome, I would be keen to do this. |
http://jpa.kapsi.fi/nanopb/ is a fairly easy role model implementation, though even that one strives to be more "generic" than you have to be, if you really want to specialize |
It's worth noting that if we go this way, then the only part of SSZ we would actually be using is the tree hashing, which is actually a fairly small and compact part of the protocol. |
It would be great to see protobufs used for the network layer. @rawfalafel does your proposal include gRPC too? When you throw gRPC into the mix, it's trivial move beyond simple RPC and have streaming services between components backed by HTTP/2, which is another significant value-add. I've seen countless developer hours wasted on getting different network serialisation protocols to work between services. Protobufs provide a concise, straight forwards interface language with great cross-platform support. I've seen multiple projects switch to using it and never look back. It will also save a lot of time for those of us developing and maintaining Ethereum integration libraries as we transition to eth2.0 support ... 😉 |
We would still need an SSZ encoder to go from language-specific-object to bytes though, right? Protobuf as a whole makes no guarantees about byte level determinism, however there seem to be some people who achieved it in a "hackish" way (their words, not mine). Maybe there's an elegant way? @conor10 I think everyone agrees about using established protocol (e.g., protobufs) for talking between "internal" components, like beacon-node to validator-client. As I understand it, this discussion is around whether p2p nodes should send block, attestations, etc to each other as SSZ or protobuf. If protobuf can elegantly replace SSZ I'm not opposed to doing so. I'm ditching all my "philosophical" concerns after having a look at minimal protobuf. Seems pretty easy to maintain. |
Good point. Protobuf not supporting fixed length byte arrays is pretty annoying.
Prysmatic plans to use gRPC for internal communication, but I don't know where I stand about making it a standard. |
I understand the consensus (pun unintended) is to use SSZ for both consensus and network. Feel free to reopen :) |
Instead of using ssz for both network and consensus, I think we should use ssz only for consensus and standardize around protobuf at the network layer.
Benefits
Potential conerns
Previous discussions
The text was updated successfully, but these errors were encountered: