Hash of ProtectedStoragePayload is non-deterministic #3367
Comments
Oh damn....
I witnessed the incoming
Here is a collection of issues with non-determinism in protobuf serialization (it's not just maps that are a problem): https://gist.github.com/kchristidis/39c8b310fd9da43d515c4394c3cd9510 Another useful article:
Did you see this happen in the Java app or is it a theoretical problem? I'm pretty sure that the ordering in Java hashmaps is always the same even if it is not defined, and since we always use the same code it could very well be that this is not really resulting in a problem right now. We could also write code that detects this problem and removes the duplicate items.
It's not theoretical; I have witnessed this problem in my Rust re-implementation (consuming data from mainnet seednodes). I have not yet verified 100% conclusively that this happens in the Java app and am currently investigating to find evidence. But regardless, the order of the entries in a Java HashMap is also not guaranteed to stay stable. Even if it is working now, any of the following things (and perhaps more) could cause the bug to surface:
It seems to me very risky to keep this bug around, especially given the reliance on hashes and the usage of maps in the consensus code of the DAO.
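For illustration, here is a self-contained sketch (not Bisq code) showing that HashMap iteration order can differ even within a single JVM, purely as a function of the map's capacity:

```java
import java.util.HashMap;
import java.util.Map;

// Same entries, different initial capacities: the bucket index is computed
// as hash & (capacity - 1), so the iteration order depends on table size.
public class HashMapOrderDemo {
    public static void main(String[] args) {
        Map<Integer, String> small = new HashMap<>(16);
        Map<Integer, String> large = new HashMap<>(64);
        for (int key : new int[]{33, 18}) {
            small.put(key, "x");
            large.put(key, "x");
        }
        System.out.println(small.keySet()); // [33, 18] on current OpenJDK
        System.out.println(large.keySet()); // [18, 33]: same contents, different order
    }
}
```

Anything that changes bucket assignment (capacity, resize thresholds, the JDK's hash spreading) can change the order, which is why relying on it for hashing is fragile.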
I'm not saying that we shouldn't fix it. I was just wondering if this happens in the wild, and I think it is not so certain that the answer is yes.
From the gist that you linked to: I guess the gist of the problem is that we are using protobuf for something that it's not intended for. Maybe we can create our own way of hashing the protobuf objects. How would we maintain backward compatibility if we change the hashing? Maybe, if what the Java library does now is stable, we can create our own hashing that creates the same hashes. @chimp1984 wdyt?
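One possible shape for such a custom hashing, as a minimal sketch (the SHA-256 choice, the helper name, and its signature are assumptions, not existing Bisq code): feed the digest the map-free part of the payload, then the extra_data entries in sorted key order so the result is independent of serialization order.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

public class CanonicalHash {
    // Sorting the entries (TreeMap orders by key) removes the only source of
    // non-determinism discussed here: the map entry order.
    static byte[] canonicalHash(byte[] nonMapFieldBytes, Map<String, String> extraData)
            throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        digest.update(nonMapFieldBytes); // the already-deterministic, map-free fields
        for (Map.Entry<String, String> e : new TreeMap<>(extraData).entrySet()) {
            digest.update(e.getKey().getBytes(StandardCharsets.UTF_8));
            digest.update(e.getValue().getBytes(StandardCharsets.UTF_8));
        }
        return digest.digest();
    }
}
```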
I now have proof that this is happening at a non-trivial frequency on the currently live mainnet, independently of my experimentation with Rust. I have compared the serialized bytes for given payloads. In the time frame I observed, out of 52
Yes, forking protobuf and making the map serialization deterministic would be one option to deal with it. @bodymindarts
@chimp1984 I have run another test, this time not only comparing the offer id but the entire payload:
The result in the time I observed:
Does Rust automatically generate equals methods or is there a possible bug there?
As a first step, we could add that check to the Java version and log it.
@christophsturm Equality in this case is derived automatically for the protobuf-related structs.
Just FYI this is trickier than it seems as you need the
I tested in OfferBookService if there are duplicates and I could not find any so far. I will leave it running with 2 apps overnight to see if any appear. I also checked the DaoState hash in 1000 iterations and all hashes were the same. I am aware that this is no guarantee and we should fix it, but at least we should know whether it is an acute issue or not. Here is the code I used for checking for duplicates in OfferBookService:
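A check along these lines (a sketch with assumed names, not the exact snippet referred to above): group the stored payloads by offer id and flag any id that sits under more than one hash key.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class DuplicateCheck {
    // Stand-in for the real Bisq payload type.
    interface OfferPayload { String getId(); }

    // If hashing were deterministic, every offer id would map to exactly one hash.
    static void logDuplicates(Map<String, OfferPayload> payloadByHashHex) {
        Map<String, Set<String>> hashesByOfferId = new HashMap<>();
        payloadByHashHex.forEach((hashHex, payload) ->
                hashesByOfferId.computeIfAbsent(payload.getId(), id -> new HashSet<>()).add(hashHex));
        hashesByOfferId.forEach((offerId, hashes) -> {
            if (hashes.size() > 1)
                System.out.println("Offer " + offerId + " stored under " + hashes.size() + " hashes");
        });
    }
}
```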
@chimp1984 your code is not checking the bytes as they came from the network, only the re-serialized versions of the bytes that were created locally on your node. Your code doesn't show that different instances behave identically, just that your local node (probably) behaves deterministically.
@chimp1984 If you want to test whether or not all instances are serializing the same way, you must cache the bytes that came in from the network. Then you know how they were serialized by the peer that sent you the message (and thus also whether or not that peer is coming up with the same hash). (EDIT: deleted idea that did not work)
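A minimal sketch of that comparison, assuming the protobuf-java Message type (class and method names are illustrative):

```java
import com.google.protobuf.Message;
import java.util.Arrays;

class WireBytesCheck {
    // Keep the exact bytes a payload arrived with, then compare them against
    // our own re-serialization of the parsed object. A mismatch means the
    // sending peer serializes (and therefore hashes) differently than we do.
    static boolean serializesIdentically(byte[] wireBytes, Message parsed) {
        return Arrays.equals(wireBytes, parsed.toByteArray());
    }
}
```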
I have now committed code that makes the problem visible in Java: https://github.com/bodymindarts/bisq/tree/bad-bytes-repro.
The output comes from this section of code, though there is other code added to make this work (best to review the diff). This proves that, while locally serialization may be deterministic, globally it is not, indicating that different nodes must be coming up with different hashes in some cases (since the serialized bytes are the basis for the hash). Or is there another explanation?
Since I think there is some confusion about what the problem really is (and the way it manifests itself in the Java code is a bit different than in my Rust code), here is a high-level description of a bug that can result from this issue:

Alice creates Offer A -> serializes to aliceBytes -> creates aliceHash from aliceBytes (for storing and referencing) -> then broadcasts aliceBytes to the network.

Bob receives aliceBytes -> deserializes to Offer A -> re-serializes to bobsBytes and creates bobsHash from bobsBytes for storing.

So the step Offer -> bytes results in 2 different hashes on Alice's and Bob's respective machines (this is the non-deterministic aspect of the serialization at play). So far they each have Offer A in their offer book, but the hashes are different on Alice's and Bob's machines. Now the offer is about to expire, so Alice creates a RefreshOfferMessage referencing aliceHash; Bob only knows the offer under bobsHash, so he cannot match the refresh to the stored payload.

There are likely other bugs that result from this issue; this is just the most obvious I can think of.
Thanks for sharing your code version. I tested with that to see how many conflicts show up. I also counted the number of OfferPayloads and it was the same number (308) as the displayed offers, so it seems there are no duplicate entries in the map. I could verify that the bytes from the network (according to your code version) are different in many cases (the number of accumulated conflicts goes to 180 in my test), but I am not clear yet why that is not reflected in more conflicts with the RefreshOfferMessages.
Duplication in the map does not occur in Java because locally the hashing is deterministic (at least it seems to be); the issue only shows up in the interaction between nodes because globally it's non-deterministic. In Rust this is different: it's non-deterministic even locally, which leads to the duplication, and that is how I spotted it. This is because the order of entries in Rust's HashMap implementation is randomized by design for security reasons; it makes a certain category of attacks much harder.
This is interesting. I would also expect more conflicts given the volume of messages with different bytes.
Here is my code:
You use
No, that isn't the issue. I use
Still no explanation why it does not cause many failed RefreshOfferMessages.
Just a thought: the original bytes we receive are in the way they were serialized by the last hop that the message takes towards us. So just because we receive them in a way that indicates a different hash in the sender, it doesn't mean that the original creator of the payload hashes differently than we do. In other words: it is possible that aliceBytes == dansBytes (so ...). Difference in bytes => indication that a direct peer serializes differently than us ... thinking further, with just 1 peer that serializes differently we will regularly receive bytes that differ from our serialization. But that doesn't say anything about the number of bad RefreshOfferMessages.
@chimp1984 is it possible to identify when this happens for that reason? Could your test measure how often it happens for that reason vs. some other reason?
Yes, we also get offers repeated due to the floodfill behaviour. But I still have some doubts that a relatively high number of byte mismatches is not reflected in the RefreshOfferMessages.
One would need to cache the RemoveDataMessages and look up whether we get a RefreshOfferMessage for an offer which has been removed. Besides that, it is always possible that a node never received the offer in the first place but gets the RefreshOfferMessage later.
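A sketch of such a cache (names and the size bound are assumptions):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class RemovedHashCache {
    // Bounded insertion-order cache of payload hashes seen in RemoveDataMessages.
    private final Set<String> removedHashes = Collections.newSetFromMap(
            new LinkedHashMap<String, Boolean>() {
                protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                    return size() > 10_000; // keep the cache bounded
                }
            });

    void onRemoveData(String payloadHashHex) {
        removedHashes.add(payloadHashHex);
    }

    void onRefreshOffer(String payloadHashHex) {
        if (removedHashes.contains(payloadHashHex))
            System.out.println("Refresh for a previously removed payload: " + payloadHashHex);
    }
}
```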
I have run some more tests to check where in the byte array the information of the payload sits. My output looks like:
So the bytes that come in different off the wire don't appear to be in positions that affect the payload. Summary of my current assumptions:
If these are all true then there may not currently be an acute problem in Java... though it may surface in the future if things change. Things to investigate:
I have applied the technique described in this article to switch out the maps into arrays, like so:
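(sketch; the field name comes from the thread, the field number is assumed)

```proto
map<string, string> extra_data = 8;
```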
becomes:
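(same assumed field number; a map field is wire-compatible with a repeated entry message whose key is field 1 and value is field 2)

```proto
message ExtraDataEntry {
  string key = 1;
  string value = 2;
}

repeated ExtraDataEntry extra_data = 8;
```

Because an explicit repeated field is serialized in exactly the order the writer emits it, both the Java and the Rust side produce stable bytes for the same entry order.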
The result is both backwards compatible and currently deterministic for both Java and Rust. I will be using that as the basis for messages in my Rust implementation going forward. If there is interest I'm happy to invest the work to make the required changes in Java as well. Please comment if you think this makes sense!
An important detail of the pb library is that internally it uses an insertion-order-preserving map implementation for map fields. This means it preserves the initial order of the keys in the map across multiple de-serialization -> re-serialization passes, thus enabling the hashes created in Bisq to be stable across nodes. If this implementation detail ever changed, hashes computed on different nodes may no longer line up.
Closing this issue in favor of summary #3391 |
In P2PDataStorage.java there is a hash map that should contain the up-to-date view of the data that gets gossiped across the network. The keys to this map are generated from the hash of the protobuf-serialized ProtectedStoragePayload.

Unfortunately, using this hash as a key is not a good choice because it can differ for the same payload depending on how it gets serialized: protobuf serialization is non-deterministic when it comes to maps, and many payloads contain maps via the extra_data field.

This leads to payloads which should be considered unique ending up in the map multiple times. For example, I have witnessed OfferPayloads that have the same id inserted multiple times due to differing hashes caused by different ordering of the map entries during serialization.

This issue can have a lot of ripple effects, since there are payloads that contain references to hashes of other payloads that even get signed to prove authorship, like the RefreshOfferMessage for example. It is possible for Alice to submit a RefreshOfferMessage that is correct from her POV, but Bob cannot validate it because his hash of the original OfferPayload is different.
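To make the failure mode concrete, here is a self-contained sketch (names and the digest choice are illustrative, not the actual Bisq code) of why keying the store by the hash of the serialized bytes duplicates a payload whose bytes can vary:

```java
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

class StorageKeySketch {
    static String keyFor(byte[] serializedPayload) throws Exception {
        byte[] hash = MessageDigest.getInstance("SHA-256").digest(serializedPayload);
        StringBuilder hex = new StringBuilder();
        for (byte b : hash) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, Object> store = new HashMap<>();
        // Two serializations of the same logical payload, differing only in
        // entry order, stand in for the non-deterministic protobuf bytes.
        byte[] orderingA = {1, 2, 3};
        byte[] orderingB = {2, 1, 3};
        store.put(keyFor(orderingA), new Object());
        store.put(keyFor(orderingB), new Object());
        System.out.println(store.size()); // 2: one logical payload, stored twice
    }
}
```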