Hash of ProtectedStoragePayload is non-deterministic #3367

Closed

bodymindarts opened this issue Oct 6, 2019 · 33 comments

@bodymindarts commented Oct 6, 2019

In P2PDataStorage.java there is a hash map that should contain the up to date view of the data that gets gossiped across the network.

The keys to this map are generated from the hash of the protobuf serialized ProtectedStoragePayload.

Unfortunately using this hash as a key is not a good choice because it can differ for the same payload depending on how it gets serialized. Protobuf serialization is non-deterministic when it comes to maps.

Wire format ordering and map iteration ordering of map values is undefined, so you cannot rely on your map items being in a particular order.

And many payloads contain maps via the extra_data field.

This leads to payloads that should be considered unique ending up in the map multiple times. For example, I have witnessed OfferPayloads with the same id inserted multiple times because their hashes differed, caused by different ordering of the map entries during serialization.

This issue can have a lot of ripple effects, since there are payloads that contain references to hashes of other payloads, which even get signed to prove authorship; the RefreshOfferMessage is one example.
It is possible for Alice to submit a RefreshOfferMessage that is correct from her POV but that Bob cannot validate because his hash of the original OfferPayload is different.
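
To make the mechanics concrete: the storage key is essentially the hash of the payload's serialized protobuf bytes, so any byte-level difference produces a different key. A minimal sketch of that idea (the helper class below is hypothetical; in Bisq the actual method is P2PDataStorage.get32ByteHash):

import com.google.protobuf.Message;
import java.security.MessageDigest;

class PayloadKeySketch {
    // Key = SHA-256 over the serialized payload bytes. If the same payload
    // serializes to different bytes (e.g. map entries in a different order),
    // it ends up under a different key.
    static byte[] keyFor(Message protectedStoragePayload) throws Exception {
        byte[] serialized = protectedStoragePayload.toByteArray();
        return MessageDigest.getInstance("SHA-256").digest(serialized);
    }
}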

@bodymindarts bodymindarts changed the title Hash of ProtectedStoragePayload is non-deterministic (leading to duplication of elements in p2pstorage) Hash of ProtectedStoragePayload is non-deterministic (leading to duplication of elements in P2PDataStorage) Oct 6, 2019
@chimp1984 (Contributor)

Oh damn....
The 'extra_data' field is not used in many payloads, but it is used in 'OfferPayload' and 'TradeStatistics2'.
Did you see the different hash in your Rust project? It might be that the issue only becomes relevant when protobuf is used across different languages.
We need to find a way to fix that without breaking backward compatibility. There are also maps used in 'DaoState' and in 'Preferences'.

@bodymindarts (Author) commented Oct 7, 2019

I witnessed the incoming OfferPayloads that were being consumed by my Rust project generating different hashes. I have not investigated whether the issue is the byte order the messages were received with (meaning that they already differ coming from different Java clients) or the re-serialization (for hashing) within Rust.
Regardless, the Java standard library also doesn't guarantee ordering in its HashMap implementation, and it is likely to change from version to version or depend on which OS / JVM implementation it is being run on. So even without direct evidence it is prudent to assume this could happen in a Java-only world as well.

@bodymindarts (Author) commented Oct 7, 2019

Here is a collection of issues with non-determinism in protobuf serialization (it's not just maps that are a problem): https://gist.github.com/kchristidis/39c8b310fd9da43d515c4394c3cd9510

Another useful article:
https://havoc.io/post/deterministic-protobuf/

@christophsturm (Contributor)

Did you see this happen in the Java app, or is it a theoretical problem? I'm pretty sure that the ordering in Java HashMaps is always the same even if it is not defined, and since we always use the same code it could very well be that this is not really resulting in a problem right now.

We could also write code that detects this problem and removes the duplicate items.

@bodymindarts (Author) commented Oct 7, 2019

It's not theoretical: I have witnessed this problem in my Rust re-implementation (consuming data from mainnet seed nodes). I have not yet verified 100% conclusively that it happens in the Java app and am currently investigating to find evidence.

But regardless - the order of the entries in a Java HashMap is also not guaranteed to stay stable. Even if it is working now, any of the following things (and perhaps more) could cause the bug to surface:

  • an update of the Java version
  • a different JVM implementation (e.g. for different OSes)
  • an update to the Google protobuf library

It seems very risky to keep this bug around, especially given the reliance on hashes and the usage of maps in the consensus code of the DAO.

@christophsturm (Contributor)

I'm not saying that we shouldn't fix it. I was just wondering if this happens in the wild, and I think it is not so certain that the answer is yes.

@christophsturm (Contributor)

from the gist that you linked to:
"While I doubt that any official Google implementation would ever change the serialization, third party implementations may do whatever they like"

I guess the gist of the problem is that we are using protobuf for something it's not intended for. Maybe we can create our own way of hashing the protobuf objects. How would we maintain backward compatibility if we change the hashing? Maybe, if what the Java library does now is stable, we can create our own hashing that creates the same hashes. @chimp1984 wdyt?
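
One possible shape of such a custom hashing, purely as an illustrative sketch (not what Bisq does today; the names are made up): feed the digest with the payload minus the map field, then with the extra_data entries in sorted key order, so the result no longer depends on map iteration order.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;

class CanonicalHashSketch {
    static byte[] canonicalHash(byte[] payloadBytesWithoutExtraData,
                                Map<String, String> extraData) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        digest.update(payloadBytesWithoutExtraData);
        // TreeMap iterates in natural key order, so every node feeds the
        // entries to the digest in the same sequence.
        for (Map.Entry<String, String> e : new TreeMap<>(extraData).entrySet()) {
            digest.update(e.getKey().getBytes(StandardCharsets.UTF_8));
            digest.update(e.getValue().getBytes(StandardCharsets.UTF_8));
        }
        return digest.digest();
    }
}

The obvious catch is the backward-compatibility question above: such a hash would not match the hashes already stored in the network.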

@bodymindarts (Author) commented Oct 7, 2019

I now have proof that this is happening at a non-trivial frequency on the currently live mainnet, independently of my experimentation with Rust.

I have compared the serialized bytes of OfferPayloads that had different hashes but the same id, exactly as they were received from the network (meaning they were serialized by a Java process, not re-serialized in Rust code).

In the time frame I observed, out of 52 AddDataMessages that contained OfferPayloads with unique hashes, 13 were actually duplicates with the same id. In each case only a few bytes were swapped, which was likely the result of the order of entries in the extra_data field.

@chimp1984 (Contributor)

Maybe, if what the Java library does now is stable, we can create our own hashing that creates the same hashes. @chimp1984 wdyt?

Yes, forking protobuf and making the map serialization deterministic would be one option to deal with it.

@bodymindarts
Are you 100% sure your tests are correct and the differences are not caused by other changed data (like edited offers - they keep the same id but have a different price)?

@bodymindarts (Author)

@chimp1984 I have run another test, this time comparing not only the offer id but the entire payload:

if &bytes != old_bytes && &offer.payload == old_payload

The result in the time I observed:
17 out of 79 messages passed that conditional.

@christophsturm (Contributor)

Does Rust automatically generate equals methods, or is there a possible bug there?

@christophsturm (Contributor)

As a first step, we could add that check to the Java version and log it.

@bodymindarts (Author) commented Oct 8, 2019

Does Rust automatically generate equals methods, or is there a possible bug there?

@christophsturm Equality in this case is derived automatically for the protobuf-related structs.

@bodymindarts (Author) commented Oct 8, 2019

As a first step, we could add that check to the Java version and log it.

Just FYI, this is trickier than it seems: you need the bytes != old_bytes check to operate on the exact bytes that came from the network, before the object gets inflated via from_proto. If you just re-serialize the objects the bytes probably won't differ, since locally the serialization may be deterministic. (Though in Rust the bytes differ even when serialized locally, this may not be the case in Java.)
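
A rough sketch of what such a check could look like (names assumed, not actual Bisq code): hold on to the raw payload bytes from the wire and compare them against our own re-serialization of the inflated object.

import com.google.protobuf.Message;
import java.util.Arrays;

class ReserializationCheckSketch {
    // bytesFromNetwork: the payload bytes exactly as received, captured
    // before the object is inflated. If this returns false, our node would
    // compute a different hash than the peer that serialized the message.
    static boolean reserializesIdentically(byte[] bytesFromNetwork, Message inflatedPayload) {
        return Arrays.equals(bytesFromNetwork, inflatedPayload.toByteArray());
    }
}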

@chimp1984 (Contributor)

I tested in OfferBookService whether there are duplicates and I could not find any so far. I will leave it running with 2 apps overnight to see if any appear.

I also checked the DaoState hash in 1000 iterations and all hashes are the same.

I am aware that this is not a guarantee and we should fix it, but at least we should know whether it is an acute issue or not.

Here is the code I used for checking for duplicates in OfferBookService:

public void onAdded(ProtectedStorageEntry data) {
    if (data.getProtectedStoragePayload() instanceof OfferPayload) {
        OfferPayload offerPayload = (OfferPayload) data.getProtectedStoragePayload();
        byte[] hashOfPayload = P2PDataStorage.get32ByteHash(offerPayload);
        totalOffers = 0;
        duplicates = 0;
        p2PService.getDataMap().forEach((key, value) -> {
            if (value.getProtectedStoragePayload() instanceof OfferPayload) {
                totalOffers++;
                OfferPayload offerPayload2 = (OfferPayload) value.getProtectedStoragePayload();
                byte[] hashOfPayload2 = P2PDataStorage.get32ByteHash(offerPayload2);
                checkArgument(Arrays.equals(key.bytes, hashOfPayload2), "bytes must be same");
                if (offerPayload2.getId().equals(offerPayload.getId()) && !Arrays.equals(hashOfPayload, hashOfPayload2)) {
                    duplicates++;
                    log.info("Duplicate \nofferPayload1={}\nhashOfPayload1={}\n" +
                                    "offerPayload2={}\nhashOfPayload2={}", offerPayload, Hex.encode(hashOfPayload),
                            offerPayload2, Hex.encode(hashOfPayload2));
                }
            }
        });
        log.info("totalOffers: {}, duplicates={} ", totalOffers, duplicates);
    }

    offerBookChangedListeners.stream().forEach(listener -> {
        if (data.getProtectedStoragePayload() instanceof OfferPayload) {
            OfferPayload offerPayload = (OfferPayload) data.getProtectedStoragePayload();
            Offer offer = new Offer(offerPayload);
            if (showOffer(offer)) {
                offer.setPriceFeedService(priceFeedService);
                listener.onAdded(offer);
            }
        }
    });
}

@bodymindarts (Author) commented Oct 9, 2019

@chimp1984 your code is not checking the bytes as they came from the network, only the re-serialized versions of the bytes that were created locally on your node. Your code doesn't show that different instances behave identically, just that your local node (probably) behaves deterministically.

@bodymindarts (Author) commented Oct 9, 2019

@chimp1984 If you want to test whether or not all instances are serializing the same way, you must cache the bytes that came in from the network. Then you know how they were serialized by the peer that sent you the message (and thus also whether or not that peer comes up with the same hash).

(EDIT: deleted idea that did not work)

@bodymindarts (Author) commented Oct 9, 2019

I have now committed code that makes the problem visible in Java: https://github.com/bodymindarts/bisq/tree/bad-bytes-repro.
Build and run it on mainnet. Eventually you should see output like:

WARN  b.n.p2p.storage.P2PDataStorage: REPRO - BYTES OF LAST MESSAGE != BYTES OF THIS MESSAGE -> different serialization but same hash
WARN  b.n.p2p.storage.P2PDataStorage: Bytes differ in pos 1775 of 1781
WARN  b.n.p2p.storage.P2PDataStorage: Bytes differ in pos 1776 of 1781

The output comes from this section of code, though there is other code added to make this work (best to review the diff).

This proves that, while locally serialization may be deterministic, globally it is not, indicating that different nodes must be coming up with different hashes in some cases (since the serialized bytes are the basis for the hash).

Or is there another explanation?

@bodymindarts bodymindarts changed the title Hash of ProtectedStoragePayload is non-deterministic (leading to duplication of elements in P2PDataStorage) Hash of ProtectedStoragePayload is non-deterministic Oct 9, 2019
@bodymindarts (Author) commented Oct 9, 2019

Since I think there is some confusion about what the problem really is (and the way it manifests itself in the Java code is a bit different than in my Rust code), here is a high-level description of a bug that can result from this issue:

Alice creates Offer A -> serializes it to aliceBytes -> creates aliceHash from aliceBytes (for storing and referencing) -> then broadcasts aliceBytes to the network.

Bob receives aliceBytes -> deserializes them to Offer A -> re-serializes to bobsBytes and creates bobsHash from bobsBytes for storing.

So the step Offer -> bytes results in 2 different hashes on Alice's and Bob's respective machines (this is the non-deterministic aspect of the serialization at play).

So far they each have Offer A in their offer book, but the hashes are different on Alice's and Bob's machines.

Now the offer is about to expire, so Alice creates a RefreshOfferMessage referencing aliceHash as the payload hash.
Bob receives the RefreshOfferMessage and looks up the payloadHash (= aliceHash) but doesn't find the matching offer (as he has it stored under bobsHash). So for Bob the offer will expire even though Alice wants to refresh the TTL.

There are likely other bugs that result from this issue; this is just the most obvious one I can think of.

@chimp1984 (Contributor)

@bodymindarts

Thanks for sharing your code version.

I tested with that to see how many RefreshOfferMessages would have conflicts (i.e. you receive one but there is no corresponding entry with that hash in the map). It happened once in a test run of about 1 hour. I will leave it running longer, but a low number of such mismatches is expected, as it can be that the offer was removed before you receive a refresh message due to network delays.

I also counted the number of OfferPayloads and it was the same number (308) as the displayed offers, so it seems there are no duplicate entries in the map.

I could verify that the bytes from the network (according to your code version) are different in many cases (the number of accumulated conflicts went to 180 in my test), but I am not clear yet why that does not show up as more conflicts with RefreshOfferMessage.

@bodymindarts (Author) commented Oct 9, 2019

I also counted the number of OfferPayloads and it was the same number (308) as the displayed offers, so it seems there are no duplicate entries in the map.

Duplication in the map does not occur in Java because locally the hashing is deterministic (at least it seems to be). The issue only shows up in the interaction between nodes, because globally it is non-deterministic.

In Rust this is different: it is non-deterministic even locally, which leads to the duplication, and that is how I spotted it. This is because the order of entries in Rust's HashMap implementation is randomized by design for security reasons; it makes a certain category of attacks much harder.

@bodymindarts (Author)

I could verify that the bytes from the network (according to your code version) are different in many cases (the number of accumulated conflicts went to 180 in my test), but I am not clear yet why that does not show up as more conflicts with RefreshOfferMessage.

This is interesting. I would also expect more conflicts given the volume of messages with different bytes.

@chimp1984 (Contributor)

You use protobuf.NetworkEnvelope.parseFrom(bytes), but we originally used parseDelimitedFrom (and writeDelimitedTo for writing). Maybe that causes some of the differences? The diffs are always in the last 5-6 bytes.
https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/Parser.html#parseDelimitedFrom-java.io.InputStream-

@bodymindarts (Author) commented Oct 9, 2019

No, that isn't the issue. I use parseFrom because I have already parsed the size (I need it to know how big the byte array should be); parseDelimitedFrom just does the size parsing inline. Either way, the method toByteArray (used for hashing) yields the version without the size prefix, so I haven't added the bytes that encode the size to the originalBytes array. Otherwise the lengths of the arrays would be different.
That the last couple of bytes differ is an indication that it's the extra_data map that is the issue (it's the second-to-last field in the serialization). If the size parsing were the issue it would show up in the very first bytes.
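
For clarity, a sketch of the size handling I mean (assuming protobuf's CodedInputStream; this is not the exact repro code): the delimiter is just a varint length prefix, and reading it separately leaves exactly the message bytes that toByteArray would produce.

import com.google.protobuf.CodedInputStream;
import java.io.InputStream;

class DelimitedReadSketch {
    // Reads one length-delimited message from the stream and returns the
    // raw message bytes without the varint length prefix. parseDelimitedFrom
    // does the same size handling internally before parsing.
    static byte[] readOneMessage(InputStream in) throws Exception {
        CodedInputStream cis = CodedInputStream.newInstance(in);
        int size = cis.readRawVarint32();
        return cis.readRawBytes(size);
    }
}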

@chimp1984 (Contributor)

Still no explanation for why it does not cause many RefreshOfferMessage processing failures.

@bodymindarts (Author) commented Oct 9, 2019

Still no explanation for why it does not cause many RefreshOfferMessage processing failures.

Just a thought: the original bytes we receive are serialized by the last hop the message takes towards us. So just because we receive them in a form that indicates a different hash in the sender doesn't mean the RefreshOfferMessage has to be wrong, because the sender we received the message from is probably not the person who opened the offer.

In other words:
Alice opens Offer A -> aliceBytes -> Bob receives aliceBytes, deserializes to Offer A and re-serializes (in order to relay) as bobsBytes -> Dan receives bobsBytes, again deserializes to Offer A and stores it under hash(dansBytes).

It is possible that aliceBytes == dansBytes (so the RefreshOfferMessage will look right) even though bobsBytes != dansBytes (so the bytes we receive would actually hash differently).
This plays a role in explaining why the number of differences in bytes != the number of bad RefreshOfferMessages.

Difference in bytes => indication that a direct peer serializes differently than us
Bad RefreshOfferMessage => indication that the offer's origin serializes differently than us

... thinking further: with just 1 peer that serializes differently we will regularly receive bytes that differ from our serialization. But that doesn't say anything about the number of bad RefreshOfferMessages we should receive.

@bodymindarts (Author)

but a low number of such mismatches is expected, as it can be that the offer was removed before you receive a refresh message due to network delays.

@chimp1984 is it possible to identify when this happens for that reason? Could your test measure how often it happens for that reason vs. some other reason?

@chimp1984 (Contributor)

Yes, we also get offers repeated due to the flood-fill behaviour. But I still have some doubts that a relatively high number of byte mismatches is not reflected in the RefreshOfferMessages.

is it possible to identify when this happens for that reason?

One would need to cache the RemoveDataMessages and look up whether we get a RefreshOfferMessage for an offer which has been removed. Besides that, it is always possible that a node never received the offer in the first place but later gets the RefreshOfferMessage.

@bodymindarts (Author) commented Oct 10, 2019

I have run some more tests to check where in the byte array the information of the payload sits. My output looks like:

Payload bytes exists in network bytes from pos: 17-1289 of 1792
Payload bytes exists in network bytes from pos: 17-1286 of 1789
(...)

So the bytes that come in differently off the wire:

WARN  b.n.p2p.storage.P2PDataStorage: REPRO - BYTES OF LAST MESSAGE != BYTES OF THIS MESSAGE -> different serialization but same hash
WARN  b.n.p2p.storage.P2PDataStorage: Bytes differ in pos 1775 of 1781
WARN  b.n.p2p.storage.P2PDataStorage: Bytes differ in pos 1776 of 1781

do not appear to be in positions that affect the payload.

Summary of my current assumptions:

  • both Java and Rust protobuf serialization are non-deterministic when looking at serialization across different nodes
  • in Java the payload serialization doesn't seem to be affected by the non-determinism (even though other parts - toward the end of the message - are)
  • Rust serialization is also non-deterministic locally within 1 node, and the payload is affected (this is by design, for security reasons)

If these are all true then there may not currently be an acute problem in Java... though it may surface in the future if things change.

Things to investigate:

  • what do the bytes that differ even in Java signify, and can issues result from this?
  • are the serializations of other payloads also stable? Probably yes, but it might be good to test this by looking at the bytes.

@bodymindarts (Author)

I have applied the technique described in this article to switch the maps out for repeated key-value fields, like so:

message TempProposalPayload {
    Proposal proposal = 1;
    bytes owner_pub_key_encoded = 2;
    map<string, string> extra_data = 3;
}

becomes:

message TempProposalPayload {
    Proposal proposal = 1;
    bytes owner_pub_key_encoded = 2;
    repeated StringKV extra_data = 3;
}

message StringKV {
    string key = 1;
    string value = 2;
}

The result is both backwards compatible and currently deterministic for both Java and Rust.
There is no guarantee that this will not break in the future, and it relies on current implementation details of the individual libraries.
In other words, this is a risky work-around that could blow up in the future, but it seems okay for the time being.

I will be using that as the basis for messages in my Rust implementation going forward.

If there is interest I'm happy to invest the work to make the required changes in Java as well. Please comment if you think this makes sense!
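
To illustrate what the writer side could look like in Java (StringKV refers to the proposed message above; the generated class and the sorted order are my assumptions, not a finished design):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ExtraDataAsListSketch {
    // Populates the repeated field in sorted key order so that every node
    // produces the same byte sequence regardless of map iteration order.
    static List<protobuf.StringKV> toSortedEntries(Map<String, String> extraData) {
        List<protobuf.StringKV> entries = new ArrayList<>();
        for (Map.Entry<String, String> e : new TreeMap<>(extraData).entrySet()) {
            entries.add(protobuf.StringKV.newBuilder()
                    .setKey(e.getKey())
                    .setValue(e.getValue())
                    .build());
        }
        return entries;
    }
}

Whether a sorted order keeps the hashes of existing data unchanged is exactly the kind of implementation detail the caveat above is about.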

@bodymindarts (Author)

An important detail of the protobuf library is that internally it uses a LinkedHashMap when deserializing a map field.

This means it preserves the initial order of the keys in the map across multiple deserialization -> re-serialization passes, thus enabling the hashes created in Bisq to be stable across nodes. If this implementation detail were ever to change, hashes computed on different nodes might no longer line up.
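
A simple way to sanity-check that assumption for a given payload (the generated class name is assumed here): a parse -> re-serialize round trip should reproduce the exact input bytes as long as the parser keeps the map entries in wire order.

import java.util.Arrays;

class RoundTripSketch {
    // Returns true if re-serializing the parsed payload yields exactly the
    // bytes we received, i.e. the map order survived the round trip.
    static boolean roundTripsStable(byte[] wireBytes) throws Exception {
        protobuf.OfferPayload parsed = protobuf.OfferPayload.parseFrom(wireBytes);
        return Arrays.equals(wireBytes, parsed.toByteArray());
    }
}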

@bodymindarts (Author)

Closing this issue in favor of summary #3391
