Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ok CBOR, you win #110

Merged
merged 2 commits into from
Jun 4, 2022
Merged

Ok CBOR, you win #110

merged 2 commits into from
Jun 4, 2022

Conversation

scarmuega
Copy link
Member

@scarmuega scarmuega commented Jun 2, 2022

Cardano doesn't use a deterministic mechanism for CBOR encoding. The same block / tx data can be encoded in practically infinite variations, each one resulting in a different hash.

The process of decoding CBOR into a Rust struct involves loosing the information about these slight variations in encoding (eg: indefinite / definite arrays, int size, map orders, etc). So far, Pallas has been dealing with this problem by enriching the structures with extra data that allows to replicate the same variation when re-encoding a struct.

After dealing with several edge cases I thought I had won the battle. We actually managed to process the whole mainnet history without a mismatch. I felt a nice sense of accomplishment, I fought against the odds and survived...

... but I was wrong, CBOR had more tricks under the sleeve. A testnet Tx was found that had a mismatch in the resulting hash: txpipe/oura#307

This time, the problem was that a Tx input tuple (hash, index) was encoded as a CBOR indefinite array. Why on earth would someone encode a fixed tuple as an indefinite array? I don't know, but it was clear that this is not a war that can be won.

It is now 100% clear to me that the only way to maintain consistency on the hash generation process is to retain the original CBOR data, something that @NicolasDP has been telling me since quite a while now.

The question is how do we accomplish this without an impact on memory.

In this PR I introduce a generic structure called KeepRaw<T> that wraps an inner CBOR-encodable structure while tracking the start / end positions that represent the segment of the original bytestream relevant to that particular inner struct:

use pallas_codec::utils::KeepRaw;

let a = (123u16, (456u16, 789u16), 123u16);
let data = minicbor::to_vec(a).unwrap();

let (_, keeper, _): (u16, KeepRaw<(u16, u16)>, u16) = minicbor::decode(&data).unwrap();

// this returns a &[u8] slice of the total bytestream
keeper.raw_cbor()

In this way, we can start wrapping ledger primitives, allowing us to retain the original CBOR and hash accordingly:

impl ToHash<32> for KeepRaw<'_, TransactionBody> {
    fn to_hash(&self) -> pallas_crypto::hash::Hash<32> {
        Hasher::<256>::hash(self.raw_cbor())
    }
}

I'm not ashamed to say that CBOR has won, it was an honourable fight.

@scarmuega scarmuega marked this pull request as ready for review June 2, 2022 18:07
@scarmuega scarmuega requested a review from NicolasDP June 2, 2022 18:10
@scarmuega scarmuega merged commit 59a3ac3 into main Jun 4, 2022
@scarmuega scarmuega deleted the feat/keep-raw branch June 4, 2022 00:43
@NicolasDP
Copy link
Collaborator

Well at least you tried. It is unfortunate but this the in the end the easiest approach you can take for this. Also it makes more sense overall. If the user needs to verify the transaction's signatures or hashes, you need to use the data as present in the blockchain. I think the code will end up cleaner and safer overall.

This is not necessary to do just now but there is a very good CBOR library specifically designed to handle the use of CBOR in the context of Cardano: https://crates.io/crates/cbored We could check it out eventually and see if it is worth moving away from minicbor and the custom KeepRaw.

@scarmuega
Copy link
Member Author

oh, cbored is by @vincenthz, very nice!
I'll definitely give it a try.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants