Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of serde traits? #310

Closed
newpavlov opened this issue Sep 6, 2021 · 8 comments
Closed

Implementation of serde traits? #310

newpavlov opened this issue Sep 6, 2021 · 8 comments

Comments

@newpavlov
Copy link
Member

newpavlov commented Sep 6, 2021

A way to serialize state of the Sm3 hash function was requested in #304. Do we want to implement the serde traits for hash functions (behind an optional feature)? It may be useful in some cases for transferring state for partially hashed messages.

Reasons against it:

  • Hash functions generally work on blocks, thus serialized states can include unprocessed parts of a potentially sensitive message.
  • Implementation of the serde traits somewhat restricts our freedom of changing inner representation of hash functions. Currently all functions have the same state representation regardless of target architecture or available CPU features, but it may change in future.
  • It does not look like state serialization of hash functions is a common enough problem.

Currently I am leaning towards the "against" position, but I think it's worth to discuss it nevertheless.

@tarcieri
Copy link
Member

tarcieri commented Sep 8, 2021

Perhaps before serde we could implement a simpler state import/export (e.g. SerializableState), and then implement the serde support in terms of that.

@waynr
Copy link

waynr commented Apr 24, 2022

The lack of serializable/deserializable state is enough of a blocker for me at this point that I'd be willing to take a stab at this with some guidance. It's been recommended that I consider using the compress functions directly but there are some complications to this for me given my lack of cryptographic knowledge and the extensive use of various traits that make it hard to understand how to properly finalize the output:

  • i have to break the data-to-be-hashed into hashable chunks using &[GenericArray<u8, U64>]
  • because the intermediate state for a given hash algorithm can apparently only be compressed in chunks of fixed size and i will need to serialize/deserialize multiple times in the course of calculating the digest for a given data stream (which may be multiple GB in size and broken down into MB-sized chunks), it looks like i will need to keep track of both already-compressed state and partial blocks between chunks of the data stream i am digesting

It looks like i would end up implementing my own janky half-assed versions of sha2::Sha256 and sha2::Sha512 anyway.

To get me started, I would have to know at which level to implement SerializableState -- ie, should that be included as a bound in VariableOutputCore? And implemented on the various variable cores, eg Sha256VarCore, Sha512VarCore. Would a SerializeableState need to be bound by or have an associated type of BlockSize?

@waynr
Copy link

waynr commented Apr 25, 2022

Would a SerializeableState need to be bound by or have an associated type of BlockSize?

I just realized that the block isn't directly related to the state array size. It may be best just to have something like

pub trait SerializableState {
  fn serialize_state(&mut self) -> &[u8];
  fn from_serialized_state(&[u8]) -> Self;
}

And just leave slice correctness checking up to the implementation. Or maybe there's a better way to do it with GenericArray. Another possibility would be to implement something like BlockSizeUser but call it StateSizeUser and set that as a bound on SerializableState:

pub trait SerializableState: StateSizeUser
where
   StateSize: IsLess<U256>,
{
  fn serialize_state(&mut self) -> State<T>;
  fn from_serialized_state(State<T>) -> Self;
}

Though things get a little fuzzy for me when trying to wrap my head around State<T>, IsLess<U256>, etc. The reason I suggest something like State<T> is in analogy to Block<T>, which I see in various places. It looks to me like Block<T> is how blocks are expressed generically where the T parameter, I think, is a type representing the size of the block since that's the main thing that seems to vary between different block-centric algorithms. In a similar fashion, I propose State<T> because the internal state, at least of Sha256 and Sha512 differ in array length.

Though come to think of it they also differ in integer size so it might have to be State<T, V> where T is an element size parameter and V is an array size parameter. That's assuming all these fancy algorithms have states that can be represented simply as an array of integers. It may be that other algorithms than the ones I am familiar with have a more complex state representation. In this case it may be difficult to capture the full variety of state types with a single generic type.

@tarcieri
Copy link
Member

tarcieri commented Apr 25, 2022

fn serialize_state(&mut self) -> &[u8];

This assumes the state is internally represented contiguously as bytes, which may not be the case.

Or maybe there's a better way to do it with GenericArray

Serializing to some sort of fixed-sized byte array would probably make the most sense. Note that doesn't necessarily have to be a GenericArray. You could have something like:

pub trait SerializableState {
    type Serialization: AsRef<[u8]> + AsMut<[u8]>;

    fn serialize_state(&mut self) -> &Self::Serialization;
    fn from_serialized_state(&Self::Serialization) -> Self;
}

Then when it comes time to impl the trait, use an array type like type Serialization = [u8; 32];

@tarcieri
Copy link
Member

RustCrypto/traits#1369 has landed which provides some initial serialization support

@burdges
Copy link

burdges commented Dec 16, 2023

It's much worse than leaking unprocessed data. I'd expect serialzied state breaks the primitives in "most" ways.

It's fine if used in non-adversarial ways of course, like if the same signer makes a lot of signatures on the same data, but it's broken if say a serialized state were passed to a remote signer. It needs serious warnings.

impl digest::Update for Vec<u8> should reduce the cases where users request this.

@wiktor-k
Copy link

Just FYI but it seems this PR finishes the "serializable state" issue and with it I was able to finally capture the hasher state and restore it. Code example here: https://github.com/wiktor-k/hasher-test/blob/main/src/main.rs

@tarcieri
Copy link
Member

Going to close this as completed then by way of SerializableState.

Despite earlier comments, I don't think it makes sense to conflate SerializableState with serde.

SerializableState is a huge footgun and for that reason we placed it under hazmat. If someone wants to serialize state with serde, they can use SerializableState to serialize to bytes first, then use serde to serialize the bytes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants