WIP: Add dearbitrary functionality to turn an instance into its arbitrary byte sequence #94

Open
wants to merge 7 commits into
base: main

Conversation

bitwave

@bitwave bitwave commented Dec 25, 2021

This PR solves issue #44 by implementing a dearbitrary function that turns a value into a byte buffer, which can then be passed to arbitrary again to recreate the value. The mapping is not one-to-one: for example, a bool only uses the last bit of a byte, so many different byte sequences decode to the same value.
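
To illustrate why the mapping is not one-to-one, here is a minimal standalone sketch. It assumes, as stated above, that a bool is decoded from the lowest bit of a single input byte. `decode_bool`, `encode_bool`, and `preimage_count` are hypothetical helpers written for this illustration, not functions from the crate.

```rust
/// Hypothetical stand-in for decoding a `bool`: only the lowest bit matters.
fn decode_bool(byte: u8) -> bool {
    byte & 1 == 1
}

/// Hypothetical inverse: pick one canonical byte for each `bool`.
fn encode_bool(value: bool) -> u8 {
    value as u8
}

/// Counts how many of the 256 possible bytes decode to `value`.
fn preimage_count(value: bool) -> usize {
    (0..=u8::MAX).filter(|&b| decode_bool(b) == value).count()
}
```

Decoding then encoding always succeeds in recovering the value, but encoding a decoded byte does not recover the byte: `0xFF` decodes to `true`, which encodes back to `1`.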

What works:

  • derive
  • integer/float/char/bools primitive types
  • tuples
  • ()
  • AtomicBool
  • AtomicIsize
  • AtomicUsize
  • Ranges
  • Duration
  • Option
  • Result
  • [T; N]
  • &'a [u8]
  • Vec
  • BTreeMap
  • BTreeSet
  • BinaryHeap
  • HashMap
  • HashSet
  • ... everything which uses an iterator
  • &'a str
  • String
  • Box
  • Box<[A]>
  • ... many other things missing

Many parts are missing, as I do not need them at the moment and have no time to implement them...

Remark: commits are prone to change/squashing/reordering!!

@bitwave bitwave mentioned this pull request Dec 25, 2021
Member

@fitzgen fitzgen left a comment


Thanks for making this proof of concept PR.

My main concerns are

  1. maintenance overhead and

  2. the potential constraints this puts onto Arbitrary implementations where
    writing dearbitrary would be difficult (e.g. writing dearbitrary for
    wasm-smith would be quite difficult) and/or the method wouldn't ever be used.

I think we can address (2) by making this a new trait, something like the following (see footnote 1):

trait Dearbitrary: Arbitrary {
    fn dearbitrary(&self) -> Vec<u8>;
}

This way, implementations can opt into being reversible, and we don't have to
force reversibility onto users and Arbitrary implementers that aren't using
it.
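
As a rough illustration of the opt-in idea, the sketch below uses simplified stand-in traits rather than the crate's real ones. `ToyArbitrary`, `ToyDearbitrary`, `Point`, `Checksum`, and `round_trips` are all hypothetical names, and the signatures are deliberately much simpler than the real `Arbitrary` trait.

```rust
/// Stand-in for the crate's `Arbitrary` trait (hypothetical, simplified).
trait ToyArbitrary: Sized {
    fn from_bytes(bytes: &[u8]) -> Option<Self>;
}

/// Opt-in reverse direction, declared as a subtrait as suggested above.
trait ToyDearbitrary: ToyArbitrary {
    fn to_bytes(&self) -> Vec<u8>;
}

#[derive(Debug, PartialEq)]
struct Point {
    x: u8,
    y: u8,
}

impl ToyArbitrary for Point {
    fn from_bytes(bytes: &[u8]) -> Option<Self> {
        match bytes {
            [x, y, ..] => Some(Point { x: *x, y: *y }),
            _ => None,
        }
    }
}

// `Point` opts in to being reversible.
impl ToyDearbitrary for Point {
    fn to_bytes(&self) -> Vec<u8> {
        vec![self.x, self.y]
    }
}

/// A type that is generated from bytes but does not opt in to the reverse.
#[derive(Debug, PartialEq)]
struct Checksum(u8);

impl ToyArbitrary for Checksum {
    fn from_bytes(bytes: &[u8]) -> Option<Self> {
        // Many inputs collapse to one value; no `ToyDearbitrary` impl is needed.
        Some(Checksum(bytes.iter().fold(0u8, |acc, b| acc.wrapping_add(*b))))
    }
}

/// Only callable for types that implement the reverse direction.
fn round_trips<T: ToyDearbitrary + PartialEq>(value: &T) -> bool {
    T::from_bytes(&value.to_bytes()).as_ref() == Some(value)
}
```

With this shape, `round_trips(&Point { x: 3, y: 200 })` compiles and holds, while calling `round_trips` on a `Checksum` is rejected at compile time because `Checksum` never implemented the subtrait.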

The maintenance question is a bit more amorphous, and also isn't just up to me
(cc @Manishearth @nagisa). Some things that would help me get a better idea of
the maintenance burden we would be taking on here:

  • A stubbed-out dual of Unstructured with at least some of the "hard" parts
    implemented (e.g. two internal buffers, one for data that Unstructured pulls
    from the front of the arbitrary input sequence, and another for the data it
    pulls from the end of the arbitrary input sequence). Maybe call this
    Serializer? Structured? Whatever bike shed color looks nice.

  • A fuzz target that checks that we can take an arbitrary instance of a type,
    call dearbitrary on it, call arbitrary on the resulting bytes, and then
    assert that the initial arbitrary instance and the new arbitrary instance are
    the same. The type should be an enum that contains all the major different
    kinds of types that this crate implements Arbitrary for:

    #[derive(Debug, PartialEq, Arbitrary, Dearbitrary)]
    enum MyEnum<'a> {
        SignedInt(i32),
        UnsignedInt(u64),
        // Note: NaN != NaN, so a float round trip may need a bitwise comparison.
        Float(f64),
        // `MyStruct` borrows for `'a`, so the lifetime has to be spelled out.
        Struct(MyStruct<'a>),
        Array([MyStruct<'a>; 4]),
        Option(Option<MyStruct<'a>>),
        Slice(&'a [u8]),
        Vec(Vec<u8>),
        String(String),
        Tuple((MyStruct<'a>, MyStruct<'a>)),
        TupleVariant(MyStruct<'a>, MyStruct<'a>),
        StructVariant { x: MyStruct<'a>, y: MyStruct<'a> },
    }

    #[derive(Debug, PartialEq, Arbitrary, Dearbitrary)]
    struct MyStruct<'a> {
        a: Box<MyEnum<'a>>,
        b: Box<MyEnum<'a>>,
    }

    If we can get this fuzz target running without crashes, then I will have a
    lot more confidence that this feature is something we can maintain going
    forward.

Footnotes

  1. See inline comment below about how this signature will probably need to become more complicated.

Comment on lines +800 to +801
v.append(&mut self.to_vec());
v.append(&mut len_bytes);
Member


This won't work for nested &[u8]s that are inside other Arbitrary structs, because the length needs to go to the end of the whole arbitrary byte sequence, not just this nested sequence that will then be composed with the larger sequence.

Because of this, I think that the signature fn dearbitrary(&self) -> Vec<u8> won't be sufficient to correctly implement this feature. We'll basically need the inverse of the Unstructured struct: one that internally maintains cursors and/or multiple buffers to keep track of this stuff, with a dual arbitrary-sequence-serializing method for each arbitrary-data-generating method on Unstructured.
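
The two-buffer idea can be sketched with toy types. This is an illustration under simplifying assumptions, not the crate's implementation: every slice length is stored as a single byte at the end of the input (so slices are capped at 255 bytes), whereas the real Unstructured uses a more involved length encoding. `ToyStructured`, `ToyUnstructured`, `Outer`, `Inner`, `write_outer`, and `read_outer` are hypothetical names.

```rust
/// Hypothetical writer, the dual of `ToyUnstructured` below.
#[derive(Default)]
struct ToyStructured {
    front: Vec<u8>, // bytes the reader consumes from the front
    back: Vec<u8>,  // length bytes the reader consumes from the end
}

impl ToyStructured {
    fn push_u8(&mut self, value: u8) {
        self.front.push(value);
    }

    /// Data goes to the front buffer, its length to the back buffer.
    fn push_bytes(&mut self, data: &[u8]) {
        assert!(data.len() <= 255, "toy encoding: at most 255 bytes");
        self.back.push(data.len() as u8);
        self.front.extend_from_slice(data);
    }

    /// The reader pops lengths from the very end, so the first length
    /// written must come last in the output.
    fn finish(mut self) -> Vec<u8> {
        self.front.extend(self.back.iter().rev());
        self.front
    }
}

/// Toy reader: values come from the front, slice lengths from the end.
struct ToyUnstructured<'a> {
    data: &'a [u8],
}

impl<'a> ToyUnstructured<'a> {
    fn take_u8(&mut self) -> Option<u8> {
        let (&first, rest) = self.data.split_first()?;
        self.data = rest;
        Some(first)
    }

    fn take_bytes(&mut self) -> Option<&'a [u8]> {
        let (&len, rest) = self.data.split_last()?;
        let len = len as usize;
        if len > rest.len() {
            return None;
        }
        let (bytes, rest) = rest.split_at(len);
        self.data = rest;
        Some(bytes)
    }
}

#[derive(Debug, PartialEq)]
struct Inner {
    tag: u8,
    payload: Vec<u8>,
}

#[derive(Debug, PartialEq)]
struct Outer {
    name: Vec<u8>,
    inner: Inner,
    trailer: u8,
}

/// Writes the fields in the same order in which `read_outer` reads them.
fn write_outer(value: &Outer) -> Vec<u8> {
    let mut w = ToyStructured::default();
    w.push_bytes(&value.name);
    w.push_u8(value.inner.tag);
    w.push_bytes(&value.inner.payload);
    w.push_u8(value.trailer);
    w.finish()
}

fn read_outer(bytes: &[u8]) -> Option<Outer> {
    let mut u = ToyUnstructured { data: bytes };
    let name = u.take_bytes()?.to_vec();
    let tag = u.take_u8()?;
    let payload = u.take_bytes()?.to_vec();
    let trailer = u.take_u8()?;
    Some(Outer { name, inner: Inner { tag, payload }, trailer })
}
```

The nested `payload` length ends up after the outer `trailer` byte, at the end of the whole sequence. If each slice instead appended its length right after its own data, the toy reader would pick up the trailing `trailer` byte as the first length.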

Author


:O

Author


This looks like a big problem to solve. Perhaps, as you already said, a counterpart to Unstructured is needed that keeps track of the lengths of Vecs and defers appending them to the final buffer until a later stage.

@fitzgen fitzgen changed the title from "WIP: Revert mode fixes #44" to "WIP: Add dearbitrary functionality to turn an instance into its arbitrary byte sequence" Jan 4, 2022
@Manishearth
Member

From a maintenance POV I strongly agree that this should be a separate trait. I'm also wary, but I'm happy to defer to fitzgen's judgement on this, and I agree with his assessment of the steps that should happen first.

@bitwave
Author

bitwave commented Jan 5, 2022

@fitzgen The separate trait idea is appropriate, as not everything can be "dearbitraryed", e.g. Atomics.
In general, this was just a quick-and-dirty hack for myself to create seed input for a fuzzer. Poking bytes with a hex editor and checking the result is quite time-consuming :D

@bitwave
Author

bitwave commented Jan 5, 2022

> A fuzz target that checks that we can take an arbitrary instance of a type,
> call dearbitrary on it, call arbitrary on the resulting bytes, and then
> assert that the initial arbitrary instance and the new arbitrary instance are
> the same. The type should be an enum that contains all the major different
> kinds of types that this crate implements Arbitrary for

Maybe we can use here something like a QuickCheck implementation for Rust

@fitzgen
Member

fitzgen commented Jan 5, 2022

> Maybe we can use here something like a QuickCheck implementation for Rust

I would prefer to use libfuzzer in this case, something like:

fuzz_target!(|expected: MyEnum| {
    let bytes = expected.dearbitrary();
    // `Unstructured::new` borrows a byte slice, so pass `&bytes` rather than the Vec itself.
    let actual = MyEnum::arbitrary_take_rest(Unstructured::new(&bytes)).unwrap();
    assert_eq!(expected, actual);
});

Which also makes me realize that we need not just dearbitrary but also dearbitrary_as_rest or something to be the dual of arbitrary_take_rest.
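
To make the difference concrete, here is a toy sketch of the two encodings for a `(tag, payload)` pair. It assumes that a take-rest style decoder consumes all remaining input for the final field, so no length needs to be stored, and it again uses a simplified one-byte length at the end for the regular case. All four function names are hypothetical.

```rust
/// Toy regular encoding: more data may follow, so the payload length is
/// stored as one byte at the end of the output.
fn encode_with_len(tag: u8, payload: &[u8]) -> Vec<u8> {
    assert!(payload.len() <= 255, "toy encoding: at most 255 bytes");
    let mut out = vec![tag];
    out.extend_from_slice(payload);
    out.push(payload.len() as u8);
    out
}

/// Toy "as rest" encoding: the payload is the final field and the reader
/// consumes everything that is left, so no length byte is written.
fn encode_as_rest(tag: u8, payload: &[u8]) -> Vec<u8> {
    let mut out = vec![tag];
    out.extend_from_slice(payload);
    out
}

/// Dual of `encode_with_len`.
fn decode_with_len(bytes: &[u8]) -> Option<(u8, Vec<u8>)> {
    let (&tag, rest) = bytes.split_first()?;
    let (&len, rest) = rest.split_last()?;
    let payload = rest.get(..len as usize)?;
    Some((tag, payload.to_vec()))
}

/// Dual of `encode_as_rest`: everything after the tag is the payload.
fn decode_take_rest(bytes: &[u8]) -> Option<(u8, Vec<u8>)> {
    let (&tag, rest) = bytes.split_first()?;
    Some((tag, rest.to_vec()))
}
```

Each encoder round-trips only with its own decoder. Feeding the output of `encode_with_len` to `decode_take_rest` would treat the trailing length byte as part of the payload, which is why a separate serializing method would be needed for the take-rest path.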
