-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Add dearbitrary
functionality to turn an instance into its arbitrary byte sequence
#94
base: main
Are you sure you want to change the base?
Conversation
# Conflicts: # src/lib.rs
# Conflicts: # src/lib.rs
# Conflicts: # src/lib.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making this proof of concept PR.
My main concerns are
-
maintenance overhead and
-
the potential constraints this puts onto
Arbitrary
implementations where
writingdearbitrary
would be difficult (e.g. writingdearbitrary
forwasm-smith
would be quite difficult) and/or the method wouldn't ever be
used.
I think we can address (2) by making this a new trait, something like1:
trait Dearbitrary: Arbitrary {
fn dearbitrary(&self) -> Vec<u8>;
}
This way, implementations can opt into being reversible, and we don't have to
force reversibility onto users and Arbitrary
implementers that aren't using
it.
The maintenance question is a bit more amorphous, and also isn't just up to me
(cc @Manishearth @nagisa). Some things that would help me get a better idea of
the maintenance burden we would be taking on here:
-
A stubbed-out dual of
Unstructured
with at least some of the "hard" parts
implemented (e.g. two internal buffers, one for data thatUnstructured
pulls
from the front of the arbitrary input sequence, and another for the data it
pulls from the end of the arbitrary input sequence). Maybe call this
Serializer
?Structured
? Whatever bike shed color looks nice. -
A fuzz target that checks that we can take an arbitary instance of a type,
calldearbitrary
on it, callarbitrary
on the resulting bytes, and then
assert that the initial arbitrary instance and the new arbitrary instance are
the same. The type should be anenum
that contains all the major different
kinds of types that this crate implementsArbitrary
for:#[derive(Debug, PartialEq, Arbitrary, Dearbitrary)] enum MyEnum<'a> { SignedInt(i32), UnsignedInt(u64), Float(f64), Struct(MyStruct), Array([MyStruct; 4]), Option(Option<MyStruct>), Slice(&'a [u8]), Vec(Vec<u8>), String(String), Tuple((MyStruct, MyStruct)), TupleVariant(MyStruct, MyStruct), StructVariant { x: MyStruct, y: MyStruct }, } #[derive(Debug, PartialEq, Arbitrary, Dearbitrary)] struct MyStruct<'a> { a: Box<MyEnum<'a>>, b: Box<MyEnum<'a>>, }
If we can get this fuzz target running without crashes, then I will have a
lot more confidence that this feature is something we can maintain going
forward.
Footnotes
-
See inline comment below about how this signature will probably need to become more complicated. ↩
v.append(&mut self.to_vec()); | ||
v.append(&mut len_bytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work for nested &[u8]
s that are inside other Arbitrary
structs, because the length needs to go to the end of the whole arbitrary byte sequence, not just this nested sequence that will then be composed with the larger sequence.
Because of this, I think that the signature fn dearbitrary(&self) -> Vec<u8>
won't be sufficient to correctly implement this feature. We'll need basically the inverse of the Unstructured
struct that internally maintains cursors and/or multiple buffers to keep track of this stuff and where there is a dual arbitrary-sequence-serializing method for each arbitrary-data-generating method on Unstructured
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:O
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a big problem to solve, perhaps (as you already say) a counterpart to Unstructured
is needed that keeps track of lengths of Vec
s and defers appending to the final buffer to a later stage.
dearbitrary
functionality to turn an instance into its arbitrary byte sequence
From a maintenance pov i strongly agree that this should be a separate trait. I'm also wary but I'm happy to defer to fitzgen's judgement on this and agree with his assessment of the steps that should happen first. |
@fitzgen The seperate trait idea is appropiate, as not everything can be "dearbitraryed" e.g. Atomics |
Maybe we can use here something like a QuickCheck implementation for Rust |
I would prefer to use libfuzzer in this case, something like: fuzz_target!(|expected: MyEnum| {
let bytes = expected.dearbitrary();
let actual = MyEnum::arbitrary_take_rest(Unstructured::new(bytes)).unwrap();
assert_eq!(expected, actual);
}); Which also makes me realize that we need not just |
This PR solves issue #44 by implementing a
dearbitrary
function to create a byte buf, which can be used again witharbitrary
to recreate the struct. It is not a one to one mapping, as e.g. bools only use the last bit of a byte value.What works:
()
Many parts are missing, as I do not need them at the moment and have no time to implement them...
Remark: commits are prone to change/squashing/reordering!!