Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic deserialization proposal #861

Closed
wants to merge 6 commits into from
Closed

Conversation

addisoncrump
Copy link
Collaborator

Using downcast-rs, inventory, and named serialization.

Comment on lines +142 to +148
fn deserialize_generalised_to_bytes() {
let generalised = GeneralizedInput::new(b"hello".to_vec());
let mut buf = Vec::new();
generalised.serialize_dynamic(&mut buf).unwrap();
let bytes = BytesInput::deserialize_dynamic(&buf).unwrap().unwrap();
assert_eq!(bytes.target_bytes().as_slice(), b"hello");
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example usage of the dynamic deserialisation. #858

Comment on lines +151 to +159
fn failed_deserialize_from_nop() {
// note that NopInput implements HasTargetBytes, but because we have not submitted the
// conversion BytesInput cannot be converted from NopInput

let nop = NopInput {};
let mut buf = Vec::new();
nop.serialize_dynamic(&mut buf).unwrap();
assert!(BytesInput::deserialize_dynamic(&buf).unwrap().is_none());
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments here: if the deserialisation for this type cannot occur, it is skipped and not deserialised.

@@ -24,6 +27,8 @@ pub struct BytesInput {
}

impl Input for BytesInput {
const NAME: &'static str = "BytesInput";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here type_name can be used instead of an associated const https://doc.rust-lang.org/std/any/fn.type_name.html

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, awesome -- didn't know about that. I'll patch it out.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this needs to be an associated const because type_name is not const stable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will be stabilized soon, see rust-lang/rust#63084

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So keep the const for now but put a comment to remind us to switch to type_name once it is stable

Copy link
Member

@domenukk domenukk Oct 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exact contents and format of the string returned are not specified [...] amongst the strings that type_name::<Option<String>>() might return are "Option<String>" and "std::option::Option<std::string::String>".

Could be an issue, probably a NAME field is safer

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maye it should be called TYPE else it's confusing with our named trait(?)

/// Generate a name for this input
fn generate_name(&self, idx: usize) -> String;

/// An hook executed if the input is stored as `Testcase`
fn wrapped_as_testcase(&mut self) {}
}

/// Utility trait for downcasting inputs for conversion
#[cfg(feature = "std")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the libs seem to support no_std

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still inventory doesn't so this stuff is anyway std-only

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah it uses ctor I guess. Nevermind

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in theory yes, it will work for an ELF with no_std for instance but not for no_std in general. Maybe we should use the ctor feature here to enable it in specific no_std cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now you can enable it on no_std with the input_conversion feature, on by default with std

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also our registry will check now for ctor and not std, so we can enable ctor on no_std when supported

@andreafioraldi
Copy link
Member

Here the issue is for some types serialization and deserialization to other types can be complex. For instance, NautilusInput to serialize to bytes (unparse method) needs a reference to NautilusContext.

@andreafioraldi
Copy link
Member

Btw downcast-rs is interesting, I was not aware of it, but it seems to bound the type to Any https://github.com/marcianx/downcast-rs/blob/master/src/lib.rs#L175 and so I think it cannot work with non 'static types (?), we should try with a minimal example, atm it may just work because all the inputs we have are static too

@addisoncrump
Copy link
Collaborator Author

Btw downcast-rs is interesting, I was not aware of it, but it seems to bound the type to Any https://github.com/marcianx/downcast-rs/blob/master/src/lib.rs#L175 and so I think it cannot work with non 'static types (?), we should try with a minimal example, atm it may just work because all the inputs we have are static too

My example above shows that the requirement for 'static is avoided.

@addisoncrump
Copy link
Collaborator Author

Here the issue is for some types serialization and deserialization to other types can be complex. For instance, NautilusInput to serialize to bytes (unparse method) needs a reference to NautilusContext.

This can be specialised by the implementation, if you're comfortable with passing around a NautilusContext 😁

@domenukk
Copy link
Member

Like an AnyMap for state?

Comment on lines +91 to +102
fn serialize_dynamic(&self, buf: &mut Vec<u8>) -> Result<(), postcard::Error> {
buf.extend_from_slice(postcard::to_allocvec(Self::NAME)?.as_slice());
buf.extend_from_slice(postcard::to_allocvec(self)?.as_slice());
Ok(())
}

/// Deserializes this input type from the dynamic serialization format, if possible
fn deserialize_dynamic(
buf: &[u8],
) -> Result<Option<Self>, <&mut Deserializer<Slice> as serde::de::Deserializer>::Error> {
convert_named(buf)
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two methods can be overriden, so it's possible to serialize context and pass it at the end of buf here; just append the context to the end of the message. If the target already has it, it can opt to simply not deserialize it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how big context is, but if it's reasonably sized this could be effective.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

context doesn't need to be serialized, but it is needed to do the serialization as it contains some info to convert the AST to bytes

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does that work with LLMP now? I don't see its unparse getting invoked except in specific places.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't. To serialize with serde there is no need of it, while it is needed to convert the AST stored in the input to bytes. You see it only in the harness.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also EncodedInput has something similar, as it is an array of u32 id representing tokens and the mapping id -> token is ofc not stored in the input itself so to convert to bytes you need to call https://github.com/AFLplusplus/LibAFL/blob/main/libafl/src/inputs/encoded.rs#L75

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the conversion would have to happen at the sending client, not the receiving?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I see why it is convenient to do in the receiver, we can reuse the NewTestcase event and avoid double memory usage

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to think about it, but I don't see other solutions, the tokens map in the EncodedInput decoder can be large even gigabytes when using a huge initial corpus

@addisoncrump
Copy link
Collaborator Author

So what's the verdict? Are we sticking with this approach or will we use something else? (e.g., making the clients aware of local transformations they need to complete first?)

@andreafioraldi
Copy link
Member

I'm thinking about an additional field to NewTestcase

something like

NewTestcase {
     input_type_name: String,
     serialized_input: Option<Vec<u8>>,
     bytes_input: Option<Vec<u8>>,
     ...
}

Then the event manager can take two optional parsing and unparsing lambdas to convert an input to bytes and back.
When the event is received, we compare the type name and if it match serialized_input it is deserialized, like we already do.
If not, and if bytes_input is not None, we invoke the parsing lambda if present.

@domenukk
Copy link
Member

Any decisions @andreafioraldi ?

@domenukk
Copy link
Member

Do we have a verdict here? @andreafioraldi ?

@addisoncrump
Copy link
Collaborator Author

Closing in favour of #997

@domenukk domenukk deleted the dynamic_deserialisation branch March 18, 2024 07:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants