-
-
Notifications
You must be signed in to change notification settings - Fork 331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamic deserialization proposal #861
Conversation
fn deserialize_generalised_to_bytes() { | ||
let generalised = GeneralizedInput::new(b"hello".to_vec()); | ||
let mut buf = Vec::new(); | ||
generalised.serialize_dynamic(&mut buf).unwrap(); | ||
let bytes = BytesInput::deserialize_dynamic(&buf).unwrap().unwrap(); | ||
assert_eq!(bytes.target_bytes().as_slice(), b"hello"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example usage of the dynamic deserialisation. #858
fn failed_deserialize_from_nop() { | ||
// note that NopInput implements HasTargetBytes, but because we have not submitted the | ||
// conversion BytesInput cannot be converted from NopInput | ||
|
||
let nop = NopInput {}; | ||
let mut buf = Vec::new(); | ||
nop.serialize_dynamic(&mut buf).unwrap(); | ||
assert!(BytesInput::deserialize_dynamic(&buf).unwrap().is_none()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments here: if the deserialisation for this type cannot occur, it is skipped and not deserialised.
@@ -24,6 +27,8 @@ pub struct BytesInput { | |||
} | |||
|
|||
impl Input for BytesInput { | |||
const NAME: &'static str = "BytesInput"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here type_name can be used instead of an associated const https://doc.rust-lang.org/std/any/fn.type_name.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, awesome -- didn't know about that. I'll patch it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this needs to be an associated const because type_name is not const stable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be stabilized soon, see rust-lang/rust#63084
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So keep the const for now but put a comment to remind us to switch to type_name once it is stable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exact contents and format of the string returned are not specified [...] amongst the strings that
type_name::<Option<String>>()
might return are"Option<String>"
and"std::option::Option<std::string::String>"
.
Could be an issue, probably a NAME field is safer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maye it should be called TYPE
else it's confusing with our named
trait(?)
libafl/src/inputs/mod.rs
Outdated
/// Generate a name for this input | ||
fn generate_name(&self, idx: usize) -> String; | ||
|
||
/// An hook executed if the input is stored as `Testcase` | ||
fn wrapped_as_testcase(&mut self) {} | ||
} | ||
|
||
/// Utility trait for downcasting inputs for conversion | ||
#[cfg(feature = "std")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the libs seem to support no_std
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still inventory doesn't so this stuff is anyway std-only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah it uses ctor
I guess. Nevermind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in theory yes, it will work for an ELF with no_std for instance but not for no_std in general. Maybe we should use the ctor feature here to enable it in specific no_std cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now you can enable it on no_std with the input_conversion feature, on by default with std
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also our registry will check now for ctor and not std, so we can enable ctor on no_std when supported
Here the issue is for some types serialization and deserialization to other types can be complex. For instance, NautilusInput to serialize to bytes (unparse method) needs a reference to NautilusContext. |
Btw downcast-rs is interesting, I was not aware of it, but it seems to bound the type to Any https://github.com/marcianx/downcast-rs/blob/master/src/lib.rs#L175 and so I think it cannot work with non 'static types (?), we should try with a minimal example, atm it may just work because all the inputs we have are static too |
My example above shows that the requirement for 'static is avoided. |
This can be specialised by the implementation, if you're comfortable with passing around a NautilusContext 😁 |
Like an AnyMap for state? |
fn serialize_dynamic(&self, buf: &mut Vec<u8>) -> Result<(), postcard::Error> { | ||
buf.extend_from_slice(postcard::to_allocvec(Self::NAME)?.as_slice()); | ||
buf.extend_from_slice(postcard::to_allocvec(self)?.as_slice()); | ||
Ok(()) | ||
} | ||
|
||
/// Deserializes this input type from the dynamic serialization format, if possible | ||
fn deserialize_dynamic( | ||
buf: &[u8], | ||
) -> Result<Option<Self>, <&mut Deserializer<Slice> as serde::de::Deserializer>::Error> { | ||
convert_named(buf) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two methods can be overriden, so it's possible to serialize context and pass it at the end of buf here; just append the context to the end of the message. If the target already has it, it can opt to simply not deserialize it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how big context is, but if it's reasonably sized this could be effective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
context doesn't need to be serialized, but it is needed to do the serialization as it contains some info to convert the AST to bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does that work with LLMP now? I don't see its unparse getting invoked except in specific places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't. To serialize with serde there is no need of it, while it is needed to convert the AST stored in the input to bytes. You see it only in the harness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also EncodedInput has something similar, as it is an array of u32 id representing tokens and the mapping id -> token is ofc not stored in the input itself so to convert to bytes you need to call https://github.com/AFLplusplus/LibAFL/blob/main/libafl/src/inputs/encoded.rs#L75
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the conversion would have to happen at the sending client, not the receiving?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I see why it is convenient to do in the receiver, we can reuse the NewTestcase event and avoid double memory usage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to think about it, but I don't see other solutions, the tokens map in the EncodedInput decoder can be large even gigabytes when using a huge initial corpus
So what's the verdict? Are we sticking with this approach or will we use something else? (e.g., making the clients aware of local transformations they need to complete first?) |
I'm thinking about an additional field to NewTestcase something like
Then the event manager can take two optional parsing and unparsing lambdas to convert an input to bytes and back. |
Any decisions @andreafioraldi ? |
Do we have a verdict here? @andreafioraldi ? |
Closing in favour of #997 |
Using downcast-rs, inventory, and named serialization.