-
-
Notifications
You must be signed in to change notification settings - Fork 774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Invalid unicode code point with flatten #1742
Comments
#[derive(Deserialize, Serialize)]
struct B {
c: String,
} |
I’m aware of this. The problem I’m trying to solve is that there is some legacy JSON I need to parse which can contain strings with URLs with attacks of this nature: The advice I saw elsewhere is to use serde_bytes in conjunction with a u8-oriented type. After this I can sanitize the data into proper UTF-8. The concern/issue I have though is that I think if there’s ever a case where deserialize_any gets implicitly called it will serialized as a string rather than bytes, and consequently not return any of the JSON at all, losing what is effectively a valuable message. I could preprocess the text before serde_json, my concern with doing that is that I only want to sanitize data within strings, outside of those it should be a legitimate parse error. This would imply writing a limited JSON parser myself, which seems a bit excessive. |
I think a new version just landed, because I came back to this ticket and I got an error about flatten only working with structs or maps that I don't remember. If I change it to something like this:
then I don't see the error anymore. So, I don't think it's flatten at least. |
Here we go. This doesn't work. In struct B, 'c' works fine. In struct A, 'c' fails.
|
This is a duplicate of #1183. Allowing escape sequences that don't produce UTF-8 is a serde_json-specific feature that kicks in when running &[u8]'s Deserialize impl, and doesn't kick in when running the Deserialize impl that |
Thanks |
It seems that if an unconsumed field's value has an invalid code point and flatten is specified it will trigger "invalid unicode code point".
I think - though I'm not sure - that it may be possible to get rid of the error for known fields by using serde_bytes. I'm still checking through the code I have. However, this leaves open the door to a new field adding invalid code points that can't be protected against by setting the field to use serde_bytes (because of course it isn't known).
Example:
Thanks.
The text was updated successfully, but these errors were encountered: