-
-
Notifications
You must be signed in to change notification settings - Fork 774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize internally tagged enums -- do not use internal buffer if tag is the first field #1922
base: master
Are you sure you want to change the base?
Conversation
84c311d
to
94e15ef
Compare
94e15ef
to
5106111
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The discussion in #1495 focuses on whether this would be worth it for the cost in compile time. The biggest thing wrong with internally tagged enums is not when the deserializer is slow but that they take significantly long to compile. Overall I would rather optimize for lowering their compile time, not on performance or features at the expense of compile time.
Would you be able to provide some measurements showing the impact of this change on the time to compile internally tagged enums?
I cannot agree with such a question. For me, the runtime performance is much more important thing that compile-time performance. Things are written not for the pleasure of developers, but for solving customer problems.
I will try to study how to do performance measurements, but any guidance is welcome |
That sounds odd tbh. Compile times are affecting only developers, while runtime affects every user of the library / application, which is way more impactful. Why choose compile-time over runtime perf here when we don't do that at any other levels of development (e.g. |
I've made some research and there is the results. I've created a library project with 1000 types and I've measured compilation time. I've noticed small increasing of the compilation time, about 0.01 sec per type (or 7-30%). I think it is acceptable worth for the bigger runtime performance. Test code and raw dataCreated library cargo project with following use serde::{Deserialize};
macro_rules! generate {
($(#[$counter:meta])*) => {
$(
const _: () = {
#[$counter]
#[derive(Deserialize)]
#[serde(tag = "tag")]
enum Node {
Unit,
Struct {
name: String,
list: Vec<Node>,
},
// Uncomment for "big enum" tests
/*
Newtype1(std::collections::HashMap<String, String>),
Newtype2(String),
Newtype3(u32),
Newtype4(f32),
Unit1,
Unit2,
Unit3,
Unit4,
Struct1 { f1: String, f2: u32, f3: bool, f4: f64 },
Struct2 { f1: String, f2: u32, f3: bool, f4: f64 },
Struct3 { f1: String, f2: u32, f3: bool, f4: f64 },
Struct4 { f1: String, f2: u32, f3: bool, f4: f64 },// */
}
};
)*
};
}
// Expanded manually for "expand" tests
generate!(
/// ...
/// 1000 lines
/// ...
); Tests run with command cargo +nightly build -Ztimings Test PCOS version: Windows_NT x64 10.0.18363 Summary tableSmall enum (2 variants), 1000 typesDerived both
Derived only
Derived only
Big enum (14 variants), 1000 typesDerived only
Derived only
|
@Mingun What about runtime performance improvements? |
I didn't measure it, maybe I should |
a169bee
to
da641b1
Compare
Would this also fix the incorrect error messages caused by the internal buffer? #1621 |
For the optimized case yes, it should |
da641b1
to
c22590d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn deserialize_unit<V>(self, visitor: V) -> Result<V::Value, Self::Error> | ||
where | ||
V: de::Visitor<'de>, | ||
{ | ||
// Covered by tests/test_enum_internally_tagged.rs | ||
// newtype_unit | ||
visitor.visit_unit() | ||
} | ||
|
||
fn deserialize_unit_struct<V>( | ||
self, | ||
_name: &'static str, | ||
visitor: V, | ||
) -> Result<V::Value, Self::Error> | ||
where | ||
V: de::Visitor<'de>, | ||
{ | ||
// Covered by tests/test_enum_internally_tagged.rs | ||
// newtype_unit_struct | ||
self.deserialize_unit(visitor) | ||
} | ||
|
||
fn deserialize_newtype_struct<V>(self, _name: &str, visitor: V) -> Result<V::Value, Self::Error> | ||
where | ||
V: de::Visitor<'de>, | ||
{ | ||
visitor.visit_newtype_struct(self) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, should we change behavior of SeqAccessDeserializer
and MapAccessDeserializer
or introduce new private deserializers? From one hand those deserializers was created for support of various serde attributes. From the other hand, technically this is breaking change because those types are public.
c22590d
to
8cd44cf
Compare
I realized, that the old |
As I remember, changes in this PR depends on changes in #2445. As I already said, if you wish, it is possible to do not touch @dtolnay, @oli-obk, please give your opinion, should I make these changes? |
Did this happen? |
No. Any suggestions for the benchmark are welcome. |
It's pretty ancient by now, but in the original issue I referenced binast/binjs-ref@22103b9 where I did a manual implementation of this optimisation just for the types we used. For testing, I checked out a commit right before that. binast/binjs-ref@53bd87a The numbers before this PR:
The numbers with this PR applied:
Note that this is far from a pure JSON benchmark - it uses an external Node.js process to parse JS and produce JSON, and only then parses the output using serde-json, but in that context the -25% perf improvement is even more impressive. It should be easy to save the JSON output and do a pure serde-json benchmark instead (in my original commit I suggested that showed 2x improvement, which seems realistic), but perhaps someone has more modern examples. Anything touching JS AST represented as JSON (e.g. Deserialize for ESTree Program from https://swc.rs/) should work. |
where | ||
S: SeqAccess<'de>, | ||
{ | ||
Ok(()) | ||
match tri!(seq.next_element()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behaves quite differently from IgnoredAny.visit_map
. I think the behaviour should be consistent, as in, iterate over the entire sequence and ignore its values instead of erroring out on non-empty sequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that initially, but that failed other tests and in general not what you want. The unit / unit struct represented in sequence as nothing, so we need to ensure that sequence is empty. This is consistent with normal behavior where struct deserialization from a sequence expects exact number of values, and those fact that flattened unit / unit struct considered as equal to the struct without fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the unit_variant_with_unknown_fields
is a test that failed if consume the whole sequence here.
serde/test_suite/tests/test_enum_internally_tagged.rs
Lines 1447 to 1456 in 3aca38d
// Unknown elements are not allowed in sequences | |
assert_de_tokens_error::<InternallyTagged>( | |
&[ | |
Token::Seq { len: None }, | |
Token::Str("Unit"), // tag | |
Token::I32(0), | |
Token::SeqEnd, | |
], | |
"invalid length 1, expected 0 elements in sequence", | |
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unit / unit struct represented in sequence as nothing
Hm but "nothing" should be pretty different conceptually from "ignored any". I'd expect a custom check just for the nothing case, whereas ignored any should be able to consume anything thrown at it silently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code tries to read something, doesn't matter what. We expect an empty sequence, so if it contains some element, we fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, I'm sleepy - I thought you're changing how IgnoredAny works everywhere. I've expanded the context of the diff and I see this is a change on this one specific visitor.
Please disregard my original comment 🤦♂️
Although I now wonder if visit_map
should be changed to check length as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default maps in serde allows unknown keys and when unit is flattened, all keys become unknown. But you're right -- in case of #[serde(deny_unknown_fields)]
we should return error if map not empty. That's idea for another PR!
@oli-obk, how your review progressed? |
Please refrain from pinging, it's already one of the only two PRs in my self review/assigned list |
8cd44cf
to
132dc81
Compare
(review this commit with "ignore whitespace changes" option on)
… from sequence failures (2): newtype_unit_struct unit_variant_with_unknown_fields
failures (1): unit_variant_with_unknown_fields Fixed (1): newtype_unit_struct
When intermediate buffer is used, we can just ignore data, because it already was read from the original deserializer to the buffer and check for the emptiness was performed in another place. Now we reading directly from the original deserializer and should ensure empty sequence by self. Fixed (1): unit_variant_with_unknown_fields
…rst field failures (3): newtype_newtype newtype_unit newtype_unit_struct
…ccessDeserializer Fixed (3): newtype_newtype newtype_unit newtype_unit_struct
(review this commit with "ignore whitespace changes" option on)
Deserializer methods are only hints which deserializer is not obliged to follow. Both - TaggedContentVisitor - InternallyTaggedUnitVisitor accepts only visit_map and visit_seq and that is what derived implementation of Deserialize does for structs. Therefore it is fine to call deserialize_map here, as that already did in derived deserialize implementation. Because those structs officially not public, it is used only by derive macro
132dc81
to
92c1d80
Compare
Fixes #1495
Also that change has one positive side-effect: if tag is the first field, negative effects from #1183 is eliminated, because buffering is not used
@RReverser, feel free to try to run your benchmarks against this branch