-
Notifications
You must be signed in to change notification settings - Fork 231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster deserialization #246
Conversation
This avoids expensive calls to SerializationContext.Default when we are going to ignore the result.
current = stack.Pop(); | ||
if (current == null) | ||
{ | ||
yield return finished; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be misreading this code, but calling stack.Pop()
on an empty stack throws an InvalidOperationException
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an extra null
reference lying around. current
is initialized to null
when we're outside of any components, so the first stack.Push(current);
pushes the null
onto the stack.
|
||
private static IEnumerable<string> GetContentLines(TextReader reader) | ||
{ | ||
var currentLine = new StringBuilder(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we've disagreed on this in the past, but since this PR is about optimization... default capacity of StringBuilder
is 16. The spec allows for line widths of 75 octets. Attachments are often many lines long, and normal calendar components are often more than 16 characters long as well, with, e.g. "BEGIN:" being 6 characters right off the bat. In order to avoid the repeated alloc -> array copy -> GC overhead that'll be incurred the majority of the time (StringBuilder grows by a factor of 2: 16, 32, 64, 128 -- I tested), might be better to initialize the StringBuilder with a capacity of 75?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it can help enough to be worth doing. The profiler trace shows <GetContentLines>d__15.MoveNext()
taking 5.17% of the time, with the ReadLine()
call taking 2.45%, so even if we got the StringBuilder
calls down to zero, we'd only see a 2.72% improvement.
Also note that StringBuilder
will account for the length of the string being appended, and we're appending full lines here, so it will jump right to a capacity of 75 on the first folded line.
I did try running a test where I used the original string for lines that weren't folded, and only used the StringBuilder
when we needed to unfold. There was no noticeable improvement, so I didn't keep it. Since the first string passed to the StringBuilder
would have had a length of 75, that would have done this optimization in addition to removing copies for short lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I suspect my obsession with truly micro micro-optimizations only matters in a context like high-frequency trading where nanoseconds matter. Or if you're writing Kestrel or RavenDb.
public IEnumerable<ICalendarComponent> Deserialize(TextReader reader) | ||
{ | ||
var context = new SerializationContext(); | ||
var stack = new Stack<ICalendarComponent>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar disclaimer as my comment on StringBuilder... I suspect stacks grow the same way List
s do: 0, 4, 8, 16, etc. I think you're always going to have at least once round of allocation. Although, hmm... maybe the first "expansion" is free, because there's only a single allocation (which is no different(?) than starting with 4
) and no additional copy -> GC overhead. And 4 is probably a reasonable stack size to start with in this case anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think the deepest that this stack ever gets is [null, VCALENDAR, VEVENT]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had written a comment:
I think the simple deserializer tests are a CnP of the other Serialization class? They looked familiar. If so, those deserialization tests mostly test happy paths and edge cases. Do you mind adding a few tests for cases where you throw exceptions? (I've been trying to add these as I fix bugs/write new code.) Nonexistent/unmatched end tokens, garbage input, that sort of thing.
But now I wonder if I'll get that for "free" by replacing the default serialization implementation with this one. At the same time, I'm not sure I should do that, either. One of the things that I wanted for v3/net-core was for calendar components like CalendarEvent
to take a string constructor argument, so you could have an event without a parent. Because sometimes you want to do those kinds of things.
Maybe that's a non-sequitur. Maybe it isn't. String constructor arguments imply stronger, component-level validation... you could end up with a really weird graph with this serializer implementation.
I lean towards merging, but leaving it opt-in. (I think that was your intent anyway.)
I wrote a small deserializer that is much simpler than the current ANTLR one. On my machine, it is able to deserialize about twice as quickly as the current one.
I also changed the
ISerializerFactory
implementations to pass theSerializationContext
as a constructor parameter when constructing serializers. This avoids an expensive call toSerializationContext.Default
that created an object that was then discarded.