Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite IrcParser, pass through raw text to IrcMessage and fix other issues #230

Merged

Conversation

neon-sunset
Copy link

Changes

Major

  • Add allocation-free split helpers
  • Rewrite and simplify IrcParser
  • Avoid re-constructing IrcMessage back into its raw format by the message handler right after it was parsed
  • Add benchmark project

Minor

  • Fix some IDE warnings
  • Specify strict level of language features validation to raise warnings on unsound/unstable use of newer language features on older targets
  • Replace Tuple<T1, T2> (which is a legacy class type, and should not be used) in ThrottlingService used by queue with (T1, T2) aka ValueTuple<T1, T2>

Rationale

I accidentally stumbled upon https://github.com/jprochazk/twitch-irc-benchmarks and was alerted of, as was initially believed, exceedingly poor performance of the Twitch IRC message parser of this library making it a contributor to global warming given the popularity and truly horrifying numbers.

After finishing reference implementation (https://github.com/neon-sunset/feetlicker) to compete in the benchmark, discussing with maintainers details in Discord and benchmarking TwitchLib's IrcParser, it turned out to be a mistake.

TwitchLib's parser implementation was already competitively fast. The cause for such high execution time and memory traffic numbers (completely deadlocking GC, 15GB of memory traffic in scope of parsing 1000 forsen stream messages) was the fact that the benchmark was measuring parse + handle message cost.

Upon further inspection, the following inefficiencies were identified:

  • Instead of including the source raw text on constructing IrcMessage after successfully parsing it, the raw text was getting immediately rebuilt back with GenerateToString() instead (fixed in this PR)
  • Emote sets parsing reallocates source string into segments multiple times. It should be rewritten in IrcParser style to avoid intermediate allocations (follow-up work)
  • MessageEmoteCollection.Add appears to be extremely expensive both in terms of CPU and memory according to profiler. It dominates the execution time and needs further analysis (follow-up work)

Numbers

Setup

BenchmarkDotNet=v0.13.5, OS=macOS 14.0 [Darwin 23.0.0]
Apple M1 Pro, 1 CPU, 8 logical and 8 physical cores
.NET SDK=8.0.100-preview.6.23316.13
  [Host]     : .NET 7.0.1 (7.0.122.56804), Arm64 RyuJIT AdvSIMD
  DefaultJob : .NET 7.0.1 (7.0.122.56804), Arm64 RyuJIT AdvSIMD

Parse Message

Base:

Method Message Mean Error StdDev Gen0 Gen1 Allocated
Parse :blub(...) list [66] 90.57 ns 0.116 ns 0.108 ns 0.0612 - 384 B
Parse :jtv(...)ers. [100] 130.00 ns 0.304 ns 0.270 ns 0.0713 - 448 B
Parse @bad(...)senE [884] 1,175.50 ns 1.656 ns 1.468 ns 0.5856 0.0019 3680 B
Parse @bad(...)Guys [382] 890.91 ns 8.708 ns 8.145 ns 0.4101 0.0019 2576 B
Parse @msg-(...)live. [74] 124.96 ns 1.119 ns 1.047 ns 0.0918 - 576 B
Parse @msg(...)ces. [110] 138.44 ns 0.275 ns 0.243 ns 0.1032 - 648 B

PR:

Method Message Mean Error StdDev Gen0 Gen1 Allocated
Parse :blub(...) list [66] 76.47 ns 0.191 ns 0.178 ns 0.0587 - 368 B
Parse :jtv(...)ers. [100] 76.21 ns 0.227 ns 0.201 ns 0.0573 - 360 B
Parse @bad(...)senE [884] 856.95 ns 2.589 ns 2.295 ns 0.5836 0.0048 3664 B
Parse @bad(...)Guys [382] 739.43 ns 1.645 ns 1.458 ns 0.4101 0.0010 2576 B
Parse @msg-(...)live. [74] 122.61 ns 0.380 ns 0.355 ns 0.0994 - 624 B
Parse @msg(...)ces. [110] 127.62 ns 0.280 ns 0.262 ns 0.1109 - 696 B

Parse and check failed auth

Note: this specifically tests for a degenerate case of receiving long message, which are commonplace now.
Base:

Method Mean Error StdDev Gen0 Gen1 Allocated
ParseAndCheckFailedAuth 1218 ns 7.13 ns 6.65 ns 0.7534 0.0038 4.63 KB

PR:

Method Mean Error StdDev Gen0 Gen1 Allocated
ParseAndCheckFailedAuth 753.6 ns 4.33 ns 3.84 ns 0.4101 0.0010 2.52 KB

- Add allocation-free split helpers
- Rewrite and simplify IRC message parser
- Fix minor warnings
- Avoid re-constructing parsed IrcMessage back into its raw format by the message handler (work-in-progress)
- Add benchmark project
- Specify strict level of language features validation to raise warnings on unsound/unstable use of newer language features on older targets
@neon-sunset
Copy link
Author

neon-sunset commented Jun 21, 2023

You might want to squash this PR to avoid commit spam :D

Copy link
Member

@Syzuna Syzuna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for all the work

@Syzuna Syzuna merged commit b58356f into TwitchLib:dev Jun 28, 2023
@neon-sunset neon-sunset deleted the ircparser-rewrite-and-other-perf-fixes branch July 2, 2023 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants