performance-analysis: case-insensitive parser #257

riederm · 2021-08-22T20:42:22Z

Measure the performance penalty of the case-insensitive parser introduced in #255, using huge files.

volsa · 2023-07-11T17:16:03Z

(Inspired by https://alic.dev/blog/fast-lexing) I did some quick-and-dirty benchmarking to see whether using a case-insensitive lexer causes any performance hit, and as far as I can tell there is none you'd feel. The corpus for the benchmark consists of all ST files in our repository, concatenated into one file via find . -type f -name '*.st' -exec cat {} + > out.txt. That adds up to ~~almost 15k~~ ~7k lines. Lexing that file in release mode like so

use std::time::Instant;

// Time one full pass of the lexer over the corpus.
let now = Instant::now();
while !lexer.is_end_of_stream() {
    lexer.advance();
}
let elapsed = now.elapsed().as_micros();

returns the following

Insensitive

Took 722us (End; SourceRange { range: 276576..276588 }; 0)
Took 681us (End; SourceRange { range: 276576..276588 }; 0)
Took 660us (End; SourceRange { range: 276576..276588 }; 0)
Took 629us (End; SourceRange { range: 276576..276588 }; 0)
Took 762us (End; SourceRange { range: 276576..276588 }; 0)
Took 656us (End; SourceRange { range: 276576..276588 }; 0)
Took 650us (End; SourceRange { range: 276576..276588 }; 0)
Took 646us (End; SourceRange { range: 276576..276588 }; 0)
Took 632us (End; SourceRange { range: 276576..276588 }; 0)
Took 644us (End; SourceRange { range: 276576..276588 }; 0)
Average: 668us

Sensitive (i.e. with every mention of ignore(case) removed from tokens.rs)

Took 633us (End; SourceRange { range: 276576..276588 }; 0)
Took 593us (End; SourceRange { range: 276576..276588 }; 0)
Took 566us (End; SourceRange { range: 276576..276588 }; 0)
Took 558us (End; SourceRange { range: 276576..276588 }; 0)
Took 732us (End; SourceRange { range: 276576..276588 }; 0)
Took 577us (End; SourceRange { range: 276576..276588 }; 0)
Took 541us (End; SourceRange { range: 276576..276588 }; 0)
Took 573us (End; SourceRange { range: 276576..276588 }; 0)
Took 602us (End; SourceRange { range: 276576..276588 }; 0)
Took 624us (End; SourceRange { range: 276576..276588 }; 0)
Average: 599us
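
To put the two runs side by side, here is a small sketch that recomputes the averages from the timings listed above and expresses the difference as a percentage. The constant and function names are made up for this illustration and are not part of the benchmark branch.

```rust
/// Timings in microseconds, copied from the two runs above.
const INSENSITIVE_US: [u64; 10] = [722, 681, 660, 629, 762, 656, 650, 646, 632, 644];
const SENSITIVE_US: [u64; 10] = [633, 593, 566, 558, 732, 577, 541, 573, 602, 624];

/// Arithmetic mean of the samples.
fn mean(samples: &[u64]) -> f64 {
    samples.iter().sum::<u64>() as f64 / samples.len() as f64
}

/// Relative slowdown of `slow` compared to `fast`, in percent.
fn overhead_percent(slow: f64, fast: f64) -> f64 {
    (slow - fast) / fast * 100.0
}

fn main() {
    let insensitive = mean(&INSENSITIVE_US);
    let sensitive = mean(&SENSITIVE_US);
    println!("insensitive: {insensitive:.1}us, sensitive: {sensitive:.1}us");
    println!("overhead: {:.1}%", overhead_percent(insensitive, sensitive));
}
```

With these samples the case-insensitive run is roughly 11% slower on average, which is about 68us for the whole corpus. Ten samples per variant with this much spread is a rough indication at best.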

Maybe I'm missing something, as these numbers are kind of surprising to me, to be honest. Anyway, here's the code: master...__bench_lexer

Follow-up: is there any interest in creating a metric for the lexer, in case we implement a hand-written lexer at some point in the future? Also, this section seems obsolete: https://plc-lang.github.io/rusty/arch/parser.html#discussion-rusty-lexer
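
As a rough idea of what such a lexer metric could look like, here is a minimal, std-only sketch. `count_tokens` is a hypothetical stand-in for the real lexer loop and `measure` is a made-up helper, so this shows only the shape of the measurement, not the project's actual setup.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the real lexer: counts whitespace-separated words.
fn count_tokens(source: &str) -> usize {
    source.split_whitespace().count()
}

/// Runs `work` `runs` times and returns the duration of every run.
fn measure<F: FnMut()>(runs: usize, mut work: F) -> Vec<Duration> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            work();
            start.elapsed()
        })
        .collect()
}

fn main() {
    // Synthetic corpus; a real metric would read the concatenated ST files instead.
    let source = "PROGRAM main\nVAR x : INT; END_VAR\nx := x + 1;\nEND_PROGRAM\n".repeat(1_000);

    // black_box keeps the optimizer from removing the work being timed.
    let samples = measure(10, || {
        black_box(count_tokens(black_box(&source)));
    });

    let fastest = samples.iter().min().expect("at least one run");
    let total: Duration = samples.iter().sum();
    println!("fastest: {:?}, mean: {:?}", fastest, total / samples.len() as u32);
}
```

Reporting the fastest run next to the mean makes outliers, such as the 762us and 732us runs in the results, easier to spot.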

cc @ghaith @riederm

riederm · 2023-07-12T12:09:13Z

I agree that the time spent in the lexer seems insignificant, so let's not spend too much effort here.
Let's remove the discussion: https://plc-lang.github.io/rusty/arch/parser.html#discussion-rusty-lexer

volsa · 2023-07-12T12:41:02Z

Great, closing this issue then.
Removing the discussion is tracked in PR #902.

riederm added the test label Aug 22, 2021

volsa mentioned this issue Jul 12, 2023

Track performance of lexer #899

Closed

volsa closed this as completed Jul 12, 2023
