performance-analysis: case-insensitive parser #257

Closed
riederm opened this issue Aug 22, 2021 · 3 comments

@riederm
Collaborator

riederm commented Aug 22, 2021

Measure the performance penalty of the case-insensitive parser introduced in #255, using huge files.

@riederm riederm added the test label Aug 22, 2021
@volsa
Member

volsa commented Jul 11, 2023

(Inspired by https://alic.dev/blog/fast-lexing) I did some quick and dirty benchmarking to see whether using a case-insensitive lexer causes any performance hit, and as far as I can tell it doesn't (at least none you'd feel). The corpus for the benchmark consists of all ST files in our repository, concatenated into one file via find . -type f -name '*.st' -exec cat {} + > out.txt. That adds up to ~7k lines, which when running the lexer in release mode like so

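use std::time::Instant;

// `lexer` is assumed to be already constructed over the contents of out.txt.
let now = Instant::now();
// Drain the token stream; only the lexing itself is timed.
while !lexer.is_end_of_stream() {
    lexer.advance();
}
let elapsed = now.elapsed().as_micros();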

returns the following

  • Insensitive
    Took 722us (End; SourceRange { range: 276576..276588 }; 0)
    Took 681us (End; SourceRange { range: 276576..276588 }; 0)
    Took 660us (End; SourceRange { range: 276576..276588 }; 0)
    Took 629us (End; SourceRange { range: 276576..276588 }; 0)
    Took 762us (End; SourceRange { range: 276576..276588 }; 0)
    Took 656us (End; SourceRange { range: 276576..276588 }; 0)
    Took 650us (End; SourceRange { range: 276576..276588 }; 0)
    Took 646us (End; SourceRange { range: 276576..276588 }; 0)
    Took 632us (End; SourceRange { range: 276576..276588 }; 0)
    Took 644us (End; SourceRange { range: 276576..276588 }; 0)
    Average: 668us
    
  • Sensitive (i.e. remove every mention of ignore(case) in tokens.rs)
    Took 633us (End; SourceRange { range: 276576..276588 }; 0)
    Took 593us (End; SourceRange { range: 276576..276588 }; 0)
    Took 566us (End; SourceRange { range: 276576..276588 }; 0)
    Took 558us (End; SourceRange { range: 276576..276588 }; 0)
    Took 732us (End; SourceRange { range: 276576..276588 }; 0)
    Took 577us (End; SourceRange { range: 276576..276588 }; 0)
    Took 541us (End; SourceRange { range: 276576..276588 }; 0)
    Took 573us (End; SourceRange { range: 276576..276588 }; 0)
    Took 602us (End; SourceRange { range: 276576..276588 }; 0)
    Took 624us (End; SourceRange { range: 276576..276588 }; 0)
    Average: 599us
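
To put the two averages side by side, here is a small sketch that recomputes the means from the timings listed above and derives the relative overhead. The helper names `mean_us` and `overhead_percent` are made up for illustration and are not part of the repository.

```rust
/// Mean of a slice of microsecond timings, as a float.
fn mean_us(samples: &[u64]) -> f64 {
    samples.iter().sum::<u64>() as f64 / samples.len() as f64
}

/// Relative overhead of `slow` over `fast`, in percent.
fn overhead_percent(slow: f64, fast: f64) -> f64 {
    (slow - fast) / fast * 100.0
}

// Timings copied from the two runs listed above (microseconds).
const INSENSITIVE_US: [u64; 10] = [722, 681, 660, 629, 762, 656, 650, 646, 632, 644];
const SENSITIVE_US: [u64; 10] = [633, 593, 566, 558, 732, 577, 541, 573, 602, 624];
```

Calling `overhead_percent(mean_us(&INSENSITIVE_US), mean_us(&SENSITIVE_US))` gives roughly 11%, i.e. about 68us on the whole corpus. Both runs contain one outlier (762us and 732us), so with only ten samples each the figure is a rough indication rather than a precise measurement.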
    

Maybe I'm missing something, as these numbers are kind of surprising to me, to be honest. Anyway, here's the code: master...__bench_lexer

Follow-up: Is there any interest in creating a metric for the lexer, in case we implement a hand-written lexer at some point in the future for whatever reason? Also, this section seems obsolete: https://plc-lang.github.io/rusty/arch/parser.html#discussion-rusty-lexer
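
If such a metric is ever wanted, a minimal sketch of what it could look like follows. It only assumes the `is_end_of_stream()` / `advance()` loop shown earlier; `ToyLexer`, `count_tokens`, `bench` and `Stats` are hypothetical stand-ins invented for this example, and the toy lexer merely splits on whitespace instead of producing real ST tokens.

```rust
use std::time::{Duration, Instant};

/// Summary of repeated wall-clock timings.
#[derive(Debug)]
struct Stats {
    min: Duration,
    mean: Duration,
    max: Duration,
}

/// Runs `f` `runs` times and summarizes how long each run took.
fn bench<F: FnMut()>(runs: u32, mut f: F) -> Stats {
    assert!(runs > 0, "need at least one run");
    let mut min = Duration::MAX;
    let mut max = Duration::ZERO;
    let mut total = Duration::ZERO;
    for _ in 0..runs {
        let start = Instant::now();
        f();
        let elapsed = start.elapsed();
        min = min.min(elapsed);
        max = max.max(elapsed);
        total += elapsed;
    }
    Stats { min, mean: total / runs, max }
}

/// Hypothetical stand-in for the real lexer: one token per whitespace-separated word.
struct ToyLexer<'a> {
    rest: &'a str,
    tokens_seen: usize,
}

impl<'a> ToyLexer<'a> {
    fn new(src: &'a str) -> Self {
        ToyLexer { rest: src.trim_start(), tokens_seen: 0 }
    }

    fn is_end_of_stream(&self) -> bool {
        self.rest.is_empty()
    }

    /// Consumes one token and any whitespace that follows it.
    fn advance(&mut self) {
        let end = self.rest.find(char::is_whitespace).unwrap_or(self.rest.len());
        self.rest = self.rest[end..].trim_start();
        self.tokens_seen += 1;
    }
}

/// Drains the lexer the same way the benchmark loop does and returns the token count.
fn count_tokens(src: &str) -> usize {
    let mut lexer = ToyLexer::new(src);
    while !lexer.is_end_of_stream() {
        lexer.advance();
    }
    lexer.tokens_seen
}
```

A call such as `bench(10, || { std::hint::black_box(count_tokens(&corpus)); })` would then report the minimum, mean and maximum over ten runs. Reporting the minimum next to the mean makes outlier runs easier to spot; for anything beyond a quick check, a dedicated benchmarking harness is probably the better choice.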

cc @ghaith @riederm

@riederm
Collaborator Author

riederm commented Jul 12, 2023

I agree that the time spent in the lexer seems insignificant, so let's not spend too much effort here.
Let's remove the discussion: https://plc-lang.github.io/rusty/arch/parser.html#discussion-rusty-lexer

@volsa
Member

volsa commented Jul 12, 2023

Great, closing this issue then.
Removing the discussion is tracked in PR #902.

@volsa volsa closed this as completed Jul 12, 2023