1) Optimize 50k & 100k splitter regular expressions - 10.5s to 8.9s #75
… source codes and extreme DOS content

Force-pushed from a141fc2 to e478b49.
Before vs. after (JMH, `dataFolderPath` = `data`, single-shot mode `ss`, `Cnt` = 10, scores in s/op):

| Benchmark | Before (s/op) | After (s/op) |
|---|---|---|
| SingleThreadedBenchmark.benchmarkCl100kBase | 10.548 ± 0.885 | 8.947 ± 0.109 |
| SingleThreadedBenchmark.benchmarkP50kBase | 9.999 ± 0.097 | 9.419 ± 0.082 |
| SingleThreadedBenchmark.benchmarkP50kEdit | 10.184 ± 0.131 | 9.365 ± 0.073 |
| SingleThreadedBenchmark.benchmarkR50kBase | 9.938 ± 0.076 | 8.403 ± 0.080 |
Force-pushed from e478b49 to 9f399f3.
Thanks for the great PR, the structure and the helpful comments!
I have left a few questions as inline comments that I would love to see resolved before merging, but overall they are quite minor 🙂
Force-pushed from 83715cb to 47e0df9.
JUnit 5 doesn't require public modifiers anymore. Private modifiers also don't make sense for non-helper tests since they aren't inherited. Renamed Cl100kBaseTestTest to Cl100kBaseTest. Removed dangling final modifiers. Used `var` in all local declarations.
Force-pushed from 47e0df9 to c0c3520.
As stated in the inline comments, I can easily add testing on all LTS JDKs once your changes are merged. Let's not let that delay us from moving forward with the other PRs 🙂
As a first step towards optimizing mainly the cl100k parser (used for GPT-3.5 and GPT-4), here is a regex optimization that applies to all 50k parsers and to the 100k parser.
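The difference is not huge, but measurable: in the single-threaded JMH benchmark, cl100k_base went from 10.548 to 8.947 s/op (about 15% faster), r50k_base from 9.938 to 8.403 s/op (about 15%), p50k_edit from 10.184 to 9.365 s/op (about 8%) and p50k_base from 9.999 to 9.419 s/op (about 6%).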
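For readers new to the splitters: before BPE merging, each encoding cuts the input into pieces with a regular expression. The sketch below is a minimal illustration, not jtokkit's code. It assumes the patterns published in OpenAI's tiktoken (the `r50k_base` pattern, which tiktoken also uses for `p50k_base` and `p50k_edit`, and the `cl100k_base` pattern); the class `SplitterSketch` and its members are hypothetical names. Each `Pattern` is compiled once and the pieces are collected with `Matcher.find()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the 50k vs. 100k splitter regexes (not jtokkit's code).
public class SplitterSketch {
    // Assumed: tiktoken's r50k_base split pattern (shared by p50k_base and p50k_edit there).
    static final Pattern R50K = Pattern.compile(
            "'s|'t|'re|'ve|'m|'ll|'d| ?\\p{L}+| ?\\p{N}+| ?[^\\s\\p{L}\\p{N}]+|\\s+(?!\\S)|\\s+");

    // Assumed: tiktoken's cl100k_base split pattern (case-insensitive contractions,
    // numbers split into groups of at most three digits).
    static final Pattern CL100K = Pattern.compile(
            "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}"
                    + "| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+");

    // Returns every non-overlapping match, in order; together they cover the input.
    static List<String> split(Pattern pattern, String text) {
        var pieces = new ArrayList<String>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            pieces.add(matcher.group());
        }
        return pieces;
    }

    public static void main(String[] args) {
        var text = "It's 12345 apples!";
        System.out.println(split(R50K, text));
        System.out.println(split(CL100K, text));
    }
}
```

On `"It's 12345 apples!"`, the 50k pattern keeps `" 12345"` as a single piece, while the cl100k_base pattern emits the space as its own piece and splits the digits into `"123"` and `"45"`, because `\p{N}{1,3}` matches at most three digits at a time.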
Please review commit by commit; the changes make the most sense in that order:
![image](https://private-user-images.githubusercontent.com/1841944/292475324-396d8649-8289-4b1b-a1fa-6e55631a474f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk4NTQ4NjEsIm5iZiI6MTczOTg1NDU2MSwicGF0aCI6Ii8xODQxOTQ0LzI5MjQ3NTMyNC0zOTZkODY0OS04Mjg5LTRiMWItYTFmYS02ZTU1NjMxYTQ3NGYucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxOCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMThUMDQ1NjAxWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NDY4YmQ1ZTQ2OGUwNGI3ZGIwODQxZTRmNGU2MzUzNjUxNjliYjMxMjAwZmZiNDI5ZWQ2NTI2ODQ1YTdkOGM1OCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.CV1R_L3ds2PRY2pj1xY1VNlynlALazX0QvQrJxMse-c)
Feel free to comment or, if it's simpler, to add commits on top of these.
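One generic way to tune a splitter pattern, shown here only as a hypothetical sketch and not necessarily what this PR changes, is to make a quantifier possessive when its alternative ends right after it, so the engine never needs to give characters back. The sketch assumes tiktoken's `cl100k_base` pattern as the baseline; `PossessiveSketch`, `GREEDY` and `POSSESSIVE` are made-up names. It only checks that both variants produce identical splits; whether the possessive one is actually faster would have to be measured with the JMH benchmarks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: possessive quantifiers in a cl100k-style splitter regex.
public class PossessiveSketch {
    // Assumed baseline: tiktoken's cl100k_base split pattern.
    static final Pattern GREEDY = Pattern.compile(
            "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}"
                    + "| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+");

    // Possessive (++, {1,3}+, *+) only where nothing follows inside the alternative,
    // so backtracking into the quantifier could never change the result.
    // \\s*[\\r\\n]+ and \\s+(?!\\S) stay greedy: both must give back characters
    // (a trailing newline, or the last space before a non-space) to match correctly.
    static final Pattern POSSESSIVE = Pattern.compile(
            "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}++|\\p{N}{1,3}+"
                    + "| ?[^\\s\\p{L}\\p{N}]++[\\r\\n]*+|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+");

    // Collects all non-overlapping matches in order.
    static List<String> split(Pattern pattern, String text) {
        var pieces = new ArrayList<String>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            pieces.add(matcher.group());
        }
        return pieces;
    }

    public static void main(String[] args) {
        var samples = List.of("It's 12345 apples!", "WE'LL go!!\n", "a\r\n\r\n  b", "  \n\tx   y");
        for (var sample : samples) {
            boolean same = split(GREEDY, sample).equals(split(POSSESSIVE, sample));
            System.out.println(split(POSSESSIVE, sample) + " identical=" + same);
        }
    }
}
```

The design choice here is conservative: only quantifiers at the very end of an alternative are touched, because a greedy quantifier there already takes the longest match on its first attempt, so the possessive form cannot change which pieces are produced.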