Performance optimizations (up to 3518% faster language detection) #177
Conversation
```diff
@@ -474,15 +475,15 @@ impl LanguageDetector {
         let char_str = character.encode_utf8(&mut buffer);

         for (alphabet, language) in self.one_language_alphabets.iter() {
-            if alphabet.matches(char_str) {
+            if alphabet.matches_char(character) {
                 self.increment_counter(&mut word_language_counts, language.clone());
                 is_match = true;
                 break;
```
Not your code, but doesn't breaking here bias the result toward whichever alphabet comes first in the list?
Hi @koute, thank you very much for these optimizations. I've been lagging behind on updates to the Rust version of Lingua due to a lack of spare time. The Go and Python versions already include further optimizations that are still missing in the Rust version. After I have performed some long-planned performance optimizations myself, I will carefully evaluate your changes and gladly merge them if they hold their promises.
Somewhat off-topic, but have you considered just using the Rust version as the Python version? With pyo3 it's fairly trivial to expose a Rust library as a Python library; then you wouldn't have to maintain the same thing twice, and well-optimized Rust code is always going to be faster than equivalent Python code.
I was curious, so I compared the performance to the Python version (I quickly made a Python extension with Rust): before this PR it was 0.50 MB/s […]. (In each case all languages were enabled; Python had all of the models preloaded while Rust didn't.) The performance gain here was not as great because this was a simple single-threaded benchmark. For my use case I'm running 64 parallel threads, which is where this PR shines. The current Rust version of Lingua has a really huge amount of lock contention, which only shows up when you're doing work on many threads, which is why this simple benchmark doesn't show the 3518% performance increase that I've seen in my program.
The commits look similar to the other two pull requests.
I didn't get as much of an improvement, but I don't have a machine capable of running 64 parallel threads. Another optimization that worked for me is using […].
Yeah, sorry, I only noticed those PRs after I'd made all of my optimizations. :P This PR should essentially be a superset of those two.
Hm, that probably shouldn't be necessary now, as I've made it use an […].
No problem. If this PR gets merged it would be very helpful for me. I think you can still try […].
Yes, I have considered it, but there are still issues with converting Rust enums to Python enums that I want to be fixed before I do another attempt. See #154, for instance.
Yeah, during construction it could potentially speed things up. For now I only touched detection. There are still other performance optimizations that one could do, but I didn't want to rewrite the whole crate. (: I might consider doing some more work if/when this PR gets merged. (e.g. model loading is something that's also ripe for improvement and I could easily make it significantly more efficient)
Hmm... this might be a silly question, so please forgive me if I'm missing something, but in this case have you considered having a thin Python shim around it? (: If you can't export a nice API from Rust directly then you could just write a thin Python wrapper that would use the clumsy […].
A silly question? Not at all. :) No, I have not thought about it yet because, on the one hand, I wanted to practice my Python skills and, on the other hand, some people were happy to find a pure Python implementation that makes compilation and deployment easier for them. There are still some Python environments out there (Android and iOS come to my mind) that accept only pure Python packages.
@koute I have applied most of your improvements now, except for removing the `regex`-based alphabet matching. So thank you very much! (-:
That's actually the most important optimization of all of them! (: You didn't detect any changes because you were most likely testing only with one thread. It's an easy trap to fall into. Here's a benchmark that will show you the difference:
```rust
use rayon::prelude::*;

fn main() {
    let data = include_str!("dracula.txt");
    let det = lingua::LanguageDetectorBuilder::from_all_languages().build();
    // 100,000 copies of the text, detected in parallel across all cores.
    let xs: Vec<_> = std::iter::repeat(data).take(100000).collect();
    xs.into_par_iter().for_each(|s| {
        // Truncate to the first 2048 chars without splitting a UTF-8 sequence.
        let len = s.chars().map(|ch| ch.len_utf8()).take(2048).sum::<usize>();
        std::hint::black_box(det.detect_language_of(&s[..len]));
    });
}
```
As you can see, that commit saves over an hour and a half of CPU time on this benchmark.
@koute I've just done my own benchmark on the alphabet matching and I'm surprised that it's indeed significantly faster in multithreaded code. So I've decided to include your improvements. The `regex` crate's performance docs mention the following:

> Synchronization implies some amount of overhead. When a […]

I did not imagine that this would have such a significant impact on performance. Learning never stops. So again, thank you very much.
I'm using Lingua for one of my projects and I've noticed that it was, well, extremely slow. After profiling it looked like my program was essentially spending 99% of its time in Lingua.
So here's a branch that I had lying around which speeds things up a little. With this branch my program's throughput increases from ~410 documents/s to ~14427 documents/s (a ~3518% increase in throughput!).
Changes made:
- The alphabet matching doesn't depend on the `regex` crate anymore; instead it just uses a `HashSet` of characters for that. (The `script.rs` used for this was directly copied from the `regex_syntax` crate, so it should behave exactly the same, just faster.)
- The hasher was switched to `ahash`, which is faster than the default one in the standard library.