
Performance optimizations (up to 3518% faster language detection) #177

Closed
koute wants to merge 5 commits

Conversation


@koute koute commented May 8, 2023

I'm using Lingua for one of my projects and I've noticed that it was, well, extremely slow. After profiling it looked like my program was essentially spending 99% of its time in Lingua.

So here's a branch that I had lying around which speeds things up a little. With this branch my program's throughput increases from ~410 documents/s to ~14427 documents/s (a ~3518% increase in throughput!).

Changes made:

  • The models are now loaded once before iterating over the ngrams, instead of trying to load them for each and every processed ngram.
  • The ngrams are now constructed from words without any extra allocations.
  • Matching the alphabet doesn't use the regex crate anymore; instead it just uses a HashSet of characters for that. (The script.rs used for this was directly copied from the regex_syntax crate, so it should behave exactly the same, just faster.)
  • In places where performance matters I've switched the hash map hasher to ahash, which is faster than the default one in the standard library.

@@ -474,15 +475,15 @@ impl LanguageDetector {
-        let char_str = character.encode_utf8(&mut buffer);
-
         for (alphabet, language) in self.one_language_alphabets.iter() {
-            if alphabet.matches(char_str) {
+            if alphabet.matches_char(character) {
                 self.increment_counter(&mut word_language_counts, language.clone());
                 is_match = true;
                 break;


Not your code, but doesn't breaking here bias the result toward the list order?

@pemistahl
Owner

Hi @koute, thank you very much for these optimizations. I've been lagging behind on updating the Rust version of Lingua due to a lack of spare time. The Go and Python versions already include further optimizations that are still missing in the Rust version.

After I have performed some long-planned performance optimizations myself, I will carefully evaluate your changes and gladly merge them if they live up to their promise.

@koute
Author

koute commented May 8, 2023

The Go and Python versions already include further optimizations that are still missing in the Rust version.

Somewhat offtopic, but have you considered just using the Rust version as the Python version? With pyo3 it's fairly trivial to expose a Rust library as a Python library; then you wouldn't have to maintain the same thing twice, and well-optimized Rust code is always going to be faster than equivalent Python code.

@koute
Author

koute commented May 8, 2023

I was curious so I compared the performance to the Python version (I quickly made a Python extension with pyo3 to use the Rust Lingua from Python and benchmarked them using exactly the same Python script); here are the results:

Rust (before this PR): 0.50MB/s
Rust (this PR): 0.70MB/s
Python: 0.03MB/s

(In each case all languages were enabled. Python had all of the models preloaded while Rust didn't.)

The performance gain here was not as great because this was a simple single-threaded benchmark. For my use case I'm running 64 parallel threads, which is where this PR shines. The current Rust version of Lingua suffers from a huge amount of lock contention, which only shows up when the work is spread across many threads; that's why this simple benchmark doesn't show the 3518% performance increase I've seen in my program.

@serega

serega commented May 8, 2023

The commits look similar to two other pull requests.

I didn't get as much improvement, but I don't have a machine capable of running 64 parallel threads.

Another optimization that worked for me is using SmolStr for Ngram
serega@9eaa49f

@koute
Author

koute commented May 8, 2023

The commits look similar to two other pull requests.

I didn't get as much improvement, but I don't have a machine capable of running 64 parallel threads.

Yeah, sorry, I only noticed those PRs after I'd made all of my optimizations. :P This PR should essentially be a superset of those two.

Another optimization that worked for me is using SmolStr for Ngram serega@9eaa49f

Hm, that probably shouldn't be necessary now, as I've made it use a &str, which is almost certainly going to be faster (since it's zero-copy).

@serega

serega commented May 8, 2023

No problem. If this PR gets merged it would be very helpful for me. I think you can still try SmolStr for storing Ngram in memory during detector construction, and NgramRef during detection. Because ngrams are small strings, we can store them inline without additional allocation per string. With SmolStr the detection uses less memory and works faster because it avoids additional indirection.

@pemistahl
Owner

Somewhat offtopic, but have you considered just using the Rust version as the Python version? With pyo3 it's fairly trivial to expose a Rust library as a Python library; then you wouldn't have to maintain the same thing twice, and a well optimized Rust code is always going to be faster than equivalent Python code.

Yes, I have considered it, but there are still issues with converting Rust enums to Python enums that I want to see fixed before I make another attempt. See #154, for instance.

@koute
Author

koute commented May 8, 2023

No problem. If this PR gets merged it would be very helpful for me. I think you can still try SmolStr for storing Ngram in memory during detector construction, and NgramRef during detection. Because ngrams are small strings, we can store them inline without additional allocation per string. With SmolStr the detection uses less memory and works faster because it avoids additional indirection.

Yeah, during construction it could potentially speed things up. For now I only touched detection. There are still other performance optimizations that one could do, but I didn't want to rewrite the whole crate. (: I might consider doing some more work if/when this PR gets merged. (e.g. model loading is something that's also ripe for improvement and I could easily make it significantly more efficient)

Somewhat offtopic, but have you considered just using the Rust version as the Python version? With pyo3 it's fairly trivial to expose a Rust library as a Python library; then you wouldn't have to maintain the same thing twice, and a well optimized Rust code is always going to be faster than equivalent Python code.

Yes, I have considered it, but there are still issues with converting Rust enums to Python enums that I want to see fixed before I make another attempt. See #154, for instance.

Hmm... this might be a silly question, so please forgive me if I'm missing something, but in that case have you considered a thin Python shim around it? (: If you can't export a nice API from Rust directly, you could write a thin Python wrapper that uses the clumsy pyo3 API and exposes a nice Python API. Once the issues with pyo3 are fixed, you could get rid of it. Wouldn't that be significantly less effort than maintaining a whole separate reimplementation? (: (Especially since, even without my optimizations, the Rust implementation is 10x faster than the Python one.)

@pemistahl
Owner

Hmm... this might be a silly question, so please forgive me if I'm missing something, but in that case have you considered a thin Python shim around it? (:

A silly question? Not at all. :) No, I haven't thought about it yet because, on the one hand, I wanted to practice my Python skills and, on the other hand, some people were happy to find a pure Python implementation, which makes compilation and deployment easier for them. There are still some Python environments out there (Android and iOS come to mind) that accept only pure Python packages.

@pemistahl
Owner

@koute I have applied most of your improvements now except for removing the regex crate in favor of a hash set of characters. I could not identify a significant performance improvement for this change. Additionally, I don't want to copy and maintain the script.rs file manually. So I leave that one out. But apart from this, your changes are great.

So thank you very much! (-:

@pemistahl pemistahl closed this May 26, 2023
@koute
Author

koute commented May 27, 2023

I have applied most of your improvements now except for removing the regex crate in favor of a hash set of characters. I could not identify a significant performance improvement for this change.

That's actually the most important optimization of all of them! (: You didn't detect any changes because you were most likely testing only with one thread. It's an easy trap to fall into.

Here's a benchmark that will show you the difference:

  1. Set up a reproduction and download some test data (this will use my PR without the commit that replaced regex uses):
$ cargo new --bin repro
$ cd repro
$ cargo add lingua --git https://github.com/koute/lingua-rs.git --rev 92f8d5d1e078cfafcbc60d52b6f6821a65ded422
$ cargo add rayon
$ curl "https://www.gutenberg.org/cache/epub/345/pg345.txt" -o src/dracula.txt
  2. Paste this into src/main.rs:
use rayon::prelude::*;

fn main() {
    let data = include_str!("dracula.txt");
    let det = lingua::LanguageDetectorBuilder::from_all_languages().build();
    let xs: Vec<_> = std::iter::repeat(data).take(100000).collect();
    xs.into_par_iter().for_each(|s| {
        let len = s.chars().map(|ch| ch.len_utf8()).take(2048).sum::<usize>();
        std::hint::black_box(det.detect_language_of(&s[..len]));
    });
}
  3. Run it:
$ cargo build --release && time cargo run --release

real	2m0.792s
user	23m49.315s
sys	96m4.691s
  4. Now replace the Lingua dependency with one where regex is not used:
$ cargo remove lingua
$ cargo add lingua --git https://github.com/koute/lingua-rs.git --rev 304cb5a113ddbb968fab2ef45e41b686e42e5aa8
  5. Rerun it:
$ cargo build --release && time cargo run --release

real	0m5.737s
user	5m23.120s
sys	0m0.247s

As you can see, that commit saves over an hour and a half of CPU time on this benchmark.

@pemistahl
Owner

@koute I've just done my own benchmark on the alphabet matching and I'm surprised that it's indeed significantly faster in multithreaded code. So I've decided to include your improvements. The regex crate performance docs mention the following:

Synchronization implies some amount of overhead. When a Regex is used from a single thread, this overhead is negligible. When a Regex is used from multiple threads simultaneously, it is possible for the overhead of synchronization from contention to impact performance.

I did not imagine that this would have such a significant impact on performance. Learning never stops. So again, thank you very much.

@pemistahl pemistahl added this to the Lingua 1.5.0 milestone Jun 10, 2023
4 participants