Load models slightly more eagerly and reuse for all ngrams during detection. #82
The library puts emphasis on lazy loading in order to be efficient in certain serverless environments. My use case requires maximum performance, and I can sacrifice a slower startup for better runtime performance. The `LanguageDetectorBuilder` has a flag `with_preloaded_language_models` that forces eager loading of models. However, during detection the `LanguageDetector` loops through every ngram and calls the function `load_language_models`, which has to take a read lock to check whether the model is loaded. Read locks are cheap, but not free. As an experiment I completely eliminated lazy loading and stored the models in the `LanguageDetector`. The single-threaded benchmark improved by 1.2x-2x and the multi-threaded one by 2x-4x, depending on the system.
Patch
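For context, the per-ngram locking cost described above can be sketched as follows. This is a minimal, hypothetical model, not the library's actual code: `LazyModel`, `probability`, and the `load` closure are illustrative stand-ins.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Hypothetical stand-in for a loaded ngram model (ngram -> probability).
type Model = HashMap<String, f64>;

// Sketch of the lazy-loading design: the model slot is guarded by a
// RwLock, and every per-ngram lookup first takes a read lock to check
// whether the model has been loaded yet.
struct LazyModel {
    slot: RwLock<Option<Model>>,
}

impl LazyModel {
    fn new() -> Self {
        LazyModel { slot: RwLock::new(None) }
    }

    // Called once per ngram in the detection hot loop; the read lock
    // is cheap but not free, and it is taken on every single call.
    fn probability(&self, ngram: &str, load: impl FnOnce() -> Model) -> f64 {
        {
            let guard = self.slot.read().unwrap();
            if let Some(model) = guard.as_ref() {
                return model.get(ngram).copied().unwrap_or(0.0);
            }
        } // read guard dropped before upgrading to a write lock
        let mut guard = self.slot.write().unwrap();
        let model = guard.get_or_insert_with(load);
        model.get(ngram).copied().unwrap_or(0.0)
    }
}
```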
This patch keeps the lazy loading design. Instead of lazy-loading the model for every ngram, I load the models slightly more eagerly in `compute_sum_of_ngram_probabilities` and in `count_unigrams` for the specified language, and reuse them for all ngrams in the loop. For this to work I had to use `Arc` instead of `Box` in `BoxedLanguageModel`.
Benchmark
For the benchmark I used the accuracy reports test data. The benchmark code is here. I tested both single-threaded and multi-threaded (parallel) modes.
Results
I tested the patch on two machines.
The numbers in the Before and After columns are throughput in detections per second.
Single-threaded benchmark

```
cargo run --release --bin bench -- --max-examples 30000
```

Multi-threaded benchmark

```
cargo run --release --bin bench -- --max-examples 50000 --parallel
```
I am really surprised by the numbers on the M1 chip. Before the patch, the single-threaded benchmark ran faster than the multi-threaded one, which suggests that RwLocks on the M1 Mac are slow.
I ran the accuracy reports and checked them against the current main branch:

```
diff -r accuracy-reports/lingua/ ../lingua-rs-main/accuracy-reports/lingua/
```

The command found no differences.