Good accuracy but too slow: how to improve Tesseract speed? #263
Comments
You can already run 4 parallel instances of Tesseract on your quad core; they will read 4 images in about the same time one instance needs for a single image. Introducing multi-threading would not help to reduce the time needed for an OCR of many images. I am working on a project where OCR with Tesseract would take nearly 7 years on a single core, but luckily I can try to get many computers and use their cores, so the time can be reduced to a few days.
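As a sketch of that approach, assuming a directory of TIFF images and the tesseract binary on PATH (file names are illustrative):

```bash
# OCR many images with 4 Tesseract processes running in parallel.
# Each run writes its text next to the input as <image>.tif.txt.
find . -name '*.tif' -print0 \
  | xargs -0 -P 4 -I{} tesseract {} {} -l eng
```

Throughput scales with the number of parallel processes; the time per single image stays the same.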
Besides the OCR, we have other things that need to run on the other cores.
What evidence is your memory management speculation based on?
I'm not speculating anything. The reality is that Tesseract takes more than 3 seconds to read the image that I initially attached (I use VS2010). When I use the console test application that comes with Tesseract, it takes about the same time (more than 3 seconds). Anyone would speculate a lot in 3 seconds. I have more than 20 years in machine vision and have used several OCR engines in the past. Actually, I have one in-house engine that reads the same image in less than 100 ms, but our engine is designed more for reading a single line of text (i.e. it returns a single line of text). Tesseract's database is not that large, and most of the techniques used by Tesseract are quite standard in the OCR area (page layout, line extraction, candidate character extraction, word forming, and then several phases of classification). However, Tesseract manages memory usage very badly. Why else would it take more than 3 seconds to read a typical text image? Please, if you're not bringing any meaningful ideas to my posting, just spare me your comment.
@ychtioui, as you have spent many years in machine vision, you know quite well that there are many reasons why programs can be slow. Memory management is just one of them. Even with a lot of experience, I'd start by running performance analyzers to investigate performance issues. Of course I can guess what the possible reasons might be and try to improve the software based on those guesses, but improvements based on evidence (like the result of a performance analysis) are more efficient. Don't you think so, too? Do you have a chance to run a performance analysis?
You can try to use the 3.02 version if you only need English. AFAIR it was faster.
I'm running version 3.02. Is it typical to read images (such as mine attached above) in a few seconds? Thanks for your comments.
3.03 and 3.04 are compiled with '-O2' by default. 2.04 and 3.01 are compiled without an explicit optimization flag.
Thanks amitdo.
What I linked to was actually 3.02.02. I think this is 3.02: you are right, it does not contain any '-On' flag.
I assume you are using Tesseract on Linux / FreeBSD / Mac. On Windows + MS Visual C++ the compiler flags may differ.
@ychtioui said in a post above "I use VS2010", so he is using Windows.
Thanks Shree. I don't know which optimization level is used for Visual C++.
I use VS2010 on a Windows 7 PC.
VS2010 uses the optimization flag /O2 (Maximize Speed); other flags are set to their defaults. I tried to run the perf tool on Linux:
According to this report, the top 3 functions consumed 66% of the time. Then I tried a 4-page A4 TIFF (G4 compressed):
Then I tried a non-English image:
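A typical way to produce such a report with perf looks like this (a sketch; the exact invocation used above is not shown, and input.tif is an illustrative file name):

```bash
# Record a call-graph profile of a single Tesseract run:
perf record -g -- tesseract input.tif output
# Show which functions consumed the most CPU time:
perf report --sort=symbol
```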
Just for the record, for possible improvement in this issue: there was interesting information posted in the scantailor project. OpenCL alone only brings a ~2x speed-up; another ~6x speed-up comes from multi-threaded processing.
Hi @ychtioui, I am a newbie and saw in your first comment that you are able to get pretty accurate results from Tesseract. For your image itself I am not able to get any results; it tells me: "Can't recognize image". Can you please provide the code snippet showing how you are processing the image?
@theraysmith
I'm interested in the same answer, @amitdo. Can you answer the question, @theraysmith? It really can help us :)
Don't expect much difference between -O2 and -O3. I tried different optimizations, and they only have small effects on the time needed for OCR of a page. Higher optimization levels can even result in slower code because the code gets larger (because of unrolling of loops), so CPU caches become less effective. It is much more important to write good code.
That is a surprisingly hard question to answer in the Google environment! I use 'opt' mode, which, after some digging, I found maps to -O2. In addition, explicitly added are:
- -fopenmp, which will deliver a major improvement (3x faster) if you do not have it, and a corresponding -lgomp for the linker
- arch/dotproductavx.cpp is compiled with -mavx
- arch/dotproductsse.cpp (and actually all the rest of the code) is compiled with -msse4.1
I thought all this stuff was in the autotools files already, or are you looking to convert these to Windows?
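For source builds, a hedged sketch of a configure invocation with those flags (the autotools build normally adds the per-file -mavx/-msse4.1 flags itself; this is only a fallback if your toolchain misses them):

```bash
# Build with OpenMP and SSE4.1 enabled; -mavx is applied by the build
# system to arch/dotproductavx.cpp only, so it is not forced globally.
./configure CXXFLAGS="-O2 -msse4.1 -fopenmp" LIBS="-lgomp"
make -j4
```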
The improvement from using OpenMP mainly affects the time needed for a single image. For mass OCR, it does not help: if many pages have to be processed, it is better to use single-threaded Tesseract and run several Tesseract processes in parallel.
Stefan, what about using OpenMP for training?
Yes, for training a single new model OpenMP could perhaps speed up the training process. Up to now, OpenMP is only used in ccmain/ and in lstm/. I don't know how much that part is used during training, and I have never run a performance evaluation for the training process (in fact I have only run LSTM training once for Fraktur, and as I already said, it was not really successful).
OpenMP speeds up training by about 3.5x, since it runs 4 threads (one for each part of the LSTM) and spends >90% of CPU time computing the LSTM forward/backward.
Can I set more than 4 threads for training the LSTM?
No, it doesn't help. The parallelism is limited by the implementation of the LSTM as 4 matrix-vector products. When I experimented with more threads for some of the other operations (e.g. the output softmax), it slowed down because the cache coherency was lost. I also experimented with breaking the matrix-vector products up further (e.g. splitting the input from the recurrent part), but OpenMP doesn't seem too good at allocating the threads in a way that keeps the cache coherency. Each thread needs to run the same part of the weights matrix for each timestep, and that is difficult to achieve with the recurrent nature of the LSTM.
What about machines that have only 2 cores? Shouldn't 'num_threads' be lowered to 2 in that case?
It still works. It just takes longer.
The Linux kernel and its parameters also have a significant effect on the performance of Tesseract (both for recognition and training). Especially the first kernels which tried to mitigate Spectre and similar CPU bugs made it really slow. I recently noticed that Tesseract with Debian GNU/Linux (testing / bullseye) is faster when running in the Windows Subsystem for Linux: running on a Linux kernel with the default settings is slightly slower than running on the Windows kernel. With the kernel parameters from https://make-linux-fast-again.com/ Tesseract gets faster by about 10 to 20 % and is then faster than in the Windows Subsystem for Linux.
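As a hedged sketch, on a Debian/Ubuntu system such kernel parameters are applied roughly like this; mitigations=off is the modern shorthand for the flag list on that site and trades security for speed, so use it only on trusted machines:

```bash
# Add the parameter to the kernel command line in /etc/default/grub:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash mitigations=off"
sudo update-grub   # regenerate the GRUB configuration
sudo reboot        # kernel parameters take effect after a reboot
```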
@zdenop, how do I achieve AVX, AVX2, FMA or SSE optimization?
They are used automatically if your computer provides them.
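Recent Tesseract versions (4.x/5.x) report the SIMD extensions they detected in their version output, so this can be verified directly:

```bash
tesseract --version
# The output typically includes lines such as:
#   Found AVX2
#   Found AVX
#   Found FMA
#   Found SSE4.1
```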
For texts without inverted text, significantly faster OCR is possible when the inverted-text detection is disabled.
Is it possible to set this option at runtime?
It's a runtime option:
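The option itself is truncated above; assuming it is tessedit_do_invert (the switch controlling the inverted-text pass), it can be set per run with -c:

```bash
# Skip the check for inverted (light-on-dark) text; faster on pages
# known to contain none of it:
tesseract input.png output -c tessedit_do_invert=0
```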
Are you aware of whether or not pytesseract has that option available?
I'm not familiar with pytesseract.
The answer is on the pytesseract homepage: its functions accept a config argument through which extra Tesseract parameters (such as -c options) can be passed.
Is there any way to use Tesseract with multi-threading in an Android project?
I managed to get faster results by upgrading Tesseract from 4.x to 5.x (can't remember the exact versions).
Tesseract 5.0.0 should be faster than 4.1.x. @zdenop, can you update your benchmarks above? For the tessdata model, you can add two tests using just one of the OCR engines: test 1 with --oem 0 (legacy only), test 2 with --oem 1 (LSTM only).
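On the command line, the two suggested runs would look like this (image and output names are illustrative):

```bash
# Test 1: legacy engine only (needs a traineddata with the legacy model):
tesseract page.tif out_legacy --oem 0 -l eng
# Test 2: LSTM engine only:
tesseract page.tif out_lstm --oem 1 -l eng
```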
Timing test with [...]
So with a recent Linux kernel, "optimized" kernel options no longer seem to have an effect on the performance.
Using [...]
For OpenMP, you can try to limit the number of threads it uses. Edit: with your CPU, you can try to limit it to a small number of threads, say 3, and then increase or decrease the number of threads to see what works best.
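OpenMP reads the standard environment variables, so the limit can be set per invocation without rebuilding:

```bash
# Allow at most 3 OpenMP threads inside this Tesseract process:
OMP_THREAD_LIMIT=3 tesseract input.tif output
```

For batch jobs, OMP_THREAD_LIMIT=1 combined with several parallel processes (as in the xargs sketch earlier in the thread) usually gives the best throughput.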
The test was running on a CPU with 24 cores. Using more than one core always produces a huge waste of CPU time.
Using more than 1 CPU in the same address space always has coordination overhead, and more than ~4 is a complete waste. Boxes with 24 CPUs are made more for running VMs: something like 2 x 6C/6T serving 24 VMs and 400 websites works (with disk I/O as the bottleneck). For tasks at 100% CPU I would first profile them to find hotspots or low-hanging fruit. Maybe change to the much faster TensorFlow; are there benchmarks showing how much faster TensorFlow is? Tuning the code itself is more time-consuming, and with well-crafted code you can maybe gain something in the range of 10%.
@amitdo: what about creating a wiki page related to speed? IMO it would be more appropriate than discussing/updating a 5-year-old thread...
Wiki page or a page in tessdoc?
I started https://github.com/tesseract-ocr/tessdoc/blob/main/Benchmarks.md. Still missing are several tests (4.1.3 with AVX, ...).
Thanks Zdenko.
Conclusions:
Same here. After updating to Ubuntu 22.04, gImageReader became incredibly slow for me. Dev manisandro was very helpful and led me to a quick and dirty CLI solution for running on a single thread. Works wonderfully for me.
If you want to check that you actually are running on one thread, type:
Then run gImageReader:
Et voilà :o)
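The commands did not make it into the quote above; a hedged reconstruction (the binary name gimagereader-gtk depends on the packaging, a Qt build ships as gimagereader-qt5):

```bash
export OMP_THREAD_LIMIT=1   # force single-threaded OpenMP
echo "$OMP_THREAD_LIMIT"    # check that the variable is really set
gimagereader-gtk            # launch gImageReader from the same shell
```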
I integrated Tesseract C/C++, version 3.x, to OCR English text on images.
It's working pretty well, but very slowly: it takes close to 1000 ms (1 second) to read the attached image (00060.jpg) on my quad-core laptop.
I'm not using the Cube engine, and I'm feeding only binary images to the OCR reader.
Is there any way to make it faster? Any ideas on how to make Tesseract read faster?
Thanks.