good accuracy but too slow, how to improve Tesseract speed #263

ychtioui · 2016-03-10T15:35:15Z

I integrated Tesseract C/C++, version 3.x, to read English OCR on images.

It’s working pretty good, but very slow. It takes close to 1000ms (1 second) to read the attached image (00060.jpg) on my quad-core laptop.

I’m not using the Cube engine, and I’m feeding only binary images to the OCR reader.

Any way to make it faster. Any ideas on how to make Tesseract read faster?
thanks

stweil · 2016-03-10T16:10:06Z

You can already run 4 parallel instances of Tesseract on your quad core, then it will read 4 images in about the same time. Introducing multi threading would not help to reduce the time needed for an OCR of many images. I am working on a project where OCR with Tesseract would take nearly 7 years on a single core, but luckily I can try to get many computers and use their cores, so the time can be reduced to a few days.
Using compiler settings which are optimized for your CPU helps to gain a few percent, but I am afraid that for a larger gain different algorithms in Tesseract and its libraries would be needed.

ychtioui · 2016-03-10T18:57:56Z

Besides the OCR, we have other things that need to run on the other cores.
I believe, the main issue that's slowing down Tesseract is the way memory is managed.
Too many memory allocations (new function) and releases (delete or delete [] functions) do slow down the reader.
In the past, I did use a different OCR engine, and it was allocating up-front large buffers to store all the needed data (large buffer of blobs, a large buffer of lines, a large buffer of words and their corresponding data), the buffers were just being indexed as we were reading the data from an image. The large buffers were allocated only once upon ocr engine initialization and release only once upon ocr engine shutdown. This memory management scheme was very efficient computational-time-wise.
Are there any settings for Tesseract that are known to be computationally intensive?
any tricks to speed up Tesseract?

tfmorris · 2016-03-10T19:49:03Z

What evidence is your memory management speculation based on?

ychtioui · 2016-03-10T22:10:10Z

I'm not speculating anything. The reality is that TesseRact takes more than 3 seconds to read the above image that I initially attached (I use VS2010). When I use the console test application that comes with the TesseRact, it takes about the same time (more than 3 seconds).

Anyone would speculate a lot in 3 seconds

I have more than 20 years in machine vision. I used several OCR engines in the past. Actually I have one -in house- that reads the same image in less than 100ms, but our engine is designed more for reading a single line of text (i.e. it returns a single line of text).

TesseRact database is not that large. Most of the techniques used by TesseRact are quite standard in the OCR-area (page layout, line extraction, possible character extraction, word forming, and then several phases of classification). However, the TesseRact manages very badly memory usage. why? it takes more than 3 seconds to read a typical texted-image.

please if you're not bringing any meaningful ideas to my posting, just spare me your comment.

stweil · 2016-03-11T06:18:48Z

@ychtioui, as you have spent many years in machine vision, you know quite well that there are lots of ways why programs can be slow. Memory management is just one of them. Even with a lot of experience, I'd start running performance analyzers to investigate performance issues. Of course I can guess what might be possible reasons and try to improve the software based on that guesses, but improvements based on evidence (like the result of a performance analysis) are more efficient. Don't you think so, too? Do you have a chance to run a performance analysis?

zdenop · 2016-03-11T10:25:11Z

You can try to use 3.02 version if you need only English. AFAIR it was
singnificantly faster on my (old) computer.

Zdenko

On Thu, Mar 10, 2016 at 4:35 PM, younes notifications@github.com wrote:

I integrated Tesseract C/C++, version 3.x, to read English OCR on images.

It’s working pretty good, but very slow. It takes close to 1000ms (1
second) to read the attached image (00060.jpg) on my quad-core laptop.

I’m not using the Cube engine, and I’m feeding only binary images to the
OCR reader.

Any way to make it faster. Any ideas on how to make Tesseract read faster?
thanks
[image: 00060]
https://cloud.githubusercontent.com/assets/9968625/13674495/ac261db4-e6ab-11e5-9b4a-ad91d5b4ff87.jpg

—
Reply to this email directly or view it on GitHub
#263.

ychtioui · 2016-03-14T13:58:35Z

I'm running version 3.02
I'm going through different sections of the reader, and checking which section is taking the most time.

is it typical to read images (such as mine attached above) in a few seconds?

thanks for your comments.

amitdo · 2016-03-18T17:32:26Z

... 3.02 version ... AFAIR it was significantly faster on my (old) computer.

~~3.02~~ 3.02.02 is compiled with '-O3' by default.
https://github.com/tesseract-ocr/tesseract/blob/3.02.02/configure.ac#L161

3.03 and 3.04 are compiled with '-O2' by default.
https://github.com/tesseract-ocr/tesseract/blob/3.03-rc1/configure.ac#L201
https://github.com/tesseract-ocr/tesseract/blob/3.04.01/configure.ac#L300

2.04 and 3.01 are compiled with ~~'-O0'~~ '-O2' by default.
https://github.com/tesseract-ocr/tesseract/blob/2.04/configure.ac
https://github.com/tesseract-ocr/tesseract/blob/3.01/configure.ac
The 'configure.ac' script in these versions does not explicitly set the '-O' level, so autotools will use ~~'-O0'~~ '-O2' as default.

ychtioui · 2016-03-18T17:59:59Z

thanks amitdo.
I'm using 3.02 but the C/C++ version of Tesseract.
I couldn't find the setting -O3 in the source files. where is it?

amitdo · 2016-03-18T20:49:26Z

What I linked to was actually 3.02.02

I think this is 3.02:
https://github.com/tesseract-ocr/tesseract/blob/d581ab7e12a2fac4a73ac0af4ce7ec522b8f3e42/configure.ac

You are right. It does not contain any '-On' flag, ~~so the compiler will use '-O0', which is not good for speed.~~ so if you are using autotools to build Tesseract it will instruct the compiler to use '-O2'.

amitdo · 2016-03-18T21:13:43Z

I assume you are using Tesseract on Linux / FreeBSD / Mac. On Windows + MS Visual C++ the configure.ac file is irrelevant.

Shreeshrii · 2016-03-19T08:38:59Z

@ychtioui said in a post above "I use VS2010" so using Windows.

amitdo · 2016-03-19T09:18:35Z

Thanks Shree.

I don't know which optimization level is used for Visual C++.

ychtioui · 2016-03-19T14:44:40Z

I use vs2010 on a Windows 7 pc.
Project settings or building options won't change much the read speed.
Tesseract was designed in research labs. Most of the key sections of the reader are speed-don't-care.
I used some performance tools to analyze where most of the computation time is spent.
In the page layout section, the blob analyzer does a lot of new/delete. This is very time consuming. The attached image above has more than 3600 blobs. Besides a number of processings are done on each blob (distance transform, finding the enclosing rectangle, measuring blob parameters, etc.). The allocations (new) and the release (delete) of all these blobs is very time consuming.
If we use a global array (allocate upfront) of blobs (exactly object BLOBNBOX) and whenever we need a blob, just get one index from the array. The array will be released once when we shut down the engine.
I used this concept in another single line ocr reader and it's super fast.

zdenop · 2016-03-19T16:56:11Z

VS2010 use optimization flag /O2 (Maximize speed) - other flags are set to default.
In past in forum there were warnings against using compiler optimization flag as they affect also OCR results. This is reason why there are standard optimization flags (-O2 in autotools and /O2 in VS).

I tried to run perf tool on linux:
perf record tesseract eurotext.tif eurotext
and I got this report (perf report):

  39,77%  tesseract  libtesseract.so.3.0.4  [.] tesseract::SquishedDawg::edge_char_of
  13,98%  tesseract  libtesseract.so.3.0.4  [.] tesseract::Classify::ComputeCharNormArrays
  13,09%  tesseract  libtesseract.so.3.0.4  [.] IntegerMatcher::UpdateTablesForFeature
   4,22%  tesseract  libtesseract.so.3.0.4  [.] tesseract::Classify::PruneClasses
   2,66%  tesseract  libtesseract.so.3.0.4  [.] ScratchEvidence::UpdateSumOfProtoEvidences
   1,48%  tesseract  libtesseract.so.3.0.4  [.] ELIST_ITERATOR::forward
   1,16%  tesseract  libc-2.19.so           [.] _int_malloc
   1,15%  tesseract  libtesseract.so.3.0.4  [.] tesseract::ShapeTable::MaxNumUnichars
   1,01%  tesseract  libtesseract.so.3.0.4  [.] tesseract::Classify::ExpandShapesAndApplyCorrections
   0,87%  tesseract  liblept.so.5.0.0       [.] rasteropLow
   0,79%  tesseract  libm-2.19.so           [.] __mul
   0,72%  tesseract  libtesseract.so.3.0.4  [.] FPCUTPT::assign
   0,71%  tesseract  libc-2.19.so           [.] _int_free
   0,71%  tesseract  libtesseract.so.3.0.4  [.] ELIST::add_sorted_and_find
   0,61%  tesseract  libtesseract.so.3.0.4  [.] tesseract::AmbigSpec::compare_ambig_specs
   0,57%  tesseract  libtesseract.so.3.0.4  [.] tesseract::Classify::ComputeNormMatch
   0,52%  tesseract  libc-2.19.so           [.] memset
   0,49%  tesseract  libc-2.19.so           [.] vfprintf
   0,45%  tesseract  libc-2.19.so           [.] malloc
   0,36%  tesseract  libtesseract.so.3.0.4  [.] SegmentLLSQ
   0,31%  tesseract  libm-2.19.so           [.] __ieee754_atan2_sse2
   0,31%  tesseract  libc-2.19.so           [.] malloc_consolidate
   0,30%  tesseract  libtesseract.so.3.0.4  [.] LLSQ::add
   0,29%  tesseract  libtesseract.so.3.0.4  [.] GenericVector<tesseract::ScoredFont>::operator+=
   0,29%  tesseract  libtesseract.so.3.0.4  [.] _ZN14ELIST_ITERATOR7forwardEv@plt
   0,28%  tesseract  libtesseract.so.3.0.4  [.] tesseract::ComputeFeatures
   0,25%  tesseract  liblept.so.5.0.0       [.] pixScanForForeground
   0,24%  tesseract  libtesseract.so.3.0.4  [.] GenericVector<tesseract::ScoredFont>::reserve
   0,20%  tesseract  libtesseract.so.3.0.4  [.] C_OUTLINE::increment_step
   0,20%  tesseract  [kernel.kallsyms]      [k] clear_page

according this report 3 top function consumed 66% of "time".

Then I tried 4 pages (A4 ) tiff (G4 compressed):

  52,24%  tesseract  libtesseract.so.3.0.4  [.] tesseract::SquishedDawg::edge_char_of
  12,06%  tesseract  libtesseract.so.3.0.4  [.] tesseract::Classify::ComputeCharNormArrays
  10,06%  tesseract  libtesseract.so.3.0.4  [.] IntegerMatcher::UpdateTablesForFeature
   3,57%  tesseract  libtesseract.so.3.0.4  [.] tesseract::Classify::PruneClasses
   1,90%  tesseract  libtesseract.so.3.0.4  [.] ScratchEvidence::UpdateSumOfProtoEvidences
...

Then I tried non eng image: perf record tesseract hebrew.png hebrew -l heb:

  27,79%  tesseract  libtesseract.so.3.0.4  [.] IntegerMatcher::UpdateTablesForFeature
  27,34%  tesseract  libtesseract.so.3.0.4  [.] tesseract::Classify::ComputeCharNormArrays
   4,40%  tesseract  libtesseract.so.3.0.4  [.] tesseract::Classify::PruneClasses
   3,98%  tesseract  libtesseract.so.3.0.4  [.] ScratchEvidence::UpdateSumOfProtoEvidences
   3,05%  tesseract  libtesseract.so.3.0.4  [.] tesseract::Classify::ComputeNormMatch
   2,36%  tesseract  libtesseract.so.3.0.4  [.] tesseract::ShapeTable::MaxNumUnichars
   2,05%  tesseract  libtesseract.so.3.0.4  [.] tesseract::Classify::ExpandShapesAndApplyCorrections
...

zdenop · 2016-09-13T20:09:46Z

Just for record for possible improvement in this issue: there was interesting information posted in scantailor project: OpenCL alone only brings ~2x speed-up. Another ~6x speed-up comes from multi-threaded processing.

anant-pathak · 2016-10-21T11:57:23Z

Hi @ychtioui I am newbie and saw your first comment that you are able to get pretty accurate results from Tesseract. For your image itself i am no table to get any results its telling: Can't recognize image. Can you plz provide the code snippet on how you are processing the image.
Thanks - Anant.

amitdo · 2016-11-28T10:54:58Z

@theraysmith
What do you use in the internal Google build, -O2 or -O3?

paladini · 2017-04-08T13:15:58Z

I'm interested in the same answer, @amitdo . Can you answer the question, @theraysmith ? It really can help us :)

stweil · 2017-04-08T17:50:37Z

Don't expect much difference between -O2 and -O3. I tried different optimizations, and they only have small effects on the time needed for OCR of a page. Higher optimization levels can even result in slower code because the code gets larger (because of unfolding of loops), so CPU caches become less effective. It is much more important to write good code.

theraysmith · 2017-04-14T23:24:10Z

That is a surprisingly hard question to answer in the Google environment! I use 'opt' mode which after some digging, I found maps to -O2. In addition, explicitly added are: -fopenmp which will deliver a major improvement (3x faster), if you do not have it, and a corresponding -lgomp for the linker arch/dotproductavx.cpp is compiled with -mavx arch/dotproductsse.cpp (and actually all the rest of the code) is compiled with -msse4.1 I thought all this stuff was in the autotools files already, or are you looking to convert these to windows?

…

On Sat, Apr 8, 2017 at 10:50 AM, Stefan Weil ***@***.***> wrote: Don't expect much difference between -O2 and -O3. I tried different optimizations, and they only have small effects on the time needed for OCR of a page. Higher optimization levels can even result in slower code because the code gets larger (because of unfolding of loops), so CPU caches become less effective. It is much more important to write good code. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#263 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AL056Qbi9xKk5GXQtfgXVZajN10mksEUks5rt8j6gaJpZM4Ht19x> .

-- Ray.

stweil · 2017-04-15T06:24:49Z

The improvement by using -fopenmp is useful when you want "realtime" OCR – running OCR for a single page and waiting for the result. Then it is fast because it uses more than one CPU core for some time consuming parts of the OCR process.

For mass OCR, it does not help. If many pages have to be processed, it is better to use single threaded Tesseract and run several Tesseract processes in parallel.

amitdo · 2017-04-15T10:17:54Z

Stefan, what about using OpenMP for training?

stweil · 2017-04-15T14:11:41Z

Yes, for training a single new model OpenMP could perhaps speed up the training process. Up to now, OpenMP is only used in ccmain/ and in lstm/. I don't know how much that part is used during training, and I never have run a performance evaluation for the training process (in fact I‌ have only run LSTM training once for Fraktur, and as I already said, it was not really successful).

theraysmith · 2017-04-17T23:32:15Z

OpenMP speeds up training by about 3.5x, since it runs 4 threads (one for each part of the LSTM) and spends >90% of CPU time computing the LSTM forward/backward.

…

On Sat, Apr 15, 2017 at 7:11 AM, Stefan Weil ***@***.***> wrote: Yes, for training a single new model OpenMP could perhaps speed up the training process. Up to now, OpenMP is only used in ccmain/ and in lstm/. I don't know how much that part is used during training, and I never have run a performance evaluation for the training process (in fact I‌ have only run LSTM training once for Fraktur, and as I already said, it was not really successful). — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#263 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AL056QxUeSroEmcJmZ30om3_wi6Mlyu5ks5rwNAogaJpZM4Ht19x> .

-- Ray.

xlight · 2017-04-19T06:09:26Z

can I set more than 4 threads for Trainning LSTM?

theraysmith · 2017-04-19T16:19:49Z

No, it doesn't help. The parallelism is limited by the implementation of the LSTM as 4 matrix-vector products. When I experimented with more threads for some of the other operations (eg the output softmax), it slowed down because the cache coherency was lost. I also experimented with breaking the matrix-vector products up further (eg splitting the input from the recurrent part), but openMP doesn't seem too good at allocating the threads in a way that keeps the cache coherency. Each thread needs to run the same part of the weights matrix for each timestep, and that is difficult to achieve with the recurrent nature of the LSTM.

…

On Tue, Apr 18, 2017 at 11:09 PM, xlight ***@***.***> wrote: can I set more than 4 threads for Trainning LSTM? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#263 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AL056UAEnzbmZZ5vncaO2zr0ASll1IoCks5rxaUjgaJpZM4Ht19x> .

-- Ray.

amitdo · 2017-04-19T17:00:27Z

What about machines that have only 2 cores?
Shouldn't the 'num_threads' lowered to 2 in that case?

theraysmith · 2017-04-19T17:23:58Z

It still works. It just takes longer.

…

On Wed, Apr 19, 2017 at 10:00 AM, Amit D. ***@***.***> wrote: What about machine that have only 2 cores? Shouldn't the 'num_threads' lowered to 2 in that case? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#263 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AL056dmv_0xhpF-2Qt11PJbfyg5Z-Bepks5rxj26gaJpZM4Ht19x> .

-- Ray.

stweil · 2020-01-25T11:23:30Z

The Linux kernel and kernel parameters also have a significant effect on the performance of Tesseract (both for recognition and training). Especially the first kernels which tried to fix Spectre and similar CPU bugs make it really slow. I recently noticed that Tesseract with Debian GNU Linux (testing / bullseye) is faster when running in the Linux subsystem for Windows. Running on a Linux kernel with the default settings is slightly slower than running on the Windows kernel.

With the kernel parameters from https://make-linux-fast-again.com/ Tesseract gets faster by about 10 to 20 % and is then faster than in the Linux subsystem for Windows.

PratapMehra · 2020-05-16T16:59:51Z

@zdenop How to achieve AVX, AVX2, FMA or SSE optimization.

stweil · 2020-05-16T17:06:26Z

It is used automatically if your computer provides them.

stweil · 2020-05-17T14:48:50Z

For texts without inverted text, significant faster OCR is possible when tesseract is called with -c tessedit_do_invert=0, see timing results above.

ViniciusLelis · 2020-05-21T22:25:33Z

Is it possible to set -c tessedit_do_invert=0 in runtime or do we need to build Tesseract with this option?

amitdo · 2020-05-21T22:29:30Z

It's a runtime option:

tesseract in.png out -c tessedit_do_invert=0

ViniciusLelis · 2020-05-21T22:32:00Z

Are you aware of whether or not the pytesseract has that option available?

amitdo · 2020-05-21T23:02:44Z

I'm not familiar with pytesseract.

stweil · 2020-05-22T04:53:52Z

Are you aware of whether or not the pytesseract has that option available?

The answer is on the pytesseract homepage:

config String - Any additional custom configuration flags that are not available via the pytesseract function. For example: config='--psm 6'

skydev66 · 2021-05-10T13:36:30Z

Is there any way to use tesseract via multi threading on android project?

ViniciusLelis · 2021-06-30T13:59:48Z

I managed to get faster results by upgrading Tesseract from 4.x to 5.x (can't remember the exact versions)
Also found out that our production servers were using 32bit, so we installed the 64bits version instead.
Time to analysis went from 20+ seconds to 7~10 which is perfectly acceptable since we also added 2 more servers.

amitdo · 2021-12-26T08:09:50Z

Tesseract 5.0.0 should be faster than 4.1.x.

@zdenop, can you update your benchmarks above?

For the tessdata model, you can add two tests using just one of the ocr engines. test 1: oem 0 (legacy only), test 2: oem 1 (lstm only).

stweil · 2021-12-26T09:37:47Z

Timing test with lstm_squashed_test on Debian bullseye, AMD EPYC 7413, Tesseract Git main, -O2:

# clang, default kernel options, configure --disable-shared --disable-openmp
[       OK ] LSTMTrainerTest.TestSquashed (22778 ms)
[       OK ] LSTMTrainerTest.TestSquashed (22764 ms)

# g++, default kernel options, configure --disable-shared --disable-openmp
[       OK ] LSTMTrainerTest.TestSquashed (23722 ms)
[       OK ] LSTMTrainerTest.TestSquashed (23739 ms)

# clang, kernel options https://make-linux-fast-again.com/, configure --disable-shared --disable-openmp
[       OK ] LSTMTrainerTest.TestSquashed (22984 ms)
[       OK ] LSTMTrainerTest.TestSquashed (23062 ms)

# g++, kernel options https://make-linux-fast-again.com/, configure --disable-shared --disable-openmp
[       OK ] LSTMTrainerTest.TestSquashed (23834 ms)
[       OK ] LSTMTrainerTest.TestSquashed (23708 ms)

# clang, kernel options https://make-linux-fast-again.com/, configure --disable-shared
[       OK ] LSTMTrainerTest.TestSquashed (22844 ms)
[       OK ] LSTMTrainerTest.TestSquashed (22963 ms)

So with a recent Linux kernel "optimized" kernel options no longer seem to have an effect on the performance.
Nor does OpenMP make that training test faster. It even has a huge negative effect because it consumes much CPU time:

# clang, kernel options https://make-linux-fast-again.com/, configure --disable-shared --disable-openmp
time ./lstm_squashed_test
[...]
real	0m23.114s
user	0m23.049s
sys	0m0.064s

# clang, kernel options https://make-linux-fast-again.com/, configure --disable-shared
time ./lstm_squashed_test
[...]
real	0m22.972s
user	1m31.495s
sys	0m0.308s

Using -O3 has no effect im my test, but adding -ffast-mathincreases the performance further:

# clang, configure --disable-shared --disable-openmp
[       OK ] LSTMTrainerTest.TestSquashed (21793 ms)

amitdo · 2021-12-26T20:53:03Z

For OpenMP, you can try to limit the number of threads it uses to n_cpu_cores-1.

Edit: With your CPU, you can try to limit it to a small numbers of threads, let say 3, and then increase/decrease the number of threads.

stweil · 2021-12-26T22:14:52Z

The test was running on a CPU with 24 cores. Using more than one core always produces a huge waste of CPU time.

# OMP_THREAD_LIMIT=1
real	0m25.105s
user	0m25.048s
sys	0m0.056s

# OMP_THREAD_LIMIT=2
real	0m25.637s
user	0m51.032s
sys	0m0.188s

# OMP_THREAD_LIMIT=3
real	0m23.279s
user	1m9.493s
sys	0m0.288s

# OMP_THREAD_LIMIT=4 or larger
real	0m23.008s
user	1m31.521s
sys	0m0.348s

wollmers · 2021-12-27T08:34:18Z

Using more than 1 CPU in the same address space has always coordination overhead and more than ~4 is a complete waste. Boxes with 24 CPUs are more made to run VMs on it. Something like 2 x 6C/6T serving 24 VMs and 400 websites works (with disk IO as the bottleneck).

For task with 100% CPU I would first profile them to find hotspots or low hanging fruits. Maybe change to the much faster TensorFlow. Are there benchmarks, how much faster Tensorflow is?

Tuning code itself is more time consuming and in case of well crafted code you can get maybe something in the range of 10%.

zdenop · 2021-12-27T15:48:58Z

@amitdo : what about creating wiki related to speed? IMO it would be more appropriate than discussing/updating 5 years old thread...

amitdo · 2021-12-27T17:07:44Z

@zdenop,

Wiki page or a page in tessdoc?

Benchmarks ?
Performace comparison ?

zdenop · 2022-01-09T19:53:15Z

I started https://github.com/tesseract-ocr/tessdoc/blob/main/Benchmarks.md

Still missing several tests (4.1.3 with AWX, -c tessedit_do_invert=0, maybe different OEM, OCR quality...)

amitdo · 2022-01-09T21:31:28Z

Thanks Zdenko.

amitdo · 2022-01-09T21:58:27Z

Conclusions:

OpenMP in Tesseract is very inefficient.
Text recognition: 5.01 using a fast LSTM model with a CPU that supports AVX2 and without OpenMP is faster than 3.05 which uses the legacy engine.

Freredaran · 2023-02-18T17:04:16Z

@stweil

If many pages have to be processed, it is better to use single threaded Tesseract and run several Tesseract processes in parallel.

Same here. After updating to Ubuntu 22.04, gImageReader became incredibly slow for me. Dev manisandro was very helpful and led me to a quick and dirty cli solution for running on a single thread. 'Works wonderfully for me.
In a terminal, type:

export OMP_THREAD_LIMIT=1

If you want to check that you actually are running on one thread, type:

echo $OMP_THREAD_LIMIT

Then run gImageReader:

gimagereader-gtk

Et voilà :o)

amitdo added the feature request label Jun 5, 2016

bradosia mentioned this issue Mar 23, 2020

This library is slower than linux bradosia/mingw-w64-x86_64-static-tesseract#2

Open

amitdo added OpenMP SIMD labels May 14, 2020

zdenop mentioned this issue Jan 6, 2021

RFC: Situation with tests in Tesseract #1627

Closed

SeanPedersen mentioned this issue Jan 7, 2021

Improve OCR performance Ravn-Tech/HyperTag#33

Open

bertsky mentioned this issue Jun 6, 2021

Disable OpenMP tesseract-ocr/tesstrain#259

Open

amitdo closed this as completed Jan 9, 2022

davidbernard04 mentioned this issue Feb 1, 2022

Tesseract OCR is excessively long on quad-core CPU davidbernard04/roc-alliance-mgmt#3

Closed

good accuracy but too slow, how to improve Tesseract speed #263

good accuracy but too slow, how to improve Tesseract speed #263

Comments

ychtioui commented Mar 10, 2016

stweil commented Mar 10, 2016

ychtioui commented Mar 10, 2016

tfmorris commented Mar 10, 2016

ychtioui commented Mar 10, 2016

stweil commented Mar 11, 2016

zdenop commented Mar 11, 2016

ychtioui commented Mar 14, 2016

amitdo commented Mar 18, 2016 • edited Loading

ychtioui commented Mar 18, 2016

amitdo commented Mar 18, 2016 • edited Loading

amitdo commented Mar 18, 2016

Shreeshrii commented Mar 19, 2016

amitdo commented Mar 19, 2016

ychtioui commented Mar 19, 2016

zdenop commented Mar 19, 2016

zdenop commented Sep 13, 2016

anant-pathak commented Oct 21, 2016

amitdo commented Nov 28, 2016

paladini commented Apr 8, 2017

stweil commented Apr 8, 2017

theraysmith commented Apr 14, 2017 via email

stweil commented Apr 15, 2017

amitdo commented Apr 15, 2017 • edited Loading

stweil commented Apr 15, 2017

theraysmith commented Apr 17, 2017 via email

xlight commented Apr 19, 2017

theraysmith commented Apr 19, 2017 via email

amitdo commented Apr 19, 2017 • edited Loading

theraysmith commented Apr 19, 2017 via email

stweil commented Jan 25, 2020 • edited Loading

PratapMehra commented May 16, 2020 • edited Loading

stweil commented May 16, 2020

stweil commented May 17, 2020 • edited Loading

ViniciusLelis commented May 21, 2020

amitdo commented May 21, 2020

ViniciusLelis commented May 21, 2020

amitdo commented May 21, 2020

stweil commented May 22, 2020

skydev66 commented May 10, 2021

ViniciusLelis commented Jun 30, 2021

amitdo commented Dec 26, 2021

stweil commented Dec 26, 2021 • edited Loading

amitdo commented Dec 26, 2021 • edited Loading

stweil commented Dec 26, 2021

wollmers commented Dec 27, 2021

zdenop commented Dec 27, 2021

amitdo commented Dec 27, 2021

zdenop commented Jan 9, 2022

amitdo commented Jan 9, 2022

amitdo commented Jan 9, 2022 • edited Loading

Freredaran commented Feb 18, 2023 • edited Loading

amitdo commented Mar 18, 2016 •

edited

Loading

amitdo commented Mar 18, 2016 •

edited

Loading

amitdo commented Apr 15, 2017 •

edited

Loading

amitdo commented Apr 19, 2017 •

edited

Loading

stweil commented Jan 25, 2020 •

edited

Loading

PratapMehra commented May 16, 2020 •

edited

Loading

stweil commented May 17, 2020 •

edited

Loading

stweil commented Dec 26, 2021 •

edited

Loading

amitdo commented Dec 26, 2021 •

edited

Loading

amitdo commented Jan 9, 2022 •

edited

Loading

Freredaran commented Feb 18, 2023 •

edited

Loading