Significant speed drop on Tesseract4 vs 3 with identical image #1278

mattmillus · 2018-01-16T21:05:20Z

Environment

Tesseract Version: 3.51 and 4.00
Commit Number: 10a8a67 (when built from source), also windows binaries from https://github.com/tesseract-ocr/tesseract/wiki/Downloads
Platform: Windows 7 64-bit, 32 GB of RAM, Intel Core i7-4770K @ 3.50 GHz

Current Behavior:

Using the same input image (attached), Tesseract 4 is over three times slower than Tesseract 3. On my system this image took ~5 seconds with Tesseract 3 and ~20 seconds with Tesseract 4.

Tested using the pre-built 4.00 and 3.51 Windows binaries from https://github.com/tesseract-ocr/tesseract/wiki/Downloads, as well as with the latest 4.00 git source, which I built locally using cppan with MSVC 2015 in Release config.

Using the 'fast' version of eng.traineddata did not make a significant difference.

Expected Behavior:

https://github.com/tesseract-ocr/tesseract/wiki/4.0-Accuracy-and-Performance implies that we should not expect such a large slowdown.

EDIT: The following image is one of a handful of test images I am using. Zlib isn't relevant here.


amitdo · 2018-01-16T22:00:21Z

How is zlib related here?

mattmillus · 2018-01-16T22:15:37Z

It is not related at all; that is simply a well-formed sample image I was using to test performance. Performance is similar with other images of similar dimensions.

amitdo · 2018-01-16T22:42:16Z

Unlike the Linux build, OpenMP and AVX2 are currently not activated in the Windows build.

mattmillus · 2018-01-17T02:02:52Z

Thank you for the info. I added /openmp in Visual Studio as per https://msdn.microsoft.com/en-us/library/fw509c3b.aspx. This nearly doubled the speed and now puts the process at 50% CPU usage. On Windows, Tesseract 3 is still almost twice as fast, but at least this helped!

Shreeshrii · 2018-01-17T03:49:42Z

Also see #898

amitdo · 2018-01-17T07:35:30Z

CC: @egorpugin

jbreiden · 2018-01-17T23:53:26Z

I see a wall time of about 4.5 seconds on a six-core Xeon E5-1650 running at 3.2 GHz under Linux, and about 3.5 seconds under optimal throughput conditions (repeated OCR of the same image over and over).

Shreeshrii · 2018-01-18T16:00:06Z

Also see https://github.com/tesseract-ocr/tesseract/wiki/NeuralNetsInTesseract4.00#hardware-and-cpu-requirements

On a machine with multiple cores, and AVX, an easy English image may take twice as much real time, and use 7 times the CPU as base Tesseract, whereas Hindi takes more CPU than base Tesseract, but actually runs faster in terms of real time.

amitdo · 2018-01-18T17:08:02Z

#943 (comment)

theraysmith commented on May 23, 2017

*Far greater performance improvements can be made by making the network smaller.* As I already indicated, I have had some very good results in this area, with a network 3x faster than the legacy code (for English) and much faster than the legacy code for complex scripts.

#995 (comment)

theraysmith commented on Jul 12, 2017

2 parallel sets of tessdata. "best" and "fast". "Fast" will exceed the speed of legacy Tesseract in real time, provided you have the required parallelism components, and in total CPU only slightly slower for English. Way faster for most non-latin languages, while being <5% worse than "best".

mattmillus · 2018-01-18T17:14:35Z

I did further testing and the plot thickens a bit. I ran Tesseract 4 with the official data files that support the legacy engine (from https://github.com/tesseract-ocr/tessdata) and set EngineMode to TESSERACT so that only the legacy engine runs, not LSTM. I then compared the wall time on a 200-page image file to that of Tesseract 3.02 and found that Tesseract 4.00 took over 50% longer (260 seconds vs 160 seconds), even though it should be using the same engine.

amitdo · 2018-01-18T17:25:28Z

That's another issue.

#263 (comment)
https://groups.google.com/forum/?hl=en#!topic/tesseract-dev/LErriuT-sck

mattmillus · 2018-01-18T17:26:38Z

In ref to the quote above:

2 parallel sets of tessdata. "best" and "fast". "Fast" will exceed the speed of legacy Tesseract in real time, provided you have the required parallelism components, and in total CPU only slightly slower for English.

I have not found the above to be true, at least on Windows with English. Here are some results:
711 seconds with Tesseract4.00, "fast" data, no /openMP
353 seconds with Tesseract4.00, "fast" data, with /openMP on
480 seconds with Tesseract4.00, "best" data, with /openMP on
260 seconds with Tesseract4.00 "legacy" data and legacy engine, with /openMP on (though it does nothing for legacy engine)
160 seconds with Tesseract3.02

Do I need to manually turn on AVX for windows builds somehow?
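For a quick comparison, the timings listed above can be turned into slowdown factors relative to the Tesseract 3.02 run. This is only a small arithmetic sketch in Python; the dictionary labels are shorthand chosen here, and the numbers are copied from the list above.

```python
# Wall-clock times in seconds, copied from the results listed above.
# The labels are shorthand made up for this sketch.
timings = {
    "4.00 fast, no OpenMP": 711,
    "4.00 fast, OpenMP": 353,
    "4.00 best, OpenMP": 480,
    "4.00 legacy engine, OpenMP": 260,
    "3.02": 160,
}


def slowdown_vs(baseline_key, results):
    """Return each timing divided by the baseline timing."""
    base = results[baseline_key]
    return {name: secs / base for name, secs in results.items()}


ratios = slowdown_vs("3.02", timings)
for name, ratio in ratios.items():
    print(f"{name:28s} {ratio:.2f}x")

# Speedup obtained from /openmp alone with the "fast" data.
openmp_speedup = timings["4.00 fast, no OpenMP"] / timings["4.00 fast, OpenMP"]
print(f"OpenMP speedup with fast data: {openmp_speedup:.2f}x")
```

On these numbers, the "fast" data with OpenMP is still a bit more than twice as slow as 3.02, and enabling OpenMP roughly halves the run time.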

mattmillus · 2018-01-18T17:32:47Z

Interesting. So you think that if I test with Tesseract 3.05 it will be about the same speed as 4.00 with the legacy engine, and significantly slower than 3.02? If so, that's very unfortunate. Why would the speed change so much between 3.02 and 3.05 if they use the same engine?

amitdo · 2018-01-18T17:43:08Z

tesseract/CMakeLists.txt

Line 210 in a538cd1

if (MSVC)

AVX is turned on but AVX2 is not.

Shreeshrii · 2018-01-18T17:43:14Z

Ray/Jeff need to confirm that the files in tessdata_fast are indeed the final fast models that Ray was referring to.

mattmillus · 2018-01-18T17:46:07Z

If not I would love to test with the final fast models!

amitdo · 2018-01-18T17:50:46Z

The CMake file is incomplete.

jbreiden · 2018-01-18T17:58:50Z

The fast models on Github are final. (To the degree that anything in life is final.)

amitdo · 2018-01-23T14:10:59Z

Pinging @egorpugin again.

egorpugin · 2018-01-23T15:05:36Z

So, do you want /openmp for the whole tess or only for specific files?

amitdo · 2018-01-23T17:35:53Z

With autotools it is global.

egorpugin · 2018-01-23T18:58:19Z

Done. 4b6fefb

amitdo · 2018-01-23T22:20:19Z

Thanks Egor.

Please also add avx2 flag to arch/intsimdmatrixavx2.cpp,
and sse4.1 flag to arch/intsimdmatrixsse.cpp

This is needed to make the use of traineddata from the 'fast' repo actually faster than 'best'.

egorpugin · 2018-01-24T15:54:35Z

Done. 2da95d6
Also note there's an ICE (internal compiler error) with MSVC in arch/intsimdmatrixavx2.cpp.
I've added a simple workaround as in arch/dotproductavx.cpp.
Now I'd like someone to test my changes to check if speedup is actually turned on.

MS issue for reference. Feel free to upvote.
https://developercommunity.visualstudio.com/content/problem/187655/clexe-vs155-ice-with-avx2-flag-on-mm256-extract-ep.html

stweil · 2018-01-24T15:58:09Z

Does anybody know how a GCC based build (for example from UB-Mannheim) compares with the MSVC results?

amitdo · 2018-01-24T17:13:40Z

#1290

amitdo · 2018-01-25T15:16:36Z

Here are the current options to run Tesseract on Windows:

- 'Native' (Win32 API)
  - MSVC
  - MinGW-w64
- Cygwin (POSIX API)
- MS WSL (Win 10)
- VM like VirtualBox
- Docker

And they say that on Linux you have too many options to choose from, which confuses people...
:-)

stweil · 2018-01-25T15:36:37Z

There is also the choice of building 32 or 64 bit software for Windows. Personally I typically built 32 bit Tesseract for Windows up to now, but maybe there might be good reasons for 64 bit code?

innir · 2018-02-24T11:55:38Z

Just to add some information from a Debian bug report, without AVX2 tesseract 4 is much slower than 3.05 ... (See also manisandro/gImageReader#285 for another example of the slowdown)

amitdo · 2018-02-24T12:25:08Z

If someone prefers speed over accuracy, Tesseract 4.00 still includes the legacy OCR engine; use --oem 0 with the traineddata files from the tessdata repository.
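As an illustration of the suggestion above, here is a minimal Python sketch that assembles a command line selecting the legacy engine. The function name, the image name, and the tessdata path are placeholders invented for this example; --oem, --tessdata-dir and -l are existing tesseract options, and the directory is assumed to hold the traineddata from the tessdata repository (the set that includes the legacy models).

```python
def legacy_engine_command(image, out_base, tessdata_dir, lang="eng"):
    """Build a tesseract command line that uses only the legacy engine (--oem 0).

    tessdata_dir is assumed to contain traineddata files that include the
    legacy models, for example a checkout of the tessdata repository.
    """
    return [
        "tesseract", image, out_base,
        "--oem", "0",                    # 0 = legacy engine only
        "--tessdata-dir", tessdata_dir,  # where <lang>.traineddata lives
        "-l", lang,
    ]


# Placeholder file and directory names, for illustration only.
cmd = legacy_engine_command("page.png", "page_out", "/path/to/tessdata")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run as is.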

Shreeshrii · 2018-03-28T13:08:15Z

@zdenop Please label

Performance

Shreeshrii · 2018-10-17T23:48:10Z

Post in forum by David Tran

Server performance is 3x as slow versus local machine

Local machine: 3.50 GHz, 16 GB RAM, Windows 7 64-bit
Server: 2.30 GHz, 32 GB RAM, Windows Server 2012 64-bit

tesseract v4.0.0-beta.4.20180912 64 bit

Current Behavior: Processing a 64-page PDF (2,733 KB) on my local machine takes 286 seconds, while on our server it takes a whopping 842 seconds.

Expected Behavior: That it wouldn't be 3x as slow on the server versus on my local machine.

What could be the root cause of the degradation in performance?

stweil · 2018-10-18T05:02:43Z

It looks like the local machine is rather new hardware, while the server is older. So it could be a matter of AVX / SSE support versus none at all. The user can run tesseract --version on both machines to see whether SSE and AVX are found.

The number of CPU cores and the memory bandwidth are also very important.
And of course it makes a difference if there are other processes running in parallel on the server.

The user uses the UB Mannheim installer for Windows. He should update to the latest version.
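The check suggested above can be scripted. The following is a minimal Python sketch that extracts the "Found ..." lines from the output of tesseract --version; the sample text follows the format of the --version output quoted elsewhere in this thread, and the helper name is made up for this example. Which stream the version text is written to may differ between versions, so the sketch reads both.

```python
import shutil
import subprocess


def found_features(version_text):
    """Return the feature names listed on 'Found ...' lines."""
    features = []
    for line in version_text.splitlines():
        line = line.strip()
        if line.startswith("Found "):
            # "Found OpenMP 201511" -> "OpenMP", "Found NEON" -> "NEON"
            features.append(line.split()[1])
    return features


# Sample text in the format printed by `tesseract --version`.
SAMPLE = """tesseract 5.0.0-alpha-20201224
leptonica-1.76.0
Found NEON
Found OpenMP 201511
"""
sample_features = found_features(SAMPLE)
print(sample_features)

# If tesseract is installed, inspect the local build as well.
if shutil.which("tesseract"):
    proc = subprocess.run(["tesseract", "--version"],
                          capture_output=True, text=True)
    print(found_features(proc.stdout + proc.stderr))
```

Running this on both machines and comparing the two lists shows whether one build reports SIMD support that the other lacks.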

ripefig · 2019-08-09T21:33:01Z

AVX2 and SSE are found, but performance is terrible on my machine. It takes over a minute to OCR a single sentence. Tesseract 3 did it in about a second; Tesseract 4 is not usable on my machine.
danpla/dpscreenocr#2

zdenop · 2019-10-30T14:29:07Z

Closing as duplicate of #263.

asimeonovMLPS · 2021-01-31T20:07:55Z

Hello, reading a line of text (about 8 characters) takes about 1.8 seconds. My system is 32-bit Debian 10 and I use Tesseract 4.0.0. Is it possible to speed up the reading under these conditions?

stweil · 2021-02-01T13:30:06Z

Don't use unsupported old versions. Use Tesseract 4.1.1 or latest Tesseract from git master.

asimeonovMLPS · 2021-02-01T16:02:03Z

Thanks for the reply @stweil . I will try it right away.

asimeonovMLPS · 2021-02-03T07:08:46Z

Hello, I installed version 4.1.1, but every time I call tesseract from the command line or from api, I get the following error: tesseract: symbol lookup error: tesseract: undefined symbol: _ZTVN9tesseract19TessLSTMBoxRendererE . I use the most basic installation process:

./autogen.sh
./configure
make
sudo make install
sudo ldconfig

Any help is highly appreciated. Thank you.

stweil · 2021-02-03T07:12:28Z

Remove any other installation of Tesseract (for example the distributed package).

asimeonovMLPS · 2021-02-03T12:36:21Z

Thanks for the reply @stweil . I will try it right away.

asimeonovMLPS · 2021-02-05T12:49:50Z

Hi @stweil, you were right, it really was a problem with another version of tesseract. But after I managed to install version 4.1.1, there was no difference in execution time.
When I was describing the hardware, I forgot to specify that my processor is a 32-bit ARM. Do I need to do anything extra for ARM architecture?

stweil · 2021-02-05T12:52:52Z

Recent Tesseract releases like 5.0.0-alpha-20201224 or later support 32-bit ARM (ideally with NEON for faster execution). I use it on 32-bit ARM with Ubuntu.

asimeonovMLPS · 2021-02-05T13:03:52Z

Thanks @stweil , my processor supports NEON instructions. I will try it now.

asimeonovMLPS · 2021-02-08T07:34:03Z

Hi @stweil , I installed version tesseract 5.0.0-alpha-20201224, but the time is at most 100 milliseconds faster.
Result from tesseract --version:

tesseract 5.0.0-alpha-20201224
leptonica-1.76.0
libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.2) : libpng 1.6.36 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found NEON
Found OpenMP 201511

From the command above I see Found NEON, but does that mean that I managed to build tesseract to use NEON?
And if I did manage to build tesseract to use NEON, in order to use the neon instructions is it necessary to pass certain parameters to tesseract through the command line?

stweil · 2021-02-08T07:57:27Z

Yes, it uses NEON. I see that you are also using OpenMP. Do you have at least 4 ARM cores? I'd try a build with OpenMP disabled and compare the timing.

asimeonovMLPS · 2021-02-08T08:06:51Z

@stweil I only have two cores.

stweil · 2021-02-08T08:17:08Z

Then OpenMP will waste lots of time with switching between 4 threads. Run configure with --disable-openmp, build and try it again.
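As an alternative way to test the same idea without rebuilding, the number of OpenMP threads can be capped at run time through the standard OpenMP environment variable OMP_THREAD_LIMIT. This is a different technique from the --disable-openmp build suggested above, shown here only as a rough Python sketch; the function names and the image file name are placeholders.

```python
import os
import subprocess
import time


def single_thread_env(base_env=None):
    """Return a copy of the environment with OpenMP limited to one thread."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_THREAD_LIMIT"] = "1"  # standard OpenMP environment variable
    return env


def time_ocr(image, out_base, env=None):
    """Run tesseract once and return the wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(["tesseract", image, out_base], env=env, check=True)
    return time.perf_counter() - start


# Example comparison (requires tesseract and an input image):
#   default = time_ocr("1_cropp.jpg", "output")
#   limited = time_ocr("1_cropp.jpg", "output", env=single_thread_env())
#   print(f"default {default:.2f}s, one thread {limited:.2f}s")
```

If the single-thread run is not slower than the default run, thread switching rather than the recognition work itself is a plausible cost on a two-core machine.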

asimeonovMLPS · 2021-02-08T08:19:33Z

OK @stweil , thank you.

asimeonovMLPS · 2021-02-08T10:33:41Z

@stweil I just tried, but without success: the time is the same.
Time to execute a tesseract command:

time tesseract 1_cropp.jpg output 
Tesseract Open Source OCR Engine v5.0.0-alpha-20201224 with Leptonica

real	0m1.854s
user	0m1.690s
sys	0m0.196s
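One way to read the time output above is to compare CPU time (user + sys) with wall time (real). A ratio close to 1 means the run kept roughly one core busy. The short Python sketch below does that arithmetic on the three values quoted above; the parsing helper is written for this example and only handles the "XmY.YYYs" format shown.

```python
import re


def parse_time(value):
    """Convert a value such as '0m1.854s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Values copied from the `time` output above.
real = parse_time("0m1.854s")
user = parse_time("0m1.690s")
sys_time = parse_time("0m0.196s")

cpu_ratio = (user + sys_time) / real
print(f"CPU time / wall time = {cpu_ratio:.2f}")  # prints 1.02
```

Here the ratio is about 1.02, so this run used about one core's worth of CPU for its whole duration.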

innir mentioned this issue Feb 24, 2018

Slow OCR with tesseract 4.00alpha manisandro/gImageReader#285

Closed

zdenop added the performance label Mar 29, 2018

danpla mentioned this issue Aug 9, 2019

Slow recognition due to multithreading issues in Tesseract 4 and 5 danpla/dpscreenocr#2

Closed

zdenop closed this as completed Oct 30, 2019

zdenop added the duplicate label Oct 30, 2019

bradosia mentioned this issue Mar 23, 2020

This library is slower than linux bradosia/mingw-w64-x86_64-static-tesseract#2

Open

amitdo added OpenMP SIMD labels May 14, 2020
