Multiple jobs do not work with Tesseract 4 #31

ashipunov · 2019-02-04T00:49:44Z

Today I installed Tesseract 4 from an Ubuntu PPA (ppa:alex-p/tesseract-ocr, 4.0.0+git3515-9bcfa90c-1ppa1~xenial1). Tesseract itself works normally, and ocrodjvu also works OK with the default "-j 1". However, when I specify "-j 4", ocrodjvu hangs, and when I interrupt it, I get the following:

$ time ocrodjvu --in-place -l rus+lat -j 4 nikolaev1970_diat_posjet.djvu 
Processing 'nikolaev1970_diat_posjet.djvu':
- Page #1
- Page #2
- Page #3
- Page #4
- Page #5
- Page #6
^Ctesseract: Tesseract Open Source OCR Engine v4.0.0-288-g9bcf with Leptonica
tesseract: Page 1
tesseract: Detected 105 diacritics
tesseract: Tesseract Open Source OCR Engine v4.0.0-288-g9bcf with Leptonica
tesseract: Page 1
tesseract: Tesseract Open Source OCR Engine v4.0.0-288-g9bcf with Leptonica
tesseract: Page 1
Exception while processing page 3:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/cli/ocrodjvu.py", line 434, in page_thread
    result = self.process_page(page)
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/cli/ocrodjvu.py", line 407, in process_page
    result = self._engine.recognize(pfile, language=self._options.language, details=self._options.details, uax29=self._options.uax29)
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/engines/tesseract.py", line 296, in recognize
    return f(image, language, details=details, uax29=uax29)
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/engines/tesseract.py", line 262, in recognize_hocr
    _wait_for_worker(worker)
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/engines/tesseract.py", line 69, in _wait_for_worker
    worker.wait()
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/ipc.py", line 121, in wait
    raise CalledProcessInterrupted(-return_code, self.__command)
CalledProcessInterrupted: Command 'tesseract' was interrupted by signal SIGINT
Exception while processing page 4:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/cli/ocrodjvu.py", line 434, in page_thread
    result = self.process_page(page)
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/cli/ocrodjvu.py", line 407, in process_page
    result = self._engine.recognize(pfile, language=self._options.language, details=self._options.details, uax29=self._options.uax29)
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/engines/tesseract.py", line 296, in recognize
    return f(image, language, details=details, uax29=uax29)
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/engines/tesseract.py", line 262, in recognize_hocr
    _wait_for_worker(worker)
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/engines/tesseract.py", line 69, in _wait_for_worker
    worker.wait()
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/ipc.py", line 121, in wait
    raise CalledProcessInterrupted(-return_code, self.__command)
CalledProcessInterrupted: Command 'tesseract' was interrupted by signal SIGINT
tesseract: Tesseract Open Source OCR Engine v4.0.0-288-g9bcf with Leptonica
tesseract: Page 1
Interrupted by user.
Exception while processing page 5:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/cli/ocrodjvu.py", line 434, in page_thread
    result = self.process_page(page)
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/cli/ocrodjvu.py", line 407, in process_page
    result = self._engine.recognize(pfile, language=self._options.language, details=self._options.details, uax29=self._options.uax29)
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/engines/tesseract.py", line 296, in recognize
    return f(image, language, details=details, uax29=uax29)
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/engines/tesseract.py", line 262, in recognize_hocr
    _wait_for_worker(worker)
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/engines/tesseract.py", line 69, in _wait_for_worker
    worker.wait()
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/ipc.py", line 121, in wait
    raise CalledProcessInterrupted(-return_code, self.__command)
CalledProcessInterrupted: Command 'tesseract' was interrupted by signal SIGINT
Exception while processing page 6:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/cli/ocrodjvu.py", line 434, in page_thread
    result = self.process_page(page)
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/cli/ocrodjvu.py", line 407, in process_page
    result = self._engine.recognize(pfile, language=self._options.language, details=self._options.details, uax29=self._options.uax29)
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/engines/tesseract.py", line 296, in recognize
    return f(image, language, details=details, uax29=uax29)
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/engines/tesseract.py", line 262, in recognize_hocr
    _wait_for_worker(worker)
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/engines/tesseract.py", line 69, in _wait_for_worker
    worker.wait()
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/ipc.py", line 121, in wait
    raise CalledProcessInterrupted(-return_code, self.__command)
CalledProcessInterrupted: Command 'tesseract' was interrupted by signal SIGINT
Intermediate files were left in the '/tmp/ocrodjvu.3_ZmXE' directory.

real	30m20.909s
user	118m6.372s
sys	0m10.420s

I know that there are issues with multi-threading so I used recommendations from

tesseract-ocr/tesseract#898

and from

https://appliedmachinelearning.blog/2018/06/30/performing-ocr-by-running-parallel-instances-of-tesseract-4-0-python/

to set the environment as 'OMP_THREAD_LIMIT=1 tesseract'. However, all my attempts failed: (a) renaming the executable and replacing it with a script, (b) writing a script that contains an alias, and finally (c) changing your code to allow this environment variable.

My system info output:

$ inxi
CPU~Dual core Intel Core i7-2620M (-HT-MCP-) speed/max~799/3400 MHz Kernel~4.4.0-141-generic x86_64 Up~4:18 Mem~1518.7/7865.9MB HDD~2000.4GB(30.6% used) Procs~197 Client~Shell inxi~2.2.35

Ocrodjvu version:

$ ocrodjvu --version
ocrodjvu 0.10.4
+ Python 2.7.12
+ subprocess32
+ python-djvulibre 0.7
+ lxml 3.5.0
+ html5lib-python 0.999
+ PyICU 1.9.2
  + ICU 55.1
    + Unicode 7.0

In the end, I reverted everything to Tesseract 3, and now it works. This means, for example, that I cannot OCR books in Armenian and Quechua as these languages for some reason are not in Tesseract 3.

Please help.

jwilk · 2019-02-05T17:44:21Z

$ time ocrodjvu --in-place -l rus+lat -j 4 nikolaev1970_diat_posjet.djvu

Is this reproducible for you? That is, when you run this command again, does it hang too?

Unfortunately, I wasn't able to reproduce the bug here:

$ time ocrodjvu --in-place -l rus+lat -j 4 nikolaev1970_diat_posjet.djvu
Processing 'nikolaev1970_diat_posjet.djvu':
- Page #1
- Page #2
- Page #3
...
- Page #12

real    1m57.306s
user    2m55.692s
sys     2m29.692s

This was on Ubuntu 16.04 (xenial), same Tesseract and ocrodjvu versions as yours, and higher-end hardware (3 cores of Intel Xeon Gold 6140).

tesseract-ocr/tesseract#898

This is weird. Unless something else is going on, excessive usage of threads shouldn't make things "infinitely slow".

But setting OMP_THREAD_LIMIT is a good idea anyway; I'll try to make ocrodjvu set this automatically for the next release.

In the meantime, you can set this manually:

$ time OMP_THREAD_LIMIT=1 ocrodjvu --in-place -l rus+lat -j 4 nikolaev1970_diat_posjet.djvu
Processing 'nikolaev1970_diat_posjet.djvu':
- Page #1
- Page #2
- Page #4
...
- Page #12

real    0m37.206s
user    1m45.108s
sys     0m2.484s
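
A minimal sketch of how a Python wrapper could pass the variable to its child processes. This is not ocrodjvu's actual code: `build_worker_env` and `start_tesseract` are hypothetical helper names, and only `OMP_THREAD_LIMIT` itself comes from the discussion above.

```python
import os
import subprocess


def build_worker_env(base_env=None, thread_limit=1):
    """Copy the environment and cap OpenMP threads for child processes.

    Hypothetical helper: a value already set by the user is left alone.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.setdefault("OMP_THREAD_LIMIT", str(thread_limit))
    return env


def start_tesseract(args, thread_limit=1):
    """Start one tesseract process with the capped environment (illustrative)."""
    return subprocess.Popen(
        ["tesseract"] + list(args),
        env=build_worker_env(thread_limit=thread_limit),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
```

Using `setdefault` rather than plain assignment means a value exported on the command line, as in the example above, still takes precedence.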

ashipunov · 2019-02-10T03:58:50Z

Sorry, here I am again with the same issue. First, without OMP_THREAD_LIMIT=1 the situation is the same:

$ inxi
CPU~Dual core Intel Core i7-2620M (-HT-MCP-) speed/max~804/3400 MHz Kernel~4.4.0-141-generic x86_64
$ ocrodjvu --version
ocrodjvu 0.10.4
+ Python 2.7.12
+ subprocess32
+ python-djvulibre 0.7
+ lxml 3.5.0
+ html5lib-python 0.999
+ PyICU 1.9.2
  + ICU 55.1
    + Unicode 7.0
$ tesseract --version
tesseract 4.0.0-297-gec8f
 leptonica-1.76.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.3.0
$ # now:
$ time ocrodjvu --in-place -l rus+lat -j 4 nikolaev1970_diat_posjet.djvu
Processing 'nikolaev1970_diat_posjet.djvu':
- Page #1
- Page #2
- Page #3
- Page #4
- Page #5
- Page #6
^C
...
CalledProcessInterrupted: Command 'tesseract' was interrupted by signal SIGINT
Interrupted by user.
Intermediate files were left in the '/tmp/ocrodjvu.k9gIME' directory.
real	14m34.142s
user	58m3.444s
sys	0m4.336s

However:

$ time OMP_THREAD_LIMIT=1 ocrodjvu --in-place -l rus+lat -j 4 nikolaev1970_diat_posjet.djvu
Processing 'nikolaev1970_diat_posjet.djvu':
- Page #1
- Page #3
...
- Page #12
real	1m29.354s
user	5m24.756s
sys	0m3.328s

Finally, everything works with "-j 1" as it should (but, to my surprise, it is only two times slower than with "-j 4").
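
One way to compare the two `time` reports in this comment is to divide CPU time (user + sys) by wall-clock time, which gives the average number of CPUs kept busy. The sketch below does that arithmetic on the figures pasted above; `parse_time_output` and `busy_cpus` are names made up for this illustration.

```python
import re

# Matches lines such as "real<TAB>14m34.142s" as printed by the shell's `time`.
_TIME_RE = re.compile(r"^(real|user|sys)\s+(\d+)m([\d.]+)s$", re.MULTILINE)


def parse_time_output(text):
    """Return a dict with 'real', 'user' and 'sys' converted to seconds."""
    return {
        name: int(minutes) * 60 + float(seconds)
        for name, minutes, seconds in _TIME_RE.findall(text)
    }


def busy_cpus(text):
    """Average number of CPUs kept busy: (user + sys) / real."""
    t = parse_time_output(text)
    return (t["user"] + t["sys"]) / t["real"]


# Figures copied from the two runs above.
hung = "real\t14m34.142s\nuser\t58m3.444s\nsys\t0m4.336s\n"
limited = "real\t1m29.354s\nuser\t5m24.756s\nsys\t0m3.328s\n"

print(round(busy_cpus(hung), 2), round(busy_cpus(limited), 2))
```

Both ratios come out a little under 4, so the interrupted run was keeping the CPUs busy for the whole 14 minutes rather than sleeping.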

Wild idea: does it reflect the difference between Intel Xeon and Intel Core i7? If so, I should try a different machine.

ashipunov · 2019-02-10T04:41:44Z

So, another machine (sorry, I do not have Xeons, but this is an i5 with eight logical CPUs):

$  inxi
CPU~Quad core Intel Core i5-8250U (-HT-MCP-) speed/max~938/3400 MHz Kernel~4.15.0-43-generic x86_64
$ time ocrodjvu --in-place -l rus+lat -j 8 nikolaev1970_diat_posjet.djvu 
Processing 'nikolaev1970_diat_posjet.djvu':
- Page #1
- Page #2
- Page #3
...
- Page #10
^C
CalledProcessInterrupted: Command 'tesseract' was interrupted by signal SIGINT
Interrupted by user.
Intermediate files were left in the '/tmp/ocrodjvu.0DU1iF' directory.
real	14m43.246s
user	117m31.202s
sys	0m1.366s
$ # but:
$ time OMP_THREAD_LIMIT=1 ocrodjvu --in-place -l rus+lat -j 8 nikolaev1970_diat_posjet.djvu
Processing 'nikolaev1970_diat_posjet.djvu':
- Page #1
- Page #2
- Page #3 
...
- Page #12
real	0m32.009s
user	3m30.699s
sys	0m3.298s

This is really weird!
I believe that this is a bug, probably associated with Intel i* processors.

jwilk · 2019-02-11T17:11:33Z

Color me baffled. :-/

I've hacked up a script to dump some information about the hanging processes:
examine-hangs.
Hopefully it'll shed some light on what's going on, but I'm not overly optimistic…

I'd like you to do the following:

Disable ptrace restrictions that would prevent GDB from working:

# sysctl kernel.yama.ptrace_scope=0

Install GDB and a bunch of debug packages:

# apt-get install gdb djvulibre-dbg libc6-dbg libgcc1-dbg libgomp1-dbg libstdc++6-5-dbg python-djvu-dbg python-lxml-dbg python2.7-dbg

Run ocrodjvu with -j 4 (without OMP_THREAD_LIMIT) until it hangs.
Run examine-hangs. (It's going to produce a copious amount of output on stdout, so it's best to redirect it to a file.)

Send me the file with the output by email, or zip it and attach here.

jwilk · 2019-02-11T19:15:09Z

But setting OMP_THREAD_LIMIT is a good idea anyway; I'll try to make ocrodjvu set this automatically

This was implemented in 0.11.

ashipunov · 2019-02-23T21:51:42Z

Now examine_hangs.sh itself hangs ;) but it did output something large. I attach a ZIP because the output is bulky. AS

On Sat, 23 Feb 2019 at 15:38, Alexey Shipunov wrote:

Sure. Five minutes. AS

On Sat, 23 Feb 2019 at 15:37, Jakub Wilk wrote:

Yikes, there was a bug in the examination script that broke it almost completely. :-(
I've fixed it in 0ca41df.
Could you try again with the updated script?

jwilk · 2019-03-01T18:02:25Z

Here's the summary of the report I received from @ashipunov:

There's the ocrodjvu process running (with 10 threads), and 8 tesseract processes (with 4 threads each).

After almost 6 minutes, only the first page is done. All the tesseract processes seem to be consuming CPU:

  PID   LWP S  STARTED     ELAPSED     TIME %CPU   RSZ    VSZ COMMAND
 7493     - - 15:41:18       05:57 00:00:00  0.1 63360 1395120 /usr/bin/python /usr/local/bin/ocrodjvu --in-place -l rus+lat -j 8 nikolaev1970_diat_posjet.djvu
 7516     - - 15:41:18       05:57 00:03:28 58.5 54180 120080 tesseract /tmp/ocrodjvu.a6lLsl/000002.tif /tmp/ocrodjvu.RCQ4D3/tmp -l rus+lat /tmp/ocrodjvu.RCQ4D3/tessconf
 7517     - - 15:41:18       05:57 00:04:55 82.8 72088 137560 tesseract /tmp/ocrodjvu.a6lLsl/000005.tif /tmp/ocrodjvu.VawxBw/tmp -l rus+lat /tmp/ocrodjvu.VawxBw/tessconf
 7518     - - 15:41:18       05:57 00:06:25  107 59708 124680 tesseract /tmp/ocrodjvu.a6lLsl/000007.tif /tmp/ocrodjvu.KOh_L6/tmp -l rus+lat /tmp/ocrodjvu.KOh_L6/tessconf
 7519     - - 15:41:18       05:57 00:06:26  108 73656 138964 tesseract /tmp/ocrodjvu.a6lLsl/000003.tif /tmp/ocrodjvu.e5v4Qh/tmp -l rus+lat /tmp/ocrodjvu.e5v4Qh/tessconf
 7520     - - 15:41:18       05:57 00:06:24  107 70500 135832 tesseract /tmp/ocrodjvu.a6lLsl/000006.tif /tmp/ocrodjvu.1GJ9Kc/tmp -l rus+lat /tmp/ocrodjvu.1GJ9Kc/tessconf
 7521     - - 15:41:18       05:57 00:06:24  107 73040 138592 tesseract /tmp/ocrodjvu.a6lLsl/000004.tif /tmp/ocrodjvu.VgIIXx/tmp -l rus+lat /tmp/ocrodjvu.VgIIXx/tessconf
 7560     - - 15:41:30       05:45 00:06:10  107 70360 135840 tesseract /tmp/ocrodjvu.a6lLsl/000008.tif /tmp/ocrodjvu.96qHYI/tmp -l rus+lat /tmp/ocrodjvu.96qHYI/tessconf
 7581     - - 15:42:32       04:43 00:05:09  109 74156 139876 tesseract /tmp/ocrodjvu.a6lLsl/000009.tif /tmp/ocrodjvu.4XLOur/tmp -l rus+lat /tmp/ocrodjvu.4XLOur/tessconf
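
For reference, per-process thread counts like those quoted above can be read on Linux from the `Threads:` field of `/proc/<pid>/status`. This is a sketch under that assumption, not the method examine-hangs uses; `parse_thread_count` and `thread_count` are hypothetical names.

```python
import re


def parse_thread_count(status_text):
    """Extract the 'Threads:' field from the text of /proc/<pid>/status."""
    match = re.search(r"^Threads:\s+(\d+)$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no Threads: field found")
    return int(match.group(1))


def thread_count(pid):
    """Read the thread count of a live process (Linux only)."""
    with open("/proc/%d/status" % pid) as status_file:
        return parse_thread_count(status_file.read())


# Figures from the report: 8 tesseract processes with 4 threads each.
total_tesseract_threads = 8 * 4
```

That adds up to 32 tesseract threads, more than the logical CPUs of either machine described earlier in the thread.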

Backtraces from ocrodjvu threads look fine:

the main thread:

  Waiting for the GIL
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/cli/ocrodjvu.py", line 512, in _process
    condition.wait()
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/cli/ocrodjvu.py", line 549, in process
    self._process(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/cli/ocrodjvu.py", line 567, in main
    context.process(options.path, options.pages)
  File "/usr/local/bin/ocrodjvu", line 7, in <module>
    _.main(sys.argv)

internal python-djvulibre thread:

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007fdb8cf085f6 in DJVU::GMonitor::wait (this=this@entry=0x1e44af0) at GThreads.cpp:576
#2  0x00007fdb8cf47ec0 in ddjvu_message_wait (ctx=0x1e44ae0) at ddjvuapi.cpp:733
#3  0x00007fdb8c3a7319 in __pyx_pf_4djvu_6decode__Context_message_distributor (__pyx_self=<optimized out>, __pyx_v_kwargs={'sentinel': <object at remote 0x7fdb8f2e4120>}, __pyx_v_self=<optimized out>) at build/temp.linux-x86_64-2.7/src/decode.c:15397
#4  __pyx_pw_4djvu_6decode_1_Context_message_distributor (__pyx_self=<optimized out>, __pyx_args=<optimized out>, __pyx_kwds=<optimized out>) at build/temp.linux-x86_64-2.7/src/decode.c:15312
#5  0x00000000004a587e in PyObject_Call () at ../Objects/abstract.c:2546
#6  0x00000000004c5f3d in PyEval_CallObjectWithKeywords () at ../Python/ceval.c:4219
#7  0x0000000000589662 in t_bootstrap () at ../Modules/threadmodule.c:620
#8  0x00007fdb8efd46ba in start_thread (arg=0x7fdb89105700) at pthread_create.c:333
#9  0x00007fdb8ed0a41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

8 worker threads:

  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/engines/tesseract.py", line 67, in _wait_for_worker
    stderr = worker.stderr.readlines()
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/engines/tesseract.py", line 262, in recognize_hocr
    _wait_for_worker(worker)
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/engines/tesseract.py", line 296, in recognize
    return f(image, language, details=details, uax29=uax29)
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/cli/ocrodjvu.py", line 407, in process_page
    result = self._engine.recognize(pfile, language=self._options.language, details=self._options.details, uax29=self._options.uax29)
  File "/usr/local/lib/python2.7/dist-packages/ocrodjvu/cli/ocrodjvu.py", line 434, in page_thread
    result = self.process_page(page)
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 774, in __bootstrap
    self.__bootstrap_inner()

There are no backtraces for tesseract processes, because apparently GDB hangs on them. :-(
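
The worker backtrace above stops in `readlines()` on the child's stderr. That call returns only at end-of-file, so a worker thread stays there for as long as its tesseract process keeps stderr open. A small self-contained demonstration, with a short-lived Python child standing in for tesseract:

```python
import subprocess
import sys
import time

# The child writes one line to stderr, then keeps running for half a second.
child_code = (
    "import sys, time\n"
    "sys.stderr.write('Page 1\\n')\n"
    "sys.stderr.flush()\n"
    "time.sleep(0.5)\n"
)

start = time.time()
worker = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stderr=subprocess.PIPE,
)
# readlines() does not return after the first line; it waits for EOF,
# which arrives only when the child closes stderr (here: when it exits).
lines = worker.stderr.readlines()
elapsed = time.time() - start
worker.wait()
```

Here `elapsed` reflects the child's full lifetime, not the moment the line was written, which matches worker threads that are merely waiting on a still-running tesseract.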

ashipunov · 2019-03-01T20:13:34Z

So, to summarize the summary, it is still unclear why... I believe that my hypothesis about a processor-related issue might be plausible.
