Image reading breaks #424
Comments
Same problem here with a document of 185 pages @ 400 dpi. PDF export crashes somewhere between page 037 and page 043. I tried both PoDoFo and QPrinter, with no difference. gImageReader 3.3.1, running on 32-bit Windows 10 with 8 GB RAM; the output font is Times New Roman. As the page images are OK, I suspect it runs out of memory, though I had no problems with PDF export of bigger documents on the same machine before. |
Sorry, missed the original report. Do you get the crash reporter showing a stack trace when gimagereader crashed? Can you share one of the documents causing the issue? |
As the program stalls and has to be shut down by force, I don't get a stack trace. I uploaded the input files (TIF), the hOCR file and the output file (PDF export) to |
Now I got a stack trace:
#0 0x77090941 in ntdll!DbgBreakPoint () from C:\WINDOWS\SYSTEM32\ntdll.dll
Thread 27 (Thread 4932.0x8fc):
Thread 26 (Thread 4932.0x1a10):
Thread 25 (Thread 4932.0x1a6c):
Thread 24 (Thread 4932.0xc88):
Thread 23 (Thread 4932.0x72c):
Thread 22 (Thread 4932.0xae4):
Thread 21 (Thread 4932.0x1ae4):
Thread 20 (Thread 4932.0xd28):
Thread 19 (Thread 4932.0xef8):
Thread 18 (Thread 4932.0x1b1c):
Thread 17 (Thread 4932.0x1644):
Thread 16 (Thread 4932.0xf0c):
Thread 15 (Thread 4932.0x1758):
Thread 14 (Thread 4932.0x1358):
Thread 13 (Thread 4932.0x1e44):
Thread 12 (Thread 4932.0x1b40):
Thread 11 (Thread 4932.0x1378):
Thread 10 (Thread 4932.0x3ac):
Thread 9 (Thread 4932.0x135c):
Thread 8 (Thread 4932.0x1af4):
Thread 7 (Thread 4932.0x12b4):
Thread 6 (Thread 4932.0x794):
Thread 5 (Thread 4932.0x1824):
Thread 4 (Thread 4932.0x1ab8):
Thread 3 (Thread 4932.0x1bd8):
Thread 2 (Thread 4932.0x1678):
Thread 1 (Thread 4932.0xe24): |
Two additional observations. First, the page images and the hOCR file together are about 260 MB in size. When I have both loaded into gImageReader to make corrections, gImageReader uses about 160 MB of RAM (according to Task Manager). When I try to export the document as PDF and gImageReader stalls (at page 37), Task Manager shows about 1200 MB of RAM in use. Second, although the page image TIFs were saved (in ScanTailor Advanced) with a resolution of 400 dpi (which is also the resolution shown when opening them in a graphics application), gImageReader shows a resolution of only 100 dpi. I can neither see a reason for this change nor find a way to change it in gImageReader. It makes no difference whether I export the PDF with a resolution of 100 or 400 dpi, or whether I set an input resolution of 100 or 400 dpi in the PDF options: gImageReader crashes anyway. I went through the file once more to make sure it contains only Times New Roman as font, and no irregularities are visible in Preview. I have uploaded the new hOCR file to the Dropbox folder indicated above. |
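For reference, a rough sketch of how much memory a single decoded page can occupy compared with its compressed file on disk. The A4 page size and the pixel formats are assumptions made for illustration only; the thread does not state the actual scan dimensions, nor how many decoded pages gImageReader holds at once during export, so this is not a diagnosis.

```python
def decoded_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate size in bytes of one uncompressed page image in memory."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size: A4, 8.27 x 11.69 inches (hypothetical, not from the report).
A4_INCHES = (8.27, 11.69)

if __name__ == "__main__":
    for dpi in (100, 400):
        for label, bpp in (("1-bit", 1), ("8-bit gray", 8), ("32-bit RGBA", 32)):
            size = decoded_page_bytes(*A4_INCHES, dpi, bpp)
            print(f"{dpi} dpi, {label}: {size / 2**20:.1f} MiB")
```

Under these assumptions a 400 dpi page needs 16 times the pixels of a 100 dpi page, so the pixel format used after decoding and the number of pages held in memory at once matter far more than the total file size on disk.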
Now I've tried shortening the document to the first 30 pages. At first the PDF export ran through, but the resulting PDF, viewed in Adobe Acrobat Reader, showed only the OCR text layer, not the page images. When I tried a second PDF export, gImageReader stalled again, this time at page 9, and again with 1300 MB of RAM in use. Very strange. |
I've finally had time to look into this; indeed, something in the image downscaling threading logic was broken. This should now be fixed. |
Thank you for fixing!
@manisandro, I believe it is either not yet fixed or not fully fixed. I installed gIR v3.3.1 (GTK) from the Fedora repos and loaded 846 images (mostly text, three simple tables, plus front and back covers as images). All images were processed in ScanTailor Advanced at 600 dpi. A typical image is around 500 KiB, 5 images are between 1.2 and 2.9 MiB, and the covers are 16 MiB and 36 MiB respectively. Altogether, their size is 251 MiB. gIR crashes a lot, and loading the hOCR HTML takes some time (I have 10 GB RAM installed in this computer, 2 cores, 4 threads). However, sometimes it works (albeit slowly). I’ve exported the PDF file, but both Evince and Adobe Acrobat Reader (both on Linux) fail to open the file and say that it is corrupted/damaged. I have no idea how to generate a PDF file with hOCR data. I’ve read about |
Without a stack trace or a reproducer, it is hard to say whether this is the same issue. The cause could also be a completely different one. |
Also note that this issue here affected the Qt version, so it's definitely not the same issue. |
Thanks, @manisandro, for your reply! Should I open a new issue for that, then? |
Discovered this while processing a document of >250 pages @ 600 dpi. Exporting to PDF or ODF will trigger this. A dialog appears after a while, telling you the image is corrupted or missing.
The dialog has an unpressable [OK] button, so you have to close it with [X], upon which the next image in line will fail to load, and so on in perpetuity. The easiest way to get out of this is to hold Alt+F4.
The other way to trigger this is simply to open all the pictures, then scroll through them with the arrow keys. Sooner or later it breaks. Once it's broken, you cannot reload previous images.
The machine used was a ThinkPad with 4 GB RAM, if it matters.