Suggestions to optimise and improve gImageReader #480

tukusejssirs · 2021-01-04T13:06:18Z

First of all, I’d like to tell you that I love your program! Hope you’ll have time to maintain it for like forever! 😃

I have the impression that gImageReader is not optimised for handling many files, even though I think that making batch processing of many files easier is the main purpose of the program. I have not reported this yet, but I encounter quite a lot of crashes of gImageReader (sometimes a window with a tesseract crash log shows up).

In my opinion, the program:

- should be optimised;
- should allow batch recognising all the images;
- should have the ability to save a project (like ScanTailor does) to save the progress to a file (with a command-line option to open that file);
- should save UI preferences (like keeping the output pane open until I close it, keeping the Source hOCR tab open instead of Properties until I change the open tab, …);
- should smooth the image scrolling (it is a pain to scroll the image and an even greater pain to keep it zoomed in);
- should have a GUI option to cancel/stop the current job (like recognising the current and the following images), while the already recognised text should be kept;
- should add command-line options:
  - to select which images are to be added into the program (including globs);
  - to open a gIR project (when implemented);
  - to open an hOCR file;
  - to start recognising right away (after the program opens);
  - to not open the gIR GUI:
    - this option could be used when one wants to recognise the text and save the recognised text to either an hOCR or a text file (or both);
    - it could output some info messages (like processing file # of ###), which should be suppressed when the --quiet option is also supplied;
  - to save to PDF with hOCR;
  - to save to hOCR;
  - to save to a text file;
  - a --quiet, -q option to suppress STDOUT/STDERR output (IMHO gIR should be verbose by default).


manisandro · 2021-02-08T19:40:18Z

Contributions are welcome!

hollisticated-horse · 2021-03-23T12:27:17Z

Is the command-line option being worked on?
If not, I'd be interested in helping. I have very little skill, but a lot of time and motivation.

manisandro · 2022-01-15T20:05:48Z

> should be optimised;

Vague, can mean anything

> should allow batch recognising all the images;

It does

> should have the ability to save a project (like ScanTailor does) to save the progress to a file (with a command-line option to open that file);

Not feasible without tesseract support

> should save UI preferences (like keeping the output pane open until I close it, keeping the Source hOCR tab open instead of Properties until I change the open tab, …);

Mostly already the case

> should smooth the image scrolling (it is a pain to scroll the image and an even greater pain to keep it zoomed in);

Standard toolkit behaviour

> should have a GUI option to cancel/stop the current job (like recognising the current and the following images), while the already recognised text should be kept;

There is a cancel button

> add command-line options:
> - to select which images are to be added into the program (including globs);

You can specify which files to open from the command line

> - open a gIR project (when implemented);

The hOCR file is basically the project file, there is not much else to store

> - open an hOCR file;

Already possible

> - start recognising right away (after the program opens);
> - don’t open gIR GUI:
>   - this option could be used when one wants to recognise the text and save the recognised text to either an hOCR or a text file (or both);
>   - it could output some info messages (like processing file # of ###), which should be suppressed when the --quiet option is also supplied;

That's what the tesseract command line is for

> - save to PDF with hOCR;

Already possible

> - save to hOCR;

Already possible

> - save to text file;

Already possible

> - --quiet, -q option to suppress STDOUT/STDERR output (IMHO gIR should be verbose by default).

gImageReader doesn't really output anything itself
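For readers who want the headless batch workflow discussed above, here is a minimal sketch that drives the tesseract command line from Python. It is not part of gImageReader. It assumes `tesseract` is on `PATH` with the requested language data installed, and the function names (`expand_inputs`, `build_tesseract_command`, `run_batch`) and the output directory layout are hypothetical choices made for illustration. It relies on tesseract's `hocr`, `txt` and `pdf` output configs and its `-l` language option.

```python
import glob
import subprocess
import sys
from pathlib import Path


def expand_inputs(patterns):
    """Expand glob patterns into a sorted, de-duplicated list of paths."""
    files = set()
    for pattern in patterns:
        files.update(glob.glob(pattern))
    return sorted(files)


def build_tesseract_command(image, out_dir, lang="eng", formats=("hocr", "txt")):
    """Build the argument list for one tesseract run.

    tesseract appends the file extension itself, so the output base
    is the image name without its suffix.
    """
    out_base = Path(out_dir) / Path(image).stem
    return ["tesseract", str(image), str(out_base), "-l", lang, *formats]


def run_batch(patterns, out_dir, quiet=False, **kwargs):
    """Recognise every matching image; print progress unless quiet is set."""
    images = expand_inputs(patterns)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for index, image in enumerate(images, start=1):
        if not quiet:
            print(f"processing file {index} of {len(images)}: {image}",
                  file=sys.stderr)
        # check=True stops the batch on the first failing image;
        # capture_output hides tesseract's own messages in quiet mode.
        subprocess.run(
            build_tesseract_command(image, out_dir, **kwargs),
            check=True,
            capture_output=quiet,
        )
    return images
```

A call such as `run_batch(["scans/*.png"], "ocr-out", formats=("hocr", "pdf"))` would write one hOCR file and one searchable PDF per image into `ocr-out`. The resulting hOCR files could then be opened in gImageReader for correction, since opening hOCR files is described above as already possible.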

tukusejssirs mentioned this issue Jan 12, 2021

hocr import / export ocrmypdf/OCRmyPDF#453

Closed

manisandro closed this as completed Jan 15, 2022
