Library support for training tools #3832

Open
stefan6419846 opened this issue May 30, 2022 · 2 comments

Comments

@stefan6419846

Environment

  • Tesseract Version: 4.1.3, but affects latest main branch as well
  • Platform: Linux localhost.localdomain 5.3.18-150300.59.68-default #1 SMP Wed May 4 11:29:09 UTC 2022 (ea30951) x86_64 x86_64 x86_64 GNU/Linux

Current Behavior

Using the training tools from src/training as a library or shared object is either impossible or requires extensive manual modifications that have to be redone for each upstream change in this repository.

Current limitations include (but are not limited to):

  • Duplicate names (main() and parameter flags).
  • Where parameter handling is done without flags, parameters have to be passed as argc and argv; flag-based parameters are awkward to set from calling code as well.
  • Some source files call exit(1) inside main() instead of return 1. The two are generally equivalent in a standalone program, but exit(1) would terminate the whole host process if the same code were called as a library function.
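
The main() and exit(1) points above can be sketched as follows. This is only an illustration, not existing Tesseract code: `RunExampleTool` and the `EXAMPLE_TOOL_STANDALONE` macro are hypothetical names chosen for this example.

```cpp
#include <iostream>
#include <string>

// Hypothetical library entry point (not an existing Tesseract function).
// Failures are reported through the return value, so a caller that links
// this code keeps control of its own process.
int RunExampleTool(const std::string &input_path,
                   const std::string &output_path) {
  if (input_path.empty() || output_path.empty()) {
    std::cerr << "input and output paths must not be empty\n";
    return 1;  // a bare exit(1) here would terminate the host process
  }
  // The real tool logic would go here.
  return 0;
}

// The executable's main() is compiled only when the (assumed) macro
// EXAMPLE_TOOL_STANDALONE is defined, so several tools can be linked into
// one library without clashing main() symbols.
#ifdef EXAMPLE_TOOL_STANDALONE
int main(int argc, char **argv) {
  if (argc != 3) {
    std::cerr << "Usage: " << argv[0] << " INPUT OUTPUT\n";
    return 1;
  }
  return RunExampleTool(argv[1], argv[2]);
}
#endif
```

With this layout, the command-line tool stays a thin wrapper, and library users call `RunExampleTool` directly.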

Expected Behavior

The training tools can be used as a library or shared object with a well-defined API, as is already possible with the regular Tesseract code. This allows developers to use the training tools in their own projects, for example by wrapping the native C++ code in a language-specific package (such as a Python module) to avoid subprocess calls.
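
To illustrate what a wrapper-friendly entry point could look like, here is a minimal sketch of a C-linkage function that a foreign-function interface could load from a shared object. Everything in it is an assumption made for this example: `CombineExample` and `example_training_combine` are hypothetical names and do not exist in Tesseract.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

namespace example_training {

// Hypothetical C++ entry point that stands in for real training logic.
int CombineExample(const std::string &prefix) {
  if (prefix.empty()) {
    throw std::invalid_argument("prefix must not be empty");
  }
  return 0;
}

}  // namespace example_training

// C-linkage wrapper: no name mangling, plain C types only, and no
// exception is allowed to cross the language boundary.
// Return codes in this sketch: 0 = success, 1 = failure, 2 = null argument.
extern "C" int example_training_combine(const char *prefix) noexcept {
  if (prefix == nullptr) {
    return 2;
  }
  try {
    return example_training::CombineExample(prefix);
  } catch (const std::exception &) {
    return 1;
  } catch (...) {
    return 1;
  }
}
```

A binding layer would then only need the shared object and the signature of `example_training_combine`, with no subprocess involved.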

Suggested Fix

  • Introduce a header for the training functionality.
  • Make implementations independent from main() and allow calling them with regular parameters.
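
A rough sketch of what such a header could contain, assuming the flag-based parameters are replaced by a plain options struct. The namespace, the struct, its field names and both functions are hypothetical and only serve as an illustration; they are not taken from the existing training code.

```cpp
#include <string>

namespace example_training_api {

// Hypothetical replacement for global command-line flags: every parameter
// is an ordinary field with a default value.
struct ExampleTrainerOptions {
  std::string model_output;         // required in this sketch
  int max_iterations = 0;           // 0 means "no limit" in this sketch
  double target_error_rate = 0.01;  // must not be negative
};

// Returns an empty string if the options are usable, otherwise a
// description of the first problem found.
inline std::string ValidateOptions(const ExampleTrainerOptions &options) {
  if (options.model_output.empty()) {
    return "model_output must be set";
  }
  if (options.max_iterations < 0) {
    return "max_iterations must not be negative";
  }
  if (options.target_error_rate < 0.0) {
    return "target_error_rate must not be negative";
  }
  return "";
}

// Entry point callable with regular parameters; returns 0 on success and
// 1 on invalid options instead of terminating the process.
inline int RunExampleTrainer(const ExampleTrainerOptions &options) {
  if (!ValidateOptions(options).empty()) {
    return 1;
  }
  // The real training logic would go here.
  return 0;
}

}  // namespace example_training_api
```

A command-line main() could still parse argc and argv into `ExampleTrainerOptions` and forward the result, so the existing executables would not need to change their behavior.
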
@stweil
Member

stweil commented Jul 20, 2022

The current build already creates libtesseract_training.a. Is that library sufficient (maybe if in addition a shared library libtesseract_training.so is built)?

@stefan6419846
Author

Depending on the use case, either a static or a shared library may make sense, so supporting both (if feasible) is probably the way to go in my opinion.

As for the original request, a suitable header file still seems to be missing. Additionally, the issues regarding duplicate names and the awkward way of providing parameters appear to be unresolved.
