does not compile against libtesseract anymore #259

Open
bertsky opened this issue Jul 2, 2021 · 7 comments

@bertsky
Contributor

bertsky commented Jul 2, 2021

With the current master, I cannot pip install anymore:

  Building wheel for tesserocr (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /data/venv/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-yyb85qtw/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-yyb85qtw/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-e_hk2brj
       cwd: /tmp/pip-req-build-yyb85qtw/
  Complete output (418 lines):
  Supporting tesseract v5.0.0-alpha-622-g7d94
  Tesseract major version 5
  Configs from pkg-config: {'library_dirs': ['/usr/local/lib', '/usr/local/lib'], 'include_dirs': ['/usr/local/include', '/usr/local/include', '/usr/local/include'], 'libraries': ['tesseract', 'archive', 'curl', 'lept'], 'compile_time_env': {'TESSERACT_MAJOR_VERSION': 5, 'TESSERACT_VERSION': 1234798114}}
  running bdist_wheel
  running build
  running build_ext
  Detected compiler: unix
  building 'tesserocr' extension
  x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/include -I/usr/local/include -I/usr/local/include -I/data/venv/include -I/usr/include/python3.6m -c tesserocr.cpp -o build/temp.linux-x86_64-3.6/tesserocr.o -std=c++11 -DUSE_STD_NAMESPACE
  tesserocr.cpp:1905:91: error: ‘PolyBlockType’ does not name an enumeration in ‘tesseract’
   static CYTHON_INLINE PyObject* __Pyx_PyInt_From_enum__tesseract_3a__3a_PolyBlockType(enum tesseract::PolyBlockType value);
                                                                                             ^~~~~~~~~
  tesserocr.cpp:1905:102: error: ‘PolyBlockType’ in namespace ‘tesseract’ does not name a type
   static CYTHON_INLINE PyObject* __Pyx_PyInt_From_enum__tesseract_3a__3a_PolyBlockType(enum tesseract::PolyBlockType value);
                                                                                                        ^~~~~~~~~~~~~
  tesserocr.cpp:1920:99: error: ‘StrongScriptDirection’ does not name an enumeration in ‘tesseract’
   static CYTHON_INLINE PyObject* __Pyx_PyInt_From_enum__tesseract_3a__3a_StrongScriptDirection(enum tesseract::StrongScriptDirection value);
                                                                                                     ^~~~~~~~~
  tesserocr.cpp:1920:110: error: ‘StrongScriptDirection’ in namespace ‘tesseract’ does not name a type
   static CYTHON_INLINE PyObject* __Pyx_PyInt_From_enum__tesseract_3a__3a_StrongScriptDirection(enum tesseract::StrongScriptDirection value);
                                                                                                                ^~~~~~~~~~~~~~~~~~~~~
  tesserocr.cpp: In function ‘int __pyx_f_9tesserocr_13PyTessBaseAPI__init_api(__pyx_obj_9tesserocr_PyTessBaseAPI*, __pyx_t_10tesseract5_cchar_t*, __pyx_t_10tesseract5_cchar_t*, tesseract::OcrEngineMode, char**, int, const std::vector<std::__cxx11::basic_string<char> >*, const std::vector<std::__cxx11::basic_string<char> >*, bool, tesseract::PageSegMode)’:
  tesserocr.cpp:14592:197: error: no matching function for call to ‘tesseract::TessBaseAPI::Init(__pyx_t_10tesseract5_cchar_t*&, __pyx_t_10tesseract5_cchar_t*&, tesseract::OcrEngineMode&, char**&, int&, const std::vector<std::__cxx11::basic_string<char> >*&, const std::vector<std::__cxx11::basic_string<char> >*&, bool&)’
       __pyx_v_ret = __pyx_v_self->_baseapi.Init(__pyx_v_path, __pyx_v_lang, __pyx_v_oem, __pyx_v_configs, __pyx_v_configs_size, __pyx_v_vars_vec, __pyx_v_vars_vals, __pyx_v_set_only_non_debug_params);
                                                                                                                                                                                                       ^
  In file included from tesserocr.cpp:694:0:
  /usr/local/include/tesseract/baseapi.h:219:7: note: candidate: int tesseract::TessBaseAPI::Init(const char*, const char*, tesseract::OcrEngineMode, char**, int, const GenericVector<STRING>*, const GenericVector<STRING>*, bool)
     int Init(const char* datapath, const char* language, OcrEngineMode mode,
         ^~~~
  /usr/local/include/tesseract/baseapi.h:219:7: note:   no known conversion for argument 6 from ‘const std::vector<std::__cxx11::basic_string<char> >*’ to ‘const GenericVector<STRING>*’
  /usr/local/include/tesseract/baseapi.h:224:7: note: candidate: int tesseract::TessBaseAPI::Init(const char*, const char*, tesseract::OcrEngineMode)
     int Init(const char* datapath, const char* language, OcrEngineMode oem) {
         ^~~~
  /usr/local/include/tesseract/baseapi.h:224:7: note:   candidate expects 3 arguments, 8 provided
  /usr/local/include/tesseract/baseapi.h:227:7: note: candidate: int tesseract::TessBaseAPI::Init(const char*, const char*)
     int Init(const char* datapath, const char* language) {
         ^~~~
  /usr/local/include/tesseract/baseapi.h:227:7: note:   candidate expects 2 arguments, 8 provided
  /usr/local/include/tesseract/baseapi.h:233:7: note: candidate: int tesseract::TessBaseAPI::Init(const char*, int, const char*, tesseract::OcrEngineMode, char**, int, const GenericVector<STRING>*, const GenericVector<STRING>*, bool, tesseract::FileReader)
     int Init(const char* data, int data_size, const char* language,
         ^~~~
...

Is my Tesseract too old (i.e. have there been breaking API changes recently in Tesseract 5) perhaps?

@bertsky
Contributor Author

bertsky commented Jul 2, 2021

The above was Python 3.6 / Tesseract v5.0.0-alpha-622-g7d94 / gcc 7.5.0. I get the same on Python 3.7 / Tesseract v5.0.0-alpha-626-gddb6 / gcc 8.3.0. Cython is the newest release, 0.29.23.

@bertsky
Contributor Author

bertsky commented Jul 2, 2021

Bisection revealed that this started at 8a98bf4. The error also goes away with the most recent git version of Tesseract, v5.0.0-alpha-20210401.

That's a regression: tesserocr used to be backwards compatible and flexible. @stweil?

@stweil
Contributor

stweil commented Jul 3, 2021

Backwards compatible here means that it must work with the official releases (4.1.1). And it must work with the latest releases of Tesseract 5.0.

@bertsky
Contributor Author

bertsky commented Jul 3, 2021

> Backwards compatible here means that it must work with the official releases (4.1.1). And it must work with the latest releases of Tesseract 5.0.

No, tesserocr used to be compatible with a wide range of Tesseract versions, if necessary differentiating them with ifdefs to hide the differences from the Python user. But 8a98bf4 introduced a blanket condition TESSERACT_MAJOR_VERSION >= 5, which apparently conflates several API changes, and it brought the unfortunate situation that there are now two source files to keep synchronized, tesseract.pxd and tesseract5.pxd.
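To illustrate the difference between a blanket major-version check and an exact-version check, here is a minimal sketch in Python. It is not tesserocr's actual build logic. The names `parse_version`, `uses_new_api` and `ASSUMED_NEW_API_CUTOFF` are hypothetical, and the cutoff is only a placeholder: it is set to the one alpha build this thread reports as compiling, because the thread does not name the build where the API actually changed.

```python
import re
import sys
from typing import Tuple

# Placeholder cutoff (an assumption): the thread only says that
# v5.0.0-alpha-20210401 compiles, not where the API really changed.
ASSUMED_NEW_API_CUTOFF = (5, 0, 0, 20210401)


def parse_version(version: str) -> Tuple[int, int, int, int]:
    """Map e.g. 'v5.0.0-alpha-622-g7d94' to (5, 0, 0, 622)."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)(?:-alpha-(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch, build = match.groups()
    # Simplification: anything without an '-alpha-N' suffix is treated as a
    # release and sorts after every alpha build of the same version.
    build_number = int(build) if build is not None else sys.maxsize
    return (int(major), int(minor), int(patch), build_number)


def uses_new_api(version: str) -> bool:
    """Exact-version check: compare the full tuple, not just the major."""
    return parse_version(version) >= ASSUMED_NEW_API_CUTOFF


def blanket_major_check(version: str) -> bool:
    """What a plain 'major >= 5' condition would decide."""
    return parse_version(version)[0] >= 5


if __name__ == "__main__":
    for v in ("4.1.1", "v5.0.0-alpha-622-g7d94", "v5.0.0-alpha-20210401"):
        print(v, blanket_major_check(v), uses_new_api(v))
```

With these assumptions, the two checks disagree only for the older alpha builds such as v5.0.0-alpha-622-g7d94: the major-version check selects the new declarations for them, while the exact-version check does not.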

@stweil
Contributor

stweil commented Jul 3, 2021

I don't think that it is necessary that Tesserocr supports old or intermediate revisions of Tesseract which are completely unsupported (and buggy).

@sirfz
Owner

sirfz commented Jul 3, 2021

Tesseract 5 is still in development, so tesserocr cannot guarantee compatibility, since it can break at any moment. All stable releases >= 3.04 are supported, and so will version 5 be once it's released.

@bertsky
Contributor Author

bertsky commented Sep 13, 2021

Tesseract master seems to have been supported by tesserocr for a long time, though, despite the extra effort, especially during the long period after LSTMs had been (hastily) integrated. And at least trying to support the alpha is not just a matter of convenience: many projects depend on the Python bindings to test and advance new features. Why is this being turned down so lightly? (It should be easy for those who recently made the respective changes in Tesseract to differentiate APIs by exact version.)

Also, I still see this as the most pressing problem here:

> and it brought the unfortunate situation that there are now two source files to keep synchronized, tesseract.pxd and tesseract5.pxd.
