Support Python 3.9+ #304
Comments
Alright, thanks for the note! I just uploaded calamari version 2.2.0 to PyPI. Let's hope we don't get too many problems with models being unable to update between tf 2.4 and 2.5... |
I have not had any issues with my Calamari 1.0 models with respect to TF 2.4 or TF 2.5, but we're going to have to test anyway. Learning from my experience with combinations of PyPI TensorFlow versions and Python versions, I'll also update the ocrd_calamari tests to run on all Python versions 3.7+ (3.6 is EOL and AFAICT also not compatible with tfaip) |
BTW, should there be problems with deserializing HDF5 models in other Python / TF releases, consider switching to SavedModel format. Conversion is as simple as an interactive load+save session (with a path name without the |
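For illustration, the load+save conversion mentioned above could look roughly like the following sketch. It assumes TensorFlow 2.x with the classic Keras API (before Keras 3), where `tf.keras.models.load_model` reads HDF5 files and `model.save(..., save_format="tf")` writes a SavedModel directory. It also assumes that the truncated remark refers to dropping the file suffix. The helper names `savedmodel_dir` and `convert_to_savedmodel` are hypothetical and not part of Calamari; models with custom layers may additionally need a `custom_objects` argument when loading.

```python
from pathlib import Path


def savedmodel_dir(h5_path: str) -> Path:
    """Return the target path: same name, without the .h5/.hdf5 suffix (hypothetical helper)."""
    p = Path(h5_path)
    if p.suffix.lower() not in {".h5", ".hdf5"}:
        raise ValueError(f"not an HDF5 model path: {h5_path}")
    return p.with_suffix("")


def convert_to_savedmodel(h5_path: str) -> Path:
    """Load an HDF5 Keras model and save it again in SavedModel format."""
    # Imported lazily so the path helper above works without TensorFlow installed.
    import tensorflow as tf  # assumption: TensorFlow 2.x, pre-Keras-3 API

    target = savedmodel_dir(h5_path)
    # compile=False skips restoring the optimizer and training configuration.
    model = tf.keras.models.load_model(h5_path, compile=False)
    # save_format="tf" explicitly requests a SavedModel directory.
    model.save(str(target), save_format="tf")
    return target
```

Calling `convert_to_savedmodel("model.h5")` would then write a SavedModel directory named `model` next to the original file, under the assumptions above.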
I think this release is missing here on GitHub! |
Can't install Calamari 2.2.2 on Python 3.10: It depends on |
So since with newer Python versions (>=3.8) the old HDF5 models don't load anymore ( Luckily, @andbue already solved this in #321 in the usual on-load on-demand converter – fantastic, thanks! |
Calamari 1.0.x (!) branch works with 3.7-3.11, only a small issue with old 1.0.x models on Python 3.11, model upgrade procedure here: OCR-D/ocrd_calamari#91 (the |
I'll get
|
PS: using Python 3.8.10 from Ubuntu 20.04 |
Just tried with Ubuntu18.04 (in schroot) & python3.7:
|
With ubuntu18.04 & python3.6 I'll get calamari-predict v1.0.6 - how to install 2.x? |
ubuntu 20.04 & |
Have a look at #304 (comment), #356 and the demo notebook at https://github.com/andbue/calamari_demo. I think that loading the models with the python version they've been created with (3.7 in most cases) and a calamari containing the commits in #321 will convert them to the SavedModel format. The converted models should work in python 3.8. |
So this is not available with |
Unfortunately only in the current master branch. |
ah the secret ingredient... Are the models very position dependent? I tried using ABBYY-segmented image snippets with calamari2, and the output is much worse than with the OCR-D workflow using calamari(1) |
Dependent on the position of the text in the line snippet? There is a certain dependency concerning the height of the lines. I often have errors in the first line of the page if the line segment there contains a lot of empty space above the text. Could be the same problem here if abbyy creates overlapping line segments. Also, the old models are trained on binarized input. Binarizing the images (ocropus-nlbin) does improve the results with these models. The OCR-D workflow might run the binarization automatically. Newer models like deep3_lsh4 in calamari_models_experimental are trained on grayscale (am I right, @chreul?). |
yes, new models like lsh4 were trained on various preprocessing outputs including different binarizations and also the normalized grayscale output of ocropus |
ocropus-nlbin helped a lot. Thanks. |
@chreul but Ocropus' nrm is not really grayscale as you know, and AFAICS all the models use the Ocropus CenterNormalizer (line dewarper) as preprocessor, so actual grayscale images would quickly degrade the training as the dewarper would hallucinate weird center lines and therefore distort the input. As a user, I don't even know how to deactivate the dewarper during training or prediction, yet. But my question to you is: why has true grayscale not been done anywhere, not even in your Gothic handwriting models? Anyway, the issue was about Py39 support and other dependencies, as well as HDF5 problems, which are solved as of Calamari 2.3 |
The current requirements.txt wants TensorFlow 2.4.x - which is not available on PyPI for Python 3.9+.
(Side note: We have been using TensorFlow >= 2.5.0 with Calamari 1.0.x for this reason.)
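As a rough sketch of the constraint described here, the TensorFlow requirement could be chosen per Python version. The function name `tf_requirement` is hypothetical, and the version bounds only encode what this issue states (2.4.x is not on PyPI for Python 3.9+, while >= 2.5.0 is); they are not taken from Calamari's actual packaging.

```python
import sys


def tf_requirement(python_version=None):
    """Return a pip requirement string for TensorFlow (hypothetical helper).

    python_version is a (major, minor) tuple; defaults to the running interpreter.
    """
    if python_version is None:
        python_version = tuple(sys.version_info[:2])
    if python_version >= (3, 9):
        # TensorFlow 2.4.x has no PyPI wheels for Python 3.9+, so require 2.5.0 or newer.
        return "tensorflow>=2.5.0"
    # Older interpreters can keep the 2.4.x pin from requirements.txt.
    return "tensorflow~=2.4.0"


print(tf_requirement())
```

The same split could also be written directly in `requirements.txt` with environment markers such as `python_version >= "3.9"`, which avoids any runtime logic.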