segmentation fault shapeclustering -F font_properties -U unicharset -O chi.unicharset #3939

Closed
jihh opened this issue Oct 6, 2022 · 2 comments

Comments

jihh commented Oct 6, 2022

Environment

  • Tesseract Version: 5.2.0-45-gb5ee
    leptonica-1.82.0
    libgif 5.2.1 : libjpeg 9d : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.1 : libopenjp2 2.4.0
    Found AVX2
    Found AVX
    Found FMA
    Found SSE4.1
    Found libarchive 3.5.2 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.6 liblz4/1.9.3 libzstd/1.5.0
    Found libcurl/7.64.1 SecureTransport (LibreSSL/2.8.3) zlib/1.2.11 nghttp2/1.41.0
  • Commit Number: 97a1a3e
  • Platform: Darwin <hostname_hide> 20.6.0 Darwin Kernel Version 20.6.0: Tue Apr 19 21:04:45 PDT 2022; root:xnu-7195.141.29~1/RELEASE_X86_64 x86_64

Current Behavior:

% tesseract chi.pgssub.ocr.tif chi.pgssub.ocr -l chi_sim -psm 7 batch.nochop makebox
...
% <adjust box file with jTessBoxEditor>
...
% echo pgssub 0 0 0 0 0 >font_properties
...
% tesseract chi.pgssub.ocr.tif chi.pgssub.ocr -l chi_sim -psm 7 nobatch box.train
...
% unicharset_extractor chi.pgssub.ocr.box
...
% shapeclustering -F font_properties -U unicharset -O chi.unicharset chi.pgssub.ocr.tr
Reading chi.pgssub.ocr.tr ...
zsh: segmentation fault  shapeclustering -F font_properties -U unicharset -O chi.unicharset 

% shapeclustering -F font_properties -U unicharset chi.pgssub.ocr.tr
Reading chi.pgssub.ocr.tr ...
zsh: segmentation fault  shapeclustering -F font_properties -U unicharset chi.pgssub.ocr.tr

Expected Behavior:

Suggested Fix:

Attachments

  • chi.pgssub.ocr.tif
  • chi.pgssub.ocr.box
  • font_properties

file.zip

amitdo (Collaborator) commented Oct 6, 2022

shapeclustering

shapeclustering should not normally be used except for the Indic languages.

It is not needed for Chinese.
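
For anyone following the legacy training steps from the report, here is a minimal sketch of the sequence with the shapeclustering step left out. It assumes the usual legacy order (unicharset_extractor, then mftraining, then cntraining) and reuses the file names and the -F/-U/-O arguments from the report. The commands are only printed, not executed, so they can be reviewed first. The `legacy_steps` function and the `base`/`lang` variables are made up for illustration, and nothing in this thread confirms that mftraining itself succeeds on this data.

```shell
#!/bin/sh
# Dry-run sketch: prints the legacy training commands without running them.
# 'base' and 'lang' are illustrative names taken from the files in the report.
base=chi.pgssub.ocr
lang=chi

legacy_steps() {
  # Build the unicharset from the corrected box file.
  echo "unicharset_extractor $base.box"
  # shapeclustering is intentionally skipped here (not needed for Chinese).
  echo "mftraining -F font_properties -U unicharset -O $lang.unicharset $base.tr"
  echo "cntraining $base.tr"
}

legacy_steps
```

To run the steps for real, pipe the output to a shell (`legacy_steps | sh`) once the commands look right for your setup.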

amitdo (Collaborator) commented Oct 6, 2022

See #3925.

@amitdo amitdo closed this as completed Oct 6, 2022