Are there more PSM modes than are listed in the help/wiki - 11 and 12? #434

Closed
samiles opened this issue Sep 27, 2016 · 4 comments

Comments


samiles commented Sep 27, 2016

Hi,

The command line help output shows 11 PSM modes (0 through 10).

  pagesegmode values are:
  0 = Orientation and script detection (OSD) only.
  1 = Automatic page segmentation with OSD.
  2 = Automatic page segmentation, but no OSD, or OCR
  3 = Fully automatic page segmentation, but no OSD. (Default)
  4 = Assume a single column of text of variable sizes.
  5 = Assume a single uniform block of vertically aligned text.
  6 = Assume a single uniform block of text.
  7 = Treat the image as a single text line.
  8 = Treat the image as a single word.
  9 = Treat the image as a single word in a circle.
  10 = Treat the image as a single character.

I was trying each one and getting mixed results. However, I accidentally ran '-psm 11' and suddenly got perfect accuracy: far better than any other PSM mode, and much better than the default. PSM 12 also gave perfect accuracy, but PSM 13 gives nothing.

The image contains only about 10 words over 2 lines, spread around the page. The default and all the other segmentation modes garble the text, but PSM 11 and 12 worked great, splitting the text perfectly.
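For anyone who wants to repeat this comparison, here is a minimal sketch that runs one image through every mode and collects the output. It assumes the `tesseract` binary is on the PATH and that `stdout` is accepted as the output base; the function names, the `runner` parameter, and the file name in the usage note are hypothetical. Tesseract 3.x spells the flag `-psm`, while 4.x uses `--psm`, so the flag is a parameter.

```python
import subprocess


def build_command(image_path, psm, flag="--psm"):
    """Return the argv list for one Tesseract run that prints text to stdout."""
    # Pass flag="-psm" for Tesseract 3.x; "--psm" is the 4.x spelling.
    return ["tesseract", str(image_path), "stdout", flag, str(psm)]


def compare_modes(image_path, modes=range(14), flag="--psm", runner=subprocess.run):
    """Run each mode and map it to the recognized text (None if the run failed)."""
    results = {}
    for psm in modes:
        try:
            proc = runner(
                build_command(image_path, psm, flag),
                capture_output=True,
                text=True,
                timeout=120,
            )
        except (OSError, subprocess.TimeoutExpired):
            # Binary missing, or the run took too long.
            results[psm] = None
            continue
        results[psm] = proc.stdout.strip() if proc.returncode == 0 else None
    return results
```

Calling `compare_modes("page.png")` returns a dict keyed by mode number, which makes it easy to eyeball which modes garble the text. Modes that do not perform OCR (such as 0, OSD only) will not yield recognized text.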

Is it correct that there's a PSM 11 and 12 mode? What do they do, and why do they give such good accuracy? And should they be in the help/Wiki?

Thanks!

samiles changed the title from "Are there more PSM modes than are listen in the help/wiki?" to "Are there more PSM modes than are listed in the help/wiki - 11 and 12?" Sep 27, 2016
Collaborator

amitdo commented Sep 27, 2016

> Is it correct that there's a PSM 11 and 12 mode?

Yes, and PSM 13 too.
You revealed our secret! :)
https://github.com/tesseract-ocr/tesseract/blob/8d6dbb133b41/api/tesseractmain.cpp#L115

I'm sorry, I do not have good answers to your other questions.

Collaborator

amitdo commented Nov 22, 2016

With Tesseract 4.0, PSM 11, 12, and 13 appear in the help message. PSM 13 is used with the new LSTM engine to OCR a single text-line image.
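As a rough illustration of that last point, the sketch below builds a command line for a single text-line image. It assumes Tesseract 4.x, where `--oem 1` selects the LSTM engine and `--psm 13` selects raw line mode; the function name, the default language, and the file name in the usage note are hypothetical.

```python
def lstm_line_command(image_path, lang="eng"):
    """Return the argv list for OCR of a single text-line image with the LSTM engine."""
    return [
        "tesseract", str(image_path), "stdout",
        "--oem", "1",   # assumption: Tesseract 4.x, where 1 means LSTM only
        "--psm", "13",  # raw line: treat the image as a single text line
        "-l", lang,
    ]
```

The list can be passed directly to `subprocess.run(lstm_line_command("line.png"), capture_output=True, text=True)`.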

Contributor

zdenop commented Dec 7, 2016

Both the master (4.00) and 3.05 branches of the repository now print PSM 11-13 in the help message.

zdenop closed this as completed Dec 7, 2016

crifan commented Dec 3, 2019

For this version:

$ tesseract --version
tesseract 4.1.0
 leptonica-1.78.0
  libgif 5.2.1 : libjpeg 9c : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.0.3 : libopenjp2 2.3.1
 Found AVX2
 Found AVX
 Found SSE

PSM 11 to 13 are supported:

$ tesseract --help-psm
Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR. (not implemented)
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
       bypassing hacks that are Tesseract-specific.
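When scripting against this list, it can help to give the numbers names and reject values outside 0 to 13 before calling the binary. The sketch below mirrors the help output above; the member names and the `parse_psm` helper are chosen here for illustration and are not part of any Tesseract API.

```python
from enum import IntEnum


class PSM(IntEnum):
    """Page segmentation modes as listed by `tesseract --help-psm` (4.1.0)."""
    OSD_ONLY = 0
    AUTO_OSD = 1
    AUTO_ONLY = 2          # listed as not implemented
    AUTO = 3               # default
    SINGLE_COLUMN = 4
    SINGLE_BLOCK_VERT_TEXT = 5
    SINGLE_BLOCK = 6
    SINGLE_LINE = 7
    SINGLE_WORD = 8
    CIRCLE_WORD = 9
    SINGLE_CHAR = 10
    SPARSE_TEXT = 11
    SPARSE_TEXT_OSD = 12
    RAW_LINE = 13


def parse_psm(value):
    """Convert user input to a PSM member, raising ValueError for unknown modes."""
    try:
        return PSM(int(value))
    except ValueError:
        raise ValueError(f"unknown page segmentation mode: {value!r}") from None
```

Because `PSM` is an `IntEnum`, `str(int(PSM.SPARSE_TEXT))` gives the string to place after the `--psm` flag.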
