extract_text(layout=True) fails if PDF page contains no text #658
Describe the bug
When extracting text from a PDF page that contains no text, Page.extract_text typically returns an empty string. However, if it's run with the keyword argument layout=True, I get an IndexError.
Code to reproduce the problem
PDF file
This error seems to occur with any PDF page that doesn't contain any text, so any text-less PDF file will do.
Expected behavior
Page.extract_text should return an empty string if the page contains no text, regardless of whether the layout keyword argument is True or False.
Actual behavior
Without layout=True, you get an empty string; with layout=True, you get an IndexError.
Environment
Additional context
This seems to be the relevant part of the traceback:
I think adding a check of whether the words list in words_to_layout contains any elements should fix the error. Happy to do a pull request if you think the solution makes sense.
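As a rough illustration of the kind of guard described above, here is a toy stand-in for a layout routine. It is not pdfplumber's actual words_to_layout: the function name words_to_layout_sketch, the col_width and row_height parameters, and the simplified word dictionaries (only text, x0 and top keys) are all assumptions made for this sketch. The point is only that an early return on an empty words list avoids indexing into it.

```python
from typing import List


def words_to_layout_sketch(
    words: List[dict], col_width: float = 7.0, row_height: float = 13.0
) -> str:
    """Toy layout routine (hypothetical; not pdfplumber's implementation)."""
    # Guard: a page without text yields no words, so return an empty
    # string instead of indexing into an empty list further down.
    if not words:
        return ""

    # Put the words into reading order: top to bottom, then left to right.
    ordered = sorted(words, key=lambda w: (w["top"], w["x0"]))

    # Without the guard, this access raises IndexError for an empty list.
    first_top = ordered[0]["top"]

    # Group words into rows based on their vertical offset.
    rows = {}
    for w in ordered:
        row = int((w["top"] - first_top) / row_height)
        rows.setdefault(row, []).append(w)

    lines = []
    for row in range(max(rows) + 1):
        line = ""
        for w in rows.get(row, []):
            col = int(w["x0"] / col_width)
            if line and len(line) >= col:
                line += " "  # keep at least one space between words
            else:
                line = line.ljust(col)  # pad out to the word's column
            line += w["text"]
        lines.append(line)
    return "\n".join(lines)
```

With this guard, calling the function with an empty list returns an empty string, which matches the behaviour without layout=True.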