-
Notifications
You must be signed in to change notification settings - Fork 486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pdplumber return empty string on importerror #481
base: master
Are you sure you want to change the base?
Conversation
024d674
to
ed9499d
Compare
src/invoice2data/input/pdfplumber.py
Outdated
@@ -19,7 +19,8 @@ def to_text(path): | |||
try: | |||
import pdfplumber | |||
except ImportError: | |||
logger.debug("Cannot import pdfplumber") | |||
logger.error("Cannot import pdfplumber") | |||
return "".encode("UTF-8") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning empty string suggests that invoice was parsed but was empty.
If we want to return some value then make it None
please.
If you take a look at pdftotext.py
however, you'll see it raises EnvironmentError
if pdftotext
is missing. So returning None
will make pdfplumber.py
somehow incompatible with the pdftotext.py
.
As for decision what is better: return None
or raise EnvironmentError
- I have no idea or preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rmilecki I agree, we need to think of an solution for this.
Returning None conflicts with:
invoice2data/src/invoice2data/main.py
Line 89 in a5bdd50
extracted_str = input_module.to_text(invoicefile).decode("utf-8") |
as Nonetype cannot be decoded
(Maybe we can remove the decode line? I assume it is a python2 leftover)
pdftotext might be a different story, as it is one of the default/main parsers.
So making everything fail when it's unavailable is not a big deal.
Is there a way to raise the error, only if the pdfplumber input module is called?
We don't want the whole lib to fail on this missing requirement.
This pr #491
is currently failing because of the missing pdfplumber.
(the test should not even run when it is unavailable but that's a different sunbject)
In that example, there should be an ImportError
or EnvironmentError
ed9499d
to
f7d42b7
Compare
f7d42b7
to
1ae80ab
Compare
1ae80ab
to
db0ac11
Compare
Was running some tests, encountered following error when pdfplumber is not available.
This PR returns and empty value and let invoice2data fail gracefully.
Before:
invoice2data input.pdf --input-reader=pdfplumber
After:
fixes #362