Hi, today I noticed a sudden change in the way text is extracted from PDFs. It seems like a lot of the binary content is being included. This is causing our tests to fail:
We've been able to resolve this quickly on our end by downgrading the package version, but just wanted to give you guys a heads-up.
EDIT: On further investigation, it looks like a change in the Python API caused the issue:
Traceback (most recent call last):
  File "/home/bls/Downloads/code/bbot/bbot/modules/extractous.py", line 135, in extract_text
    buffer = reader.read(4096)
             ^^^^^^^^^^^
AttributeError: 'tuple' object has no attribute 'read'
Thanks @TheTechromancer for reporting this. In version 0.2.0, we changed the API to return a tuple of reader and metadata. Add this to your extract call: reader, metadata = extractor.extract_ ...
Please look at the updated docs.
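For anyone hitting the same AttributeError, here is a minimal sketch of calling code that tolerates both return shapes. It assumes, as the traceback suggests, that versions before 0.2.0 returned a bare reader and that 0.2.0 returns a (reader, metadata) tuple. The helper names split_result and read_text are hypothetical and not part of the library, and io.BytesIO objects stand in for the real reader so the snippet runs on its own.

```python
import io


def split_result(result):
    """Return (reader, metadata) for either return shape.

    Assumption: older versions return a bare reader, 0.2.0+ returns
    a (reader, metadata) tuple. This helper is hypothetical.
    """
    if isinstance(result, tuple):
        reader, metadata = result
        return reader, metadata
    # Old shape: no metadata available, so fall back to an empty dict.
    return result, {}


def read_text(reader, chunk_size=4096):
    """Read a byte stream in chunks, as the failing code did, and decode it."""
    chunks = []
    while True:
        buffer = reader.read(chunk_size)
        if not buffer:
            break
        chunks.append(buffer)
    # Join before decoding so multi-byte characters split across
    # chunk boundaries are decoded correctly.
    return b"".join(chunks).decode("utf-8", errors="replace")


# Stand-ins for the two return shapes (not real library output).
old_style = io.BytesIO(b"hello from a PDF")
new_style = (io.BytesIO(b"hello from a PDF"), {"Content-Type": "application/pdf"})

for result in (old_style, new_style):
    reader, metadata = split_result(result)
    print(read_text(reader), metadata)
```

In real code the result passed to split_result would be whatever the extract call returns. Once the dependency is pinned to 0.2.0 or later, the plain tuple unpacking from the comment above is enough and the shim can be dropped.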
Thanks yeah we were able to fix it. Is there a chance there will be another breaking API change without a major version increase? If so, going forward we can pin the version on our side.