added README note on poppler install and better error handling for poppler not found
grantbuster committed Nov 20, 2023
1 parent 4e4ac93 commit 41589ca
Showing 2 changed files with 15 additions and 4 deletions.
17 changes: 13 additions & 4 deletions elm/pdf.py
@@ -254,10 +254,19 @@ def clean_poppler(self, layout=True):
         if not os.path.exists(os.path.dirname(fp_out)):
             os.makedirs(os.path.dirname(fp_out), exist_ok=True)

-        stdout = subprocess.run(args, check=True, stdout=subprocess.PIPE)
-        if stdout.returncode != 0:
-            msg = ('Poppler raised return code {}: {}'
-                   .format(stdout.returncode, stdout))
+        try:
+            stdout = subprocess.run(args, check=True,
+                                    stdout=subprocess.PIPE)
+            if stdout.returncode != 0:
+                msg = ('Poppler raised return code {}: {}'
+                       .format(stdout.returncode, stdout))
+                logger.exception(msg)
+                raise RuntimeError(msg)
+        except Exception as e:
+            msg = ('PDF cleaning with poppler failed! This is usually '
+                   'because you have not installed the poppler utility '
+                   '(see https://poppler.freedesktop.org/). '
+                   f'Full error: {e}')
             logger.exception(msg)
             raise RuntimeError(msg)
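
A note on the hunk above: because `subprocess.run` is called with `check=True`, a non-zero exit code already raises `subprocess.CalledProcessError`, so the inner `returncode` test is not reached in that case, and a missing executable surfaces as `FileNotFoundError`. The following is a minimal sketch of one way to separate those two failure modes. It is not the code in `elm/pdf.py`: the helper name `run_poppler` and the example argument list are hypothetical, and the assumption that the first element of `args` is the poppler executable (for example `pdftotext`) is not confirmed by the diff.

```python
import shutil
import subprocess


def run_poppler(args):
    """Run a poppler command line and return its stdout as bytes.

    ``args`` is a hypothetical argument list, e.g.
    ``['pdftotext', '-layout', 'in.pdf', 'out.txt']`` (an assumption,
    the real argument list is not shown in the diff).
    """
    # Fail early with a targeted message when the executable is not on PATH.
    if shutil.which(args[0]) is None:
        raise RuntimeError(
            f'Could not find {args[0]!r} on PATH. Install the poppler '
            'utility (see https://poppler.freedesktop.org/).')

    try:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so no separate returncode test is needed afterwards.
        result = subprocess.run(args, check=True,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
    except subprocess.CalledProcessError as e:
        stderr = e.stderr.decode(errors='replace').strip()
        raise RuntimeError(
            f'Poppler exited with return code {e.returncode}: {stderr}'
        ) from e

    return result.stdout
```

Checking `shutil.which` up front keeps the "not installed" message from being shown for unrelated failures such as a corrupt PDF, which the broad `except Exception` in the commit would report with the same installation hint.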

2 changes: 2 additions & 0 deletions examples/energy_wizard/README.rst
@@ -8,6 +8,8 @@ corpus.

 Notes:

+- In this example, we use the optional `poppler <https://poppler.freedesktop.org/>`_ PDF utility, which you will have to install separately. You can also use the python-native ``PyPDF2`` package when using ``elm.pdf.PDFtoTXT``, but we have found that poppler works better.
+
 - Streamlit is required to run this app, which is not an explicit requirement of this repo (``pip install streamlit``)

 - You need to set up your own OpenAI or Azure-OpenAI API keys to run the scripts.