ValueError: min() arg is an empty sequence #42

Closed
ninja-otaku opened this issue Sep 28, 2018 · 5 comments

Comments

ninja-otaku commented Sep 28, 2018

Hi, I am trying to convert a PDF to HTML using the pdftotree package, but I am getting the following error. Kindly look into it and let me know what the issue is.

File "<ipython-input-73-485b6311ae55>", line 1, in <module>
    a = pdftotree.parse('O:\\Project\\ConocoPhilips-Sustainability-Report.pdf',html_path=None, model_type=None, model_path=None, favor_figures=True, visualize=False)

  File "C:\Continuum\anaconda3\lib\site-packages\pdftotree\core.py", line 65, in parse
    pdf_tree = extractor.get_tree_structure(model_type, model, favor_figures)

  File "C:\Continuum\anaconda3\lib\site-packages\pdftotree\TreeExtract.py", line 238, in get_tree_structure
    favor_figures,

  File "C:\Continuum\anaconda3\lib\site-packages\pdftotree\utils\pdf\pdf_parsers.py", line 762, in parse_tree_structure
    mentions, elems.layout.bbox, page_num, boxes_figures, page_width, page_height

  File "C:\Continuum\anaconda3\lib\site-packages\pdftotree\utils\pdf\pdf_parsers.py", line 1246, in get_figures
    node_fig = Node(fig_box)

  File "C:\Continuum\anaconda3\lib\site-packages\pdftotree\utils\pdf\node.py", line 47, in __init__
    self.set_bbox(bound_elems(elems))

  File "C:\Continuum\anaconda3\lib\site-packages\pdftotree\utils\pdf\vector_utils.py", line 121, in bound_elems
    group_x0 = min(map(lambda l: l.x0, elems))

ValueError: min() arg is an empty sequence

@lukehsiao
Contributor

Can you attach the document that caused the error?

@ninja-otaku
Author

ConocoPhilips-Sustainability-Report.pdf
Hi, I am attaching the file with this comment.

@lukehsiao
Contributor

$ pdftotree ConocoPhilips-Sustainability-Report.pdf -vv
[INFO] pdftotree.core - Digitized PDF detected, building tree structure...
[INFO] pdftotree.TreeExtract - No boxes were found on page 1.
[INFO] pdftotree.TreeExtract - No boxes were found on page 4.
[INFO] pdftotree.TreeExtract - No boxes were found on page 5.
[INFO] pdftotree.TreeExtract - No boxes were found on page 6.
[INFO] pdftotree.TreeExtract - No boxes were found on page 7.
[INFO] pdftotree.TreeExtract - No boxes were found on page 8.
[INFO] pdftotree.TreeExtract - No boxes were found on page 10.
[INFO] pdftotree.TreeExtract - No boxes were found on page 11.
[INFO] pdftotree.TreeExtract - No boxes were found on page 12.
[INFO] pdftotree.TreeExtract - No boxes were found on page 14.
[INFO] pdftotree.TreeExtract - No boxes were found on page 15.
[INFO] pdftotree.TreeExtract - No boxes were found on page 18.
[INFO] pdftotree.TreeExtract - No boxes were found on page 19.
[INFO] pdftotree.TreeExtract - No boxes were found on page 20.
[INFO] pdftotree.TreeExtract - No boxes were found on page 23.
[INFO] pdftotree.TreeExtract - No boxes were found on page 25.
[INFO] pdftotree.TreeExtract - No boxes were found on page 26.
[INFO] pdftotree.TreeExtract - No boxes were found on page 29.
[INFO] pdftotree.TreeExtract - No boxes were found on page 34.
[INFO] pdftotree.TreeExtract - No boxes were found on page 35.
[INFO] pdftotree.TreeExtract - No boxes were found on page 37.
[INFO] pdftotree.TreeExtract - No boxes were found on page 42.
[INFO] pdftotree.TreeExtract - No boxes were found on page 54.
[INFO] pdftotree.TreeExtract - No boxes were found on page 55.
[INFO] pdftotree.TreeExtract - No boxes were found on page 60.
Traceback (most recent call last):
  File ".venv/bin/pdftotree", line 116, in <module>
    args.visualize,
  File ".venv/lib/python3.6/site-packages/pdftotree/core.py", line 65, in parse
    pdf_tree = extractor.get_tree_structure(model_type, model, favor_figures)
  File ".venv/lib/python3.6/site-packages/pdftotree/TreeExtract.py", line 238, in get_tree_structure
    favor_figures,
  File ".venv/lib/python3.6/site-packages/pdftotree/utils/pdf/pdf_parsers.py", line 762, in parse_tree_structure
    mentions, elems.layout.bbox, page_num, boxes_figures, page_width, page_height
  File ".venv/lib/python3.6/site-packages/pdftotree/utils/pdf/pdf_parsers.py", line 1246, in get_figures
    node_fig = Node(fig_box)
  File ".venv/lib/python3.6/site-packages/pdftotree/utils/pdf/node.py", line 47, in __init__
    self.set_bbox(bound_elems(elems))
  File ".venv/lib/python3.6/site-packages/pdftotree/utils/pdf/vector_utils.py", line 121, in bound_elems
    group_x0 = min(map(lambda l: l.x0, elems))
ValueError: min() arg is an empty sequence

Re-opening this issue. It's good to have these open until someone is able to debug it.
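
For anyone picking this up: the last frame of the traceback calls `min()` over the elements handed to `bound_elems`, so the error means that collection was empty by the time it got there. Below is a minimal sketch of that failure mode. The `Box` type and the `bound_elems_sketch` function are hypothetical stand-ins that only mirror the `min(map(...))` pattern shown in the traceback; they are not pdftotree's actual code.

```python
from collections import namedtuple

# Hypothetical stand-in for a layout element; only the bbox attributes matter.
Box = namedtuple("Box", ["x0", "y0", "x1", "y1"])


def bound_elems_sketch(elems):
    """Return the (x0, y0, x1, y1) box covering every element in elems.

    Mirrors the min(map(...)) pattern from the traceback. This is an
    illustration only, not the library's implementation.
    """
    group_x0 = min(map(lambda l: l.x0, elems))
    group_y0 = min(map(lambda l: l.y0, elems))
    group_x1 = max(map(lambda l: l.x1, elems))
    group_y1 = max(map(lambda l: l.y1, elems))
    return (group_x0, group_y0, group_x1, group_y1)


def raises_value_error(elems):
    """Return True if bounding elems raises ValueError, else False."""
    try:
        bound_elems_sketch(elems)
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    print(bound_elems_sketch([Box(1, 2, 3, 4), Box(0, 5, 2, 6)]))
    print(raises_value_error([]))  # an empty list triggers the same error class
```

Python's built-in `min()` and `max()` also accept a `default=` keyword for the iterable form, which avoids the exception, but whether silently returning a default is the right behaviour here depends on why the figure box ends up empty in the first place.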

@mgoo
Contributor

mgoo commented Nov 12, 2018

Just wanted to point out that I had the same error on this file:
IFT_Infratil_2017.pdf

@mgoo
Contributor

mgoo commented Dec 22, 2018

I had a go at debugging this.
The problem seems to be that when the Node instance is created for the figures, a single element is passed to the Node constructor rather than a list of elements.
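
If that diagnosis holds, one possible direction is to normalise the argument before computing the bounding box and to handle the empty case explicitly. The sketch below is only an illustration under that assumption: `Box`, `as_elem_list`, and `safe_bound` are hypothetical names and are not part of pdftotree.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical layout element with bounding-box coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float


def as_elem_list(elems):
    """Wrap a single element in a list; copy lists and tuples through."""
    if isinstance(elems, (list, tuple)):
        return list(elems)
    return [elems]


def safe_bound(elems):
    """Return the covering (x0, y0, x1, y1) box, or None if there is nothing to bound."""
    elems = as_elem_list(elems)
    if not elems:
        # Nothing to bound: let the caller decide whether to skip the figure.
        return None
    return (
        min(e.x0 for e in elems),
        min(e.y0 for e in elems),
        max(e.x1 for e in elems),
        max(e.y1 for e in elems),
    )


if __name__ == "__main__":
    print(safe_bound(Box(1, 2, 3, 4)))                     # single element
    print(safe_bound([Box(0, 0, 1, 1), Box(2, 3, 4, 5)]))  # list of elements
    print(safe_bound([]))                                  # empty input
```

Returning `None` for empty input is a design choice made for the sketch; a caller would still need to skip or log such figures rather than build a node from them.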
