ValueError initializing pages #41

Closed
dankeemahill opened this issue Nov 21, 2017 · 2 comments

dankeemahill commented Nov 21, 2017

I get "ValueError: Cannot convert <PDFObjRef:4> to Decimal." when accessing the pages of this PDF with pdfplumber==0.5.5.

I've also noticed that the pdfplumber.open call runs much faster on this file than on other files that don't raise this error.

pdf = pdfplumber.open('Hays TX 11-8-2016+hays+county+total+canvass.pdf')
pdf.pages

⬇️

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-bc19284d68b8> in <module>()
      1 pdf = pdfplumber.open('Hays TX 11-8-2016+hays+county+total+canvass.pdf')
----> 2 pdf.pages

/anaconda3/lib/python3.6/site-packages/pdfplumber/pdf.py in pages(self)
     54             page_number = i+1
     55             if pp != None and page_number not in pp: continue
---> 56             p = Page(self, page, page_number=page_number, initial_doctop=doctop)
     57             self._pages.append(p)
     58             doctop += p.height

/anaconda3/lib/python3.6/site-packages/pdfplumber/page.py in __init__(self, pdf, page_obj, page_number, initial_doctop)
     21 
     22         cropbox = page_obj.attrs.get("CropBox", page_obj.attrs.get("MediaBox"))
---> 23         self.cropbox = self.decimalize(cropbox)
     24 
     25         if self.rotation in [ 90, 270 ]:

/anaconda3/lib/python3.6/site-packages/pdfplumber/page.py in decimalize(self, x)
     39 
     40     def decimalize(self, x):
---> 41         return utils.decimalize(x, self.pdf.precision)
     42 
     43     @property

/anaconda3/lib/python3.6/site-packages/pdfplumber/utils.py in decimalize(v, q)
     87             return Decimal(repr(v))
     88     else:
---> 89         raise ValueError("Cannot convert {0} to Decimal.".format(v))
     90 
     91 def is_dataframe(collection):

ValueError: Cannot convert <PDFObjRef:4> to Decimal.
jsvine added a commit that referenced this issue Nov 22, 2017
... in which PDF-object-referenced cropboxes/mediaboxes weren't being
fully resolved. Thanks to @dankeemahill for flagging!
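The commit message above names the root cause: the page's CropBox/MediaBox was an indirect PDF object reference (the <PDFObjRef:4> in the traceback) that reached decimalize without being resolved. The following is a minimal, self-contained Python sketch of that failure and of the general remedy of resolving references recursively before conversion. `ObjRef`, `resolve_all`, `decimalize`, and the object table are hypothetical, simplified stand-ins (loosely mirroring the decimalize frames in the traceback), not pdfplumber's or pdfminer's actual implementation.

```python
from decimal import Decimal


class ObjRef:
    """Hypothetical stand-in for pdfminer's PDFObjRef: a pointer to an
    indirect object stored elsewhere in the file."""

    def __init__(self, objid, table):
        self.objid = objid
        self.table = table  # hypothetical object table: objid -> object

    def resolve(self):
        return self.table[self.objid]

    def __repr__(self):
        return "<ObjRef:{0}>".format(self.objid)


def resolve_all(x):
    """Recursively replace references (including nested ones) with the
    objects they point to. This sketch does not guard against cycles."""
    while isinstance(x, ObjRef):
        x = x.resolve()
    if isinstance(x, (list, tuple)):
        return type(x)(resolve_all(v) for v in x)
    if isinstance(x, dict):
        return {k: resolve_all(v) for k, v in x.items()}
    return x


def decimalize(v):
    """Simplified converter: accepts numbers or sequences of numbers and
    raises on anything else, such as an unresolved reference."""
    if isinstance(v, (list, tuple)):
        return [decimalize(x) for x in v]
    if isinstance(v, (int, float)):
        return Decimal(repr(v))
    raise ValueError("Cannot convert {0} to Decimal.".format(v))


# Hypothetical page attributes: /MediaBox is an indirect reference (object 4),
# and one of its entries is itself a reference (object 5).
table = {}
table[5] = 612
table[4] = [0, 0, ObjRef(5, table), 792]
attrs = {"MediaBox": ObjRef(4, table)}
box = attrs.get("CropBox", attrs.get("MediaBox"))

try:
    decimalize(box)  # fails: box is still a reference
except ValueError as err:
    print(err)  # Cannot convert <ObjRef:4> to Decimal.

print(decimalize(resolve_all(box)))
```

Resolving only the top-level reference would still leave <ObjRef:5> inside the list and raise the same kind of error, which is why the sketch recurses into lists and dicts ("fully resolved").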
jsvine added a commit that referenced this issue Nov 22, 2017
Fix issue #41 and bump to v0.5.6
jsvine (Owner) commented Nov 22, 2017

Big thanks for flagging, Dan! Just pushed a fix, which should now be available in v0.5.6. The PDF you've referenced now seems to load fine for me. Let me know if you find otherwise.

dankeemahill (Author) commented

I was also able to initialize pages on the referenced PDF with v0.5.6. Thanks! ✅
