TypeError: cannot unpack non-iterable PDFObjRef object, when unpacking the value of 'DW2' for font CMAP #518
Comments
Hi @EucliTs0, thanks for sharing the bug. Could you copy paste the stacktrace directly from the output? The last line shows
But you mention line 708, and also Anyway, I think this can be solved by using
@EucliTs0 this is a friendly reminder to upload extra details about this issue.
@pietermarsman Hello, sorry for the delay; I was on holiday. Here is the full traceback you requested:
``Traceback (most recent call last):
  File "/home/dtsolakidis/workspace/OCR-1183-Pdf-to-xml-crash-when-empty-page/pdfminer_in_script.py", line 42, in
  File "/home/dtsolakidis/anaconda3/lib/python3.7/site-packages/pdfminer/pdfinterp.py", line 895, in process_page
  File "/home/dtsolakidis/anaconda3/lib/python3.7/site-packages/pdfminer/pdfinterp.py", line 906, in render_contents
  File "/home/dtsolakidis/anaconda3/lib/python3.7/site-packages/pdfminer/pdfinterp.py", line 354, in init_resources
  File "/home/dtsolakidis/anaconda3/lib/python3.7/site-packages/pdfminer/pdfinterp.py", line 202, in get_font
  File "/home/dtsolakidis/anaconda3/lib/python3.7/site-packages/pdfminer/pdfinterp.py", line 193, in get_font
  File "/home/dtsolakidis/anaconda3/lib/python3.7/site-packages/pdfminer/pdffont.py", line 709, in __init__
TypeError: cannot unpack non-iterable PDFObjRef object``
Also, my mistake regarding the line number: in the latest version of pdfminer.six the failing line is 709, not 708. I tried resolve1() and it solved the issue. I can create a PR for this small fix.
…cking the value of 'DW2' An error occurs when the 'DW2' key contains a PDFObjRef object instead of a list of int values, e.g. 'DW2': <PDFObjRef:152>. To solve this issue, we use the resolve1() function. See: pdfminer#518
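The idea behind that fix can be sketched without pdfminer installed. This is only an illustration under stated assumptions: `StubObjRef`, `resolve1_stub`, and `read_dw2` are hypothetical stand-ins written for this example, not pdfminer.six code, and `[880, -1000]` is assumed as the fallback because that is the default the PDF specification gives for DW2.

```python
class StubObjRef:
    """Hypothetical stand-in for an indirect reference such as <PDFObjRef:152>."""

    def __init__(self, target):
        self._target = target

    def resolve(self):
        # Return the object this reference points to.
        return self._target


def resolve1_stub(value):
    """Mimics the idea of resolve1(): follow references, pass direct objects through."""
    while isinstance(value, StubObjRef):
        value = value.resolve()
    return value


# Assumed fallback when the font dictionary has no 'DW2' entry.
DEFAULT_DW2 = [880, -1000]


def read_dw2(spec):
    """Unpack 'DW2' safely, whether it is stored directly or behind a reference."""
    vy, w = resolve1_stub(spec.get("DW2", DEFAULT_DW2))
    return vy, w


# The same call works for a direct list, an indirect reference, and a missing key.
direct = read_dw2({"DW2": [900, -1100]})
indirect = read_dw2({"DW2": StubObjRef([900, -1100])})
missing = read_dw2({})
```

Resolving before unpacking leaves the existing behaviour for direct lists unchanged, so a fix of this shape should only affect PDFs that store 'DW2' indirectly.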
Hello,
I encountered an error while parsing a PDF. It happens in pdffont.py, when the cmap.is_vertical() condition is True (line 708).
When execution enters that block, the following error is raised:
TypeError: cannot unpack non-iterable PDFObjRef object
I printed the whole 'spec' dictionary to see the type of 'DW2':
{'BaseFont': /'MS-Gothic', 'CIDSystemInfo': <PDFObjRef:151>, 'CIDToGIDMap': /'Identity', 'DW': 500, 'DW2': <PDFObjRef:152>, 'FontDescriptor': <PDFObjRef:153>, 'Subtype': /'CIDFontType2', 'Type': /'Font', 'W2': <PDFObjRef:154>, 'Encoding': /'Identity-V', 'ToUnicode': <PDFStream(155): len=507, {'Filter': /'FlateDecode', 'Length': <PDFObjRef:156>}>}
Normally it should be a list, I suppose, but here it is a PDFObjRef. I have not seen anyone else report this; could it be a bug?
We can get the list value by typing:
spec['DW2'].resolve()
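For readers without a PDF that triggers this, the failure can be imitated in a few lines. This is a rough sketch, not pdfminer code: `StubObjRef` is a hypothetical stand-in for PDFObjRef, and the values `[880, -1000]` are made up for illustration, since the contents of object 152 in the confidential file are not known.

```python
class StubObjRef:
    """Hypothetical stand-in for PDFObjRef: not iterable, must be resolved first."""

    def __init__(self, objid, target):
        self.objid = objid
        self._target = target

    def __repr__(self):
        return f"<PDFObjRef:{self.objid}>"

    def resolve(self):
        # Return the object the reference points to.
        return self._target


# A trimmed-down font dictionary; the DW2 contents are illustrative only.
spec = {"DW": 500, "DW2": StubObjRef(152, [880, -1000])}

# Unpacking the reference itself fails, because the reference is not iterable.
try:
    vy, w = spec["DW2"]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Resolving first yields the underlying list, which unpacks normally.
vy, w = spec["DW2"].resolve()
```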
The code I use is just standard code to read the PDF:
Unfortunately, I cannot provide the PDF file because it is a confidential document. I am using the latest version of pdfminer.six.
Thank you!