TypeError: 'PDFObjRef' object is not iterable #316

Abdur-rahmaanJ · 2020-11-23T06:38:50Z

Describe the bug

Got

Traceback (most recent call last):
  File "main.py", line 5, in <module>
    with pdfplumber.open("<stripped_path>") as pdf:
  File "<stripped_path>\venv\lib\site-packages\pdfplumber\pdf.py", line 46, in open
    return cls(open(path_or_fp, "rb"), **kwargs)
  File "<stripped_path>\venv\lib\site-packages\pdfplumber\pdf.py", line 33, in __init__
    self.metadata[k] = list(map(decode_text, v))
  File "<stripped_path>\pdfres\venv\lib\site-packages\pdfplumber\utils.py", line 77, in decode_text
    ords = (ord(c) if type(c) == str else c for c in s)
TypeError: 'PDFObjRef' object is not iterable

Code to reproduce the problem

import pdfplumber

with pdfplumber.open("target.pdf") as pdf:
    first_page = pdf.pages[0]
    print(first_page.chars[0])

Environment

pdfplumber version: 0.5.24
Python version: 3.8.1
OS: Windows

samkit-jain · 2020-11-23T10:38:49Z

Hi @Abdur-rahmaanJ, I appreciate your interest in the library. Would it be possible for you to share a PDF that demonstrates this issue? It will help us reproduce and fix it. Please remove any sensitive information from the PDF before sharing it here.

Abdur-rahmaanJ · 2020-11-23T11:04:11Z

Try it on any ResearchGate PDF. If you don't get the error on Windows, I'll send you the exact PDF.

samkit-jain · 2020-11-23T13:00:24Z

I chose this PDF and it ran fine for me. I got

{'fontname': 'SourceSansPro-Regular', 'adv': Decimal('4.803'), 'upright': True, 'x0': Decimal('39.870'), 'y0': Decimal('711.959'), 'x1': Decimal('43.061'), 'y1': Decimal('717.939'), 'width': Decimal('3.192'), 'height': Decimal('5.980'), 'size': Decimal('5.980'), 'object_type': 'char', 'page_number': 1, 'stroking_color': (0, 0, 0), 'non_stroking_color': (0, 0, 0), 'text': 'S', 'top': Decimal('74.061'), 'bottom': Decimal('80.041'), 'doctop': Decimal('74.061')}

as the output

The thing to note is that I am using Ubuntu, not Windows. If you see the same error with the PDF I used, it might be OS-specific. If not, it might be PDF-specific, and I would request that you share the PDF you used.

Abdur-rahmaanJ · 2020-11-23T19:23:59Z

Try checking this one

samkit-jain · 2020-11-24T10:48:44Z

Thank you for sharing the PDF, @Abdur-rahmaanJ. The issue occurs because the PDF has a metadata field named Changes, which is a list of PDFObjRef objects. I am not sure whether that is allowed by the PDF specification (linking #297 (comment)), but it is nonetheless something that can be handled in the code. I shall raise a PR for it soon.
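To illustrate the idea, here is a minimal sketch of resolving metadata values recursively so that nested references no longer reach the text decoder directly. It is not pdfplumber's actual code: `StubObjRef` is a hypothetical stand-in for pdfminer's `PDFObjRef` (so the example runs without any PDF), and `resolve_metadata_value` and its `strict` flag are names made up for this example.

```python
import warnings


class StubObjRef:
    """Hypothetical stand-in for pdfminer's PDFObjRef: an indirect
    reference that must be resolved before its value can be used."""

    def __init__(self, target):
        self._target = target

    def resolve(self):
        return self._target


def resolve_metadata_value(value, strict=False):
    """Recursively turn a raw metadata value into plain Python data.

    `strict` is an assumed switch for this sketch: when True, an
    undecodable value raises; when False, it only emits a warning.
    """
    if isinstance(value, StubObjRef):
        # Follow the reference, then process whatever it points to.
        return resolve_metadata_value(value.resolve(), strict)
    if isinstance(value, (list, tuple)):
        return [resolve_metadata_value(v, strict) for v in value]
    if isinstance(value, dict):
        return {k: resolve_metadata_value(v, strict) for k, v in value.items()}
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            if strict:
                raise
            warnings.warn("Could not decode metadata value; skipping it")
            return None
    return value


# A metadata dict shaped like the one described above: the Changes
# field is a list of references rather than a string.
raw_metadata = {
    "Title": b"Sample",
    "Changes": [StubObjRef({"Author": b"A"}), StubObjRef(b"edit 2")],
}
metadata = {k: resolve_metadata_value(v) for k, v in raw_metadata.items()}
print(metadata)
# {'Title': 'Sample', 'Changes': [{'Author': 'A'}, 'edit 2']}
```

With real PDFs, the `isinstance` check would target `PDFObjRef` instead of the stub, and decoding would need to handle the PDF text encodings rather than assuming UTF-8.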

Abdur-rahmaanJ · 2020-11-24T10:57:44Z

XD, since this was the first PDF checked, I assumed pdfplumber was broken!

Code and commits by @samkit-jain:

* Treat invalid/unparseable metadata values as warnings: certain invalid values, if parseable, don't throw a warning; only unparseable (always invalid) values throw
* Recursively parse metadata values to handle nested `PDFObjRef` objects (fixes #316)
* Resolve lint issues and remove unused imports
* Make metadata parse failure handling behaviour configurable
* Update tests to bump up test coverage
* Update changelog

Co-authored-by: Matt Clark <44023+mclark@users.noreply.github.com>
Co-authored-by: Samkit Jain <15127115+samkit-jain@users.noreply.github.com>

Abdur-rahmaanJ added the bug label Nov 23, 2020

samkit-jain added the awaiting-code-or-pdf label (Issues and PRs awaiting code and/or a PDF from issue/PR-author) Nov 23, 2020

samkit-jain removed the awaiting-code-or-pdf label (Issues and PRs awaiting code and/or a PDF from issue/PR-author) Nov 24, 2020

samkit-jain mentioned this issue Nov 29, 2020

Handle invalid metadata values #320

Merged

samkit-jain self-assigned this Nov 30, 2020

jsvine closed this as completed in 2d9415c Dec 9, 2020

cmdlineluser mentioned this issue Jul 14, 2023

TypeError: argument of type 'PDFObjRef' is not iterable #935

Open
