TypeError: 'PDFObjRef' object is not iterable #316

Closed
Abdur-rahmaanJ opened this issue Nov 23, 2020 · 6 comments

@Abdur-rahmaanJ

Describe the bug

Opening the PDF fails with the following traceback:

Traceback (most recent call last):
  File "main.py", line 5, in <module>
    with pdfplumber.open("<stripped_path>") as pdf:
  File "<stripped_path>\venv\lib\site-packages\pdfplumber\pdf.py", line 46, in open
    return cls(open(path_or_fp, "rb"), **kwargs)
  File "<stripped_path>\venv\lib\site-packages\pdfplumber\pdf.py", line 33, in __init__
    self.metadata[k] = list(map(decode_text, v))
  File "<stripped_path>\pdfres\venv\lib\site-packages\pdfplumber\utils.py", line 77, in decode_text
    ords = (ord(c) if type(c) == str else c for c in s)
TypeError: 'PDFObjRef' object is not iterable

Code to reproduce the problem

import pdfplumber

with pdfplumber.open("target.pdf") as pdf:
    first_page = pdf.pages[0]
    print(first_page.chars[0])

Environment

  • pdfplumber version: 0.5.24
  • Python version: 3.8.1
  • OS: Windows
@samkit-jain
Collaborator

Hi @Abdur-rahmaanJ, I appreciate your interest in the library. Would it be possible for you to share a PDF that demonstrates this issue? That will help us reproduce and fix it. Please remove any sensitive information from the PDF before sharing it here.

@samkit-jain samkit-jain added the awaiting-code-or-pdf Issues and PRs awaiting code and/or a PDF from issue/PR-author label Nov 23, 2020
@Abdur-rahmaanJ
Author

Try it on any ResearchGate PDF. If you don't get the error on Windows, I'll send you the exact PDF.

@samkit-jain
Collaborator

I chose this PDF and it ran fine for me. I got the following output:

{'fontname': 'SourceSansPro-Regular', 'adv': Decimal('4.803'), 'upright': True, 'x0': Decimal('39.870'), 'y0': Decimal('711.959'), 'x1': Decimal('43.061'), 'y1': Decimal('717.939'), 'width': Decimal('3.192'), 'height': Decimal('5.980'), 'size': Decimal('5.980'), 'object_type': 'char', 'page_number': 1, 'stroking_color': (0, 0, 0), 'non_stroking_color': (0, 0, 0), 'text': 'S', 'top': Decimal('74.061'), 'bottom': Decimal('80.041'), 'doctop': Decimal('74.061')}

Note that I am using Ubuntu, not Windows. If you see the same error with the PDF I used, the problem might be OS-specific. If not, it is probably PDF-specific, and I would ask you to share the PDF you used.

@Abdur-rahmaanJ
Author

Try checking this one

@samkit-jain samkit-jain removed the awaiting-code-or-pdf Issues and PRs awaiting code and/or a PDF from issue/PR-author label Nov 24, 2020
@samkit-jain
Collaborator

samkit-jain commented Nov 24, 2020

Thank you for sharing the PDF, @Abdur-rahmaanJ. The error occurs because the PDF has a metadata field named Changes whose value is a list of PDFObjRef objects, and decode_text tries to iterate over each PDFObjRef as if it were a string. I am not sure whether the PDF specification allows that (linking #297 (comment)), but either way it is something the code can handle. I shall raise a PR for it soon.
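
For readers following along, here is a minimal sketch of one way nested references in metadata could be handled. It is not the code from the eventual PR. `StubObjRef` is a hypothetical stand-in for pdfminer's `PDFObjRef`, and `decode_value` is an illustrative helper name, so the example runs without any PDF library installed.

```python
class StubObjRef:
    """Hypothetical stand-in for an indirect PDF object reference."""

    def __init__(self, target):
        self._target = target

    def resolve(self):
        # A real reference would look the object up in the document.
        return self._target


def decode_value(value):
    """Recursively turn a raw metadata value into plain Python data."""
    # Follow indirect references until a concrete value is reached.
    while isinstance(value, StubObjRef):
        value = value.resolve()
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    if isinstance(value, (list, tuple)):
        return [decode_value(item) for item in value]
    if isinstance(value, dict):
        return {key: decode_value(item) for key, item in value.items()}
    return value


# Assumed shape of the problematic metadata: a list of references.
raw_metadata = {
    "Title": b"Example",
    "Changes": [StubObjRef({"Comment": b"edited"}), StubObjRef(b"v2")],
}
metadata = {key: decode_value(val) for key, val in raw_metadata.items()}
print(metadata)
```

The point of the recursion is that a reference can resolve to another container, which may itself hold further references, so resolving only the top level would not be enough.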

@Abdur-rahmaanJ
Copy link
Author

XD, since this was the first PDF I checked, I assumed pdfplumber was broken!

@samkit-jain samkit-jain self-assigned this Nov 30, 2020
jsvine added a commit that referenced this issue Dec 9, 2020
* Treat invalid/unparseable metadata values as warnings — Certain invalid values if parseable don't throw a warning and only unparseable (always invalid) throw

* Recursively parse metadata values to handle nested `PDFObjRef` objects — Fixes #316

* Resolve lint issues and remove unused imports

* Make metadata parse failure handling behaviour configurable

* Update tests to bump up test coverage

* Update changelog

Co-authored-by: Matt Clark <44023+mclark@users.noreply.github.com>
jsvine added a commit that referenced this issue Dec 9, 2020
Code and commits by @samkit-jain:

* Treat invalid/unparseable metadata values as warnings — Certain invalid values if parseable don't throw a warning and only unparseable (always invalid) throw

* Recursively parse metadata values to handle nested `PDFObjRef` objects — Fixes #316

* Resolve lint issues and remove unused imports

* Make metadata parse failure handling behaviour configurable

* Update tests to bump up test coverage

* Update changelog

Co-authored-by: Samkit Jain <15127115+samkit-jain@users.noreply.github.com>
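
The commit notes mention treating unparseable metadata as warnings and making that behaviour configurable. A rough sketch of the general pattern, under stated assumptions, might look like the following. The names `parse_metadata`, `strict`, and `fragile_parser` are hypothetical and are not taken from pdfplumber. Skipping a field that fails to parse is also an assumption, since the thread does not say what happens to such fields.

```python
import warnings


def parse_metadata(raw, parse_value, strict=False):
    """Parse each metadata field; warn on failure unless strict is set."""
    parsed = {}
    for key, value in raw.items():
        try:
            parsed[key] = parse_value(value)
        except Exception as exc:
            if strict:
                # Strict mode: surface the original error to the caller.
                raise
            # Lenient mode (assumed default): warn and skip the field.
            warnings.warn(f"Could not parse metadata field {key!r}: {exc}")
    return parsed


def fragile_parser(value):
    # Mimics the failure in this issue: iterating a non-iterable
    # object raises TypeError.
    return [chr(c) for c in value]


if __name__ == "__main__":
    sample = {"Title": b"ab", "Changes": object()}
    print(parse_metadata(sample, fragile_parser))
```

With `strict=True` the same input would raise the `TypeError` instead of emitting a warning, which is useful when a caller would rather fail fast than work with partial metadata.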
@jsvine jsvine closed this as completed in 2d9415c Dec 9, 2020