
.to_image() treats a stream as a regular file #948

Closed
Urbener opened this issue Jul 27, 2023 · 3 comments

Urbener commented Jul 27, 2023

When working with a ZipExtFile, calling page.to_image() raises a FileNotFoundError, because it treats the name of the file inside the zip archive as the path of a regular, filesystem-backed file. I suspect this applies to other stream types as well, but I haven't been able to test that.
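The root of the problem can be seen with the standard library alone: a ZipExtFile exposes a name attribute, but that attribute holds the member's path inside the archive, not a location on disk. A minimal sketch (the in-memory archive and the describe_member helper are illustrative assumptions, not part of the original report):

```python
import io
import zipfile
from typing import Tuple


def describe_member(member_name: str = "dummy.pdf") -> Tuple[str, str]:
    """Hypothetical helper: open a zip member and report its type and .name."""
    # Build a small archive in memory so the example needs no files on disk
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(member_name, b"%PDF-1.4 placeholder, not a real PDF")

    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        with zf.open(member_name) as member:
            # member is a ZipExtFile; its .name is the path *inside* the archive,
            # so opening it as a filesystem path fails with FileNotFoundError
            return type(member).__name__, member.name


print(describe_member())  # ('ZipExtFile', 'dummy.pdf')
```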

Calling pypdfium2.PdfDocument._process_page() directly works as expected, so I think the problem lies in pdfplumber's handling of the stream rather than in pypdfium2.

However, calling pdfplumber.open(repair=True) does work with some files, and the type of object that get_page_image() receives changes: with repair=True it gets an _io.BytesIO object, while without it, it gets a ZipExtFile.
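Since a plain BytesIO works, one possible workaround on affected versions is to read the member into memory before handing it to pdfplumber. A minimal sketch; the helper name zip_member_as_bytesio is hypothetical:

```python
import io
import zipfile
from typing import BinaryIO, Union


def zip_member_as_bytesio(zip_file: Union[str, BinaryIO], member_name: str) -> io.BytesIO:
    """Hypothetical helper: copy one zip member into an in-memory BytesIO.

    Unlike a ZipExtFile, a BytesIO has no .name attribute, so it cannot be
    mistaken for a path to a file on disk.
    """
    with zipfile.ZipFile(zip_file) as zf:
        data = zf.read(member_name)  # whole member is loaded into memory
    return io.BytesIO(data)
```

Usage would mirror the reproducer: pass zip_member_as_bytesio('reproducer.zip', 'dummy.pdf') to pdfplumber.open() instead of the ZipExtFile. The trade-off is that the entire PDF is held in memory.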

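```python
# reproducer.py
from zipfile import ZipFile

import pdfplumber

with ZipFile('reproducer.zip') as zip_file:
    # zip_file.open() returns a ZipExtFile: a file-like stream, not a file on disk
    with zip_file.open('dummy.pdf') as pdf_file:
        with pdfplumber.open(pdf_file) as pdf:
            page = pdf.pages[0]
            # On pdfplumber 0.10.1 this raises FileNotFoundError
            im = page.to_image()
```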

Sample ZIP and PDF file:
reproducer.zip

Environment:

  • pdfplumber version: 0.10.1
  • Python version: 3.10.6
  • OS: Linux (Ubuntu 22.04.2)

Urbener added the bug label Jul 27, 2023

jsvine (Owner) commented Jul 28, 2023

Hi @Urbener, and thank you for flagging. Thanks, too, for the clear description, example file, and code to reproduce. Exactly the kind of issue I like to see!

I agree with your diagnosis of the issue and of the code to blame. I have some potential solutions in mind, which I'll test. I'll keep you updated here.
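As one illustration of the kind of fix involved (a hypothetical helper, not necessarily what pdfplumber actually does), code can tell a disk-backed file from an arbitrary stream by checking for a working file descriptor instead of relying only on the presence of a name attribute:

```python
import io
from typing import BinaryIO, Union


def pdf_source_for(stream: BinaryIO) -> Union[str, bytes]:
    """Hypothetical helper: choose what to pass on to a PDF renderer.

    Returns the file path only when the stream is backed by an OS-level file
    (it has a working fileno() and a string name); otherwise returns the bytes.
    """
    name = getattr(stream, "name", None)
    try:
        stream.fileno()  # BytesIO and ZipExtFile raise io.UnsupportedOperation
        has_fd = True
    except (AttributeError, OSError, ValueError):
        # io.UnsupportedOperation subclasses both OSError and ValueError
        has_fd = False

    if has_fd and isinstance(name, str):
        return name

    # Not a plain on-disk file: read the whole stream from the start
    stream.seek(0)
    return stream.read()


# Usage: an in-memory stream is returned as bytes, not mistaken for a path
print(pdf_source_for(io.BytesIO(b"%PDF-1.4 demo")))  # b'%PDF-1.4 demo'
```

The design choice is to fall back to reading bytes whenever there is any doubt, which costs memory for large PDFs but never misreads an archive member's name as a filesystem path.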

jsvine (Owner) commented Jul 29, 2023

Thanks again! This should be fixed in 30a52cb and is now available in v0.10.2.

jsvine closed this as completed Jul 29, 2023

Urbener (Author) commented Jul 31, 2023

It's my pleasure to help improve such a useful project. I should be the one thanking you for creating it and for your continued support.

Thanks!
