When working with a `ZipExtFile`, calling `page.to_image()` raises a `FileNotFoundError`, because the name of the file inside the ZIP archive is treated as if it were a regular, filesystem-backed file. I suspect this also applies to other stream types, but I haven't been able to test it.
Calling `pypdfium2.PdfDocument._process_page()` directly works as expected, so I think the problem can be traced here.
However, calling `pdfplumber.open(repair=True)` does work with some files, and the type of object that `get_page_image()` receives changes: with `repair=True` it gets an `_io.BytesIO` object, while without it, it gets a `ZipExtFile`.
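A standard-library-only sketch of the diagnosis above (pdfplumber is not involved, and the in-memory archive and placeholder bytes are made up for illustration): a `ZipExtFile` exposes the archive member's name through `.name`, which looks like a file name but is not a filesystem path, whereas an `io.BytesIO` has no `.name` attribute at all. Assuming pdfplumber reopens the stream by its `.name`, that difference would be consistent with the `repair=True` path working.

```python
import io
import zipfile


def member_stream_name(archive: bytes, member: str):
    """Open `member` inside an in-memory ZIP and return (stream type, .name)."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        with zf.open(member) as stream:
            # ZipExtFile stores the member name in .name; it is relative to
            # the archive, not to the current working directory.
            return type(stream).__name__, getattr(stream, "name", None)


# Build a throwaway archive containing a placeholder "dummy.pdf".
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("dummy.pdf", b"%PDF-1.4 placeholder")

print(member_stream_name(buf.getvalue(), "dummy.pdf"))  # ('ZipExtFile', 'dummy.pdf')
# A BytesIO, by contrast, has no .name attribute to misinterpret.
print(hasattr(io.BytesIO(b""), "name"))  # False
```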
```python
# reproducer.py
from zipfile import ZipFile

import pdfplumber

with ZipFile('reproducer.zip') as zip_file:
    with zip_file.open('dummy.pdf') as pdf_file:
        with pdfplumber.open(pdf_file) as pdf:
            page = pdf.pages[0]
            im = page.to_image()  # raises FileNotFoundError
```
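Until this is fixed, a possible workaround, based on the observation that the `repair=True` path (which receives an `_io.BytesIO`) works, is to copy the member into memory before opening it. `read_member_to_buffer` is a hypothetical helper, not part of pdfplumber:

```python
import io
import zipfile


def read_member_to_buffer(archive, member: str) -> io.BytesIO:
    """Copy one archive member into an in-memory buffer.

    `archive` may be a path or a binary file object (anything ZipFile
    accepts). The returned BytesIO has no .name attribute, so downstream
    code cannot mistake the member name for a filesystem path.
    """
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as stream:
            return io.BytesIO(stream.read())
```

It would then be used as `pdfplumber.open(read_member_to_buffer("reproducer.zip", "dummy.pdf"))`. The trade-off is that the whole PDF is held in memory, which is usually acceptable for a single document extracted from an archive.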
Hi @Urbener, and thank you for flagging. Thanks, too, for the clear description, example file, and code to reproduce. Exactly the kind of issue I like to see!
I agree with your diagnosis of the issue / code to blame. I have some potential solutions in mind, which I'll test. Will keep you updated here.
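For illustration only, one shape such a fix could take (the helpers below are hypothetical, not pdfplumber's actual code): treat a stream as filesystem-backed only when `fileno()` succeeds and `.name` is a path, and otherwise pass the stream's bytes on instead of its name. `ZipExtFile` and `BytesIO` both raise on `fileno()`, so neither would be reopened by name.

```python
import io
import os


def is_filesystem_backed(stream) -> bool:
    """Heuristic: a stream counts as a real file only if it has an OS-level
    file descriptor. ZipExtFile and BytesIO both raise on fileno()."""
    try:
        stream.fileno()
    except (AttributeError, io.UnsupportedOperation, ValueError):
        # UnsupportedOperation: in-memory or wrapped streams;
        # ValueError: the file has already been closed.
        return False
    return True


def pdf_source(stream):
    """Hypothetical helper: choose what to pass on to the renderer.

    Returns the path for genuine files; otherwise returns the stream's full
    contents as bytes, so a misleading .name (e.g. a ZIP member name) is
    never reopened from disk. Assumes the stream is seekable.
    """
    name = getattr(stream, "name", None)
    if is_filesystem_backed(stream) and isinstance(name, (str, os.PathLike)):
        return name
    position = stream.tell()
    stream.seek(0)
    data = stream.read()
    stream.seek(position)  # restore the caller's position
    return data
```

The `fileno()` test is only a heuristic: pipes and sockets also have descriptors, and rewinding to offset 0 assumes the stream is seekable.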
Sample ZIP and PDF file:
reproducer.zip
Environment: