page.to_image() causes error "PermissionError: [WinError 32] The process cannot access the file because it is being used by another process " #1072

Closed
domdrag opened this issue Dec 26, 2023 · 5 comments
domdrag commented Dec 26, 2023

Hi,

amazing work with the library, it has been super useful. Not sure if this is a bug or if I'm doing something wrong.

Describe the bug

Calling the Page.to_image() method leaves the PDF file in use (locked), even after the PDF has been closed. Trying to, e.g., os.remove the PDF file afterward raises the PermissionError.

Have you tried repairing the PDF?

I've tried this and stumbled upon the same issue mentioned here.
However, this issue shouldn't be due to the PDF file itself.

Code to reproduce the problem

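import os

import pdfplumber

# Open the PDF and render the first page to an image.
with pdfplumber.open('sample.pdf') as PDF:
    page = PDF.pages[0]
    image = page.to_image()

# The with block has exited, yet on Windows this raises PermissionError.
os.remove('sample.pdf')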

PDF file

sample.pdf

Expected behavior

Expected behavior was to successfully remove the PDF file.
The code works if we comment out the image = page.to_image() line.

Actual behavior

PermissionError exception is raised.

Screenshots

Screenshot_8

Environment

  • pdfplumber version: 0.10.3
  • Python version: 3.9.2 (also getting the same issue with the latest, 3.12.1)
  • OS: Windows 10 Education

Additional context

NA

@domdrag domdrag added the bug label Dec 26, 2023
Owner

jsvine commented Jan 7, 2024

Hi @domdrag, and glad to hear that the library has been useful. I've tried running your code on my computer, but cannot replicate the error. (Everything runs as expected.) I'm using a Mac, though.

To the Windows users in the pdfplumber community: Any volunteers to test this out and report back whether you encounter an error?
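A likely reason the error does not reproduce on a Mac: Windows normally refuses to delete a file while a handle to it is still open, whereas POSIX systems allow the delete. The following is a minimal, standard-library-only sketch of that difference. The helper name `remove_while_open` is hypothetical and the file is a throwaway temp file, not the reporter's PDF.

```python
import os
import tempfile


def remove_while_open(path):
    """Try to delete `path` while a handle to it is still open.

    Returns True if the delete succeeded, False if the OS refused it.
    """
    handle = open(path, "rb")
    try:
        os.remove(path)
        return True
    except PermissionError:
        # On Windows this is typically WinError 32 (file in use).
        return False
    finally:
        handle.close()


# Create a throwaway file to experiment on.
fd, path = tempfile.mkstemp(suffix=".pdf")
os.close(fd)

removed_while_open = remove_while_open(path)
if not removed_while_open:
    # The handle is closed now, so the delete goes through.
    os.remove(path)

file_still_exists = os.path.exists(path)
```

On Windows, `removed_while_open` is expected to be False. On macOS and Linux it is expected to be True, which would match the behavior reported above.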

Contributor

dhdaines commented Feb 5, 2024

Confirmed this problem on Windows (Python 3.12 on Windows 10). This is probably fixed by #1090, since the problem there is precisely that we aren't explicitly closing the pypdfium2 document, and pypdfium2 won't close the file otherwise, even when the object goes out of scope. This is a very questionable API decision, or even a bug, in pypdfium2, but we can work around it.
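The general shape of such a workaround is to close the object that holds the file handle explicitly, instead of waiting for it to go out of scope. The sketch below illustrates the pattern with a hypothetical `StandInDocument` class. It is not the actual change in #1090, nor pdfplumber's or pypdfium2's code.

```python
import contextlib
import os
import tempfile


class StandInDocument:
    """Hypothetical stand-in for a renderer object that keeps a file open."""

    def __init__(self, path):
        self._handle = open(path, "rb")

    def render_first_page(self):
        # Placeholder for real rendering: just read the first few bytes.
        return self._handle.read(8)

    def close(self):
        self._handle.close()


# Write a tiny throwaway file that starts like a PDF header.
fd, path = tempfile.mkstemp(suffix=".pdf")
with os.fdopen(fd, "wb") as f:
    f.write(b"%PDF-1.7\n")

# contextlib.closing() calls .close() on exit, even if rendering raises.
with contextlib.closing(StandInDocument(path)) as doc:
    header = doc.render_first_page()

os.remove(path)  # No handle is left open, so this also works on Windows.
removed = not os.path.exists(path)
```

Using `contextlib.closing` (or an equivalent try/finally) ties the lifetime of the handle to a visible block of code, so the file is released at a known point.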

Contributor

dhdaines commented Feb 5, 2024

Confirmed fixed by #1090

Owner

jsvine commented Feb 10, 2024

Thank you for solving this mystery, @dhdaines. (And thanks for opening this issue, @domdrag.)

@jsvine jsvine closed this as completed Feb 10, 2024

mara004 commented Mar 19, 2024

> and pypdfium2 won't close the file otherwise, even when the object goes out of scope. This is a very questionable API decision or even a bug in pypdfium2

See #1089 (comment) for an explanation.
As I said, managing a file handle is the responsibility of the caller that opened it, unless explicitly delegated.

If the file handle was opened within pypdfium2/pdfium, pypdfium2 will also close it. Even then, it still falls to the caller to close the pypdfium2 object that holds the file handle; otherwise the handle stays open until the garbage collector runs, which is unpredictable.

This is neither a pypdfium2 design decision nor a bug, but rather a matter of file handle and object hierarchy logic.

> pypdfium2 won't close the file otherwise, even when the object goes out of scope.

pypdfium2 does install a finalizer that closes the FD if the input is a file path, or if it is a file handle and autoclose=True. The only problem is the delay before the garbage collector actually calls the finalizer, and that is out of our hands.
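The ownership and finalizer behavior described in this comment can be illustrated with a small standard-library sketch. The `Document` class below is hypothetical and only mimics the described rules (close if opened from a path, or if a handle is passed with `autoclose=True`). It is not pypdfium2's implementation, and the demo assumes CPython.

```python
import gc
import os
import tempfile
import weakref


class Document:
    """Hypothetical holder of a file handle; not pypdfium2's actual class."""

    def __init__(self, source, autoclose=False):
        if isinstance(source, (str, os.PathLike)):
            # We opened the handle, so we are responsible for closing it.
            self.handle = open(source, "rb")
            owns_handle = True
        else:
            # The caller opened it; close it only if explicitly delegated.
            self.handle = source
            owns_handle = autoclose
        callback = self.handle.close if owns_handle else (lambda: None)
        # The finalizer runs when the object is collected, or on close().
        self._finalizer = weakref.finalize(self, callback)

    def close(self):
        # Deterministic cleanup: run the finalizer now (it runs at most once).
        self._finalizer()


fd, path = tempfile.mkstemp()
os.close(fd)

gc.disable()  # Keep the demo deterministic: no automatic cycle collection.

# 1. Explicit close releases the handle immediately.
doc = Document(path)
handle = doc.handle
doc.close()
closed_after_close = handle.closed

# 2. An object caught in a reference cycle is not freed when its last
#    name goes away; its finalizer waits for the cycle collector.
doc = Document(path)
handle = doc.handle
doc.cycle = doc
del doc
closed_before_gc = handle.closed
gc.collect()
closed_after_gc = handle.closed

# 3. A caller-owned handle is left alone unless autoclose=True.
with open(path, "rb") as caller_handle:
    Document(caller_handle).close()
    caller_handle_closed = caller_handle.closed

gc.enable()
os.remove(path)
```

In case 2 the handle stays open until `gc.collect()` runs, which is the kind of delay that would keep the file locked on Windows. Calling `close()` explicitly, as in case 1, avoids depending on collector timing.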
