Memory leaks occur when saving each page of a PDF as an image #1430

Closed
JoeanAmier opened this issue Nov 30, 2021 · 7 comments

@JoeanAmier

Describe the bug (mandatory)

Memory leaks occur when saving each page of a PDF as an image. I wrote this operation as a function; each run of the loop increases memory usage, and the memory does not seem to be released.

To Reproduce (mandatory)

import fitz

def pdf():
    doc = fitz.open('xxx.pdf')
    for i in range(doc.page_count):
        # render each page at 20% scale, without an alpha channel
        img = doc[i].get_pixmap(matrix=fitz.Matrix(0.2, 0.2), alpha=False)
        img.save("%s.png" % i)  # one output file per page
    doc.close()

for _ in range(5):
    pdf()  # each execution adds a certain amount of memory
If you comment out the loop that renders and saves the images, there is no memory problem:

def pdf():
    doc = fitz.open('xxx.pdf')
    doc.close()

Your configuration (mandatory)

3.9.7 (default, Sep 16 2021, 16:59:28) [MSC v.1916 64 bit (AMD64)]
win32

PyMuPDF 1.19.1: Python bindings for the MuPDF 1.19.0 library.
Version date: 2021-10-23 00:00:10.
Built for Python 3.9 on win32 (64-bit).

@JorjMcKie
Collaborator

I have tested this on Windows and Linux and cannot confirm a difference between (a) creating a pixmap alone and (b) also saving the pixmap.
Dealing with pixmaps as such does consume some memory, which is inevitable because of MuPDF's internal caching of images.
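
For illustration, you can watch that cache fill up while rendering. A minimal sketch, assuming the store_size and store_maxsize properties of fitz.TOOLS (they report the current and maximum store size in bytes):

import fitz

doc = fitz.open("xxx.pdf")
for page in doc:
    pix = page.get_pixmap()
    # watch MuPDF's store grow as pages are rendered
    print("store: %d of %d bytes" % (fitz.TOOLS.store_size, fitz.TOOLS.store_maxsize))
doc.close()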

@JoeanAmier
Author

Is there no way to release the cache? Or some other way to save the images? As it stands, processing multiple large PDF documents can easily use up all available memory.

@JorjMcKie
Collaborator

Is there no way to release the cache? Or some other way to save the images? As it stands, processing multiple large PDF documents can easily use up all available memory.

Yes, there is: execute fitz.TOOLS.store_shrink(100). The parameter is the percentage of the store to empty.
When processing the old Adobe manual (1,310 pages) five times in a row, the maximum memory usage was 80 MB above the level at program start.
When executing that instruction after each page, the difference went down to 70 MB.
The benefit was larger with other files.
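
Applied to the loop from the report, that would look like this (a sketch based on the code above):

import fitz

def pdf():
    doc = fitz.open("xxx.pdf")
    for i in range(doc.page_count):
        img = doc[i].get_pixmap(matrix=fitz.Matrix(0.2, 0.2), alpha=False)
        img.save("%s.png" % i)
        fitz.TOOLS.store_shrink(100)  # empty MuPDF's store after each page
    doc.close()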

You can also try the MuPDF CLI tool like this: mutool draw -o p-%d.png -L ... file.pdf.
It accepts the argument -L to request a low-memory execution.
This does the same as the instruction above, plus it uses a parameter to suppress caching.
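
If you want to drive that from Python, a sketch using subprocess (assuming mutool is on the PATH; further draw options omitted):

import subprocess

# render every page of file.pdf to p-1.png, p-2.png, ... in low-memory mode
subprocess.run(["mutool", "draw", "-o", "p-%d.png", "-L", "file.pdf"], check=True)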

Caching suppression is not yet available in PyMuPDF, but I will make sure to include it in the next version.

All this will of course have an adverse effect on performance.

Other considerations to alleviate this problem include using Python multiprocessing as explained in the documentation.
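
A minimal sketch of the multiprocessing idea: each worker opens the file itself and renders one slice of pages, so MuPDF's cache lives and dies with the short-lived worker processes. The file name and slice size here are assumptions for illustration:

import fitz
from multiprocessing import Pool

FILE = "xxx.pdf"  # assumed input file

def render_range(args):
    # each worker opens its own copy; its MuPDF cache is freed when it exits
    start, stop = args
    doc = fitz.open(FILE)
    for i in range(start, stop):
        pix = doc[i].get_pixmap(matrix=fitz.Matrix(0.2, 0.2), alpha=False)
        pix.save("%s.png" % i)
        pix = None
    doc.close()

if __name__ == "__main__":
    doc = fitz.open(FILE)
    pc = doc.page_count
    doc.close()
    step = 50  # pages per worker, an arbitrary choice
    ranges = [(i, min(i + step, pc)) for i in range(0, pc, step)]
    with Pool() as pool:
        pool.map(render_range, ranges)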

@JorjMcKie
Collaborator

If you run mutool draw -o p-%d.png ... adobe.pdf, you will probably observe a similar maximum memory usage of 80 MB. With the -L option, this goes down to at most 50 MB.
I expect to achieve this in PyMuPDF too, by suppressing the caching ...

JorjMcKie added the enhancement label and removed the bug label on Dec 1, 2021
@JorjMcKie
Collaborator

In any case, once you are done with a pixmap (i.e. after saving it), set it to None to force its storage to be freed.

@JorjMcKie
Collaborator

JorjMcKie commented Dec 2, 2021

I found the following logic best for keeping intermediate memory under control while also delivering acceptable speed.
Tested with the Adobe manual; the maximum memory usage at any time was below 50 MB:

# process the file in segments / intervals
import fitz

fname = "adobe.pdf"
doc = fitz.open(fname)
interval = 50  # pages per segment
pc = doc.page_count
pno = 0  # first page of the current segment

while pno < pc:
    limit = min(pc, pno + interval)
    for page in doc.pages(pno, limit, 1):
        pix = page.get_pixmap()
        pix = None  # <== important! drop the reference to free the pixmap

    if limit >= pc:
        break
    pno += interval
    doc.close()  # release the file and its resources
    fitz.TOOLS.store_shrink(100)  # empty MuPDF's cache
    doc = fitz.open(fname)  # reopen the document for the next segment

doc.close()

@JoeanAmier
Author

The solution you described has improved the memory problem. Thank you very much.
