Memory leak in page.getPixmap() #130
Comments
I am traveling abroad, so my ability to dig into this is limited. I'll be back Thursday next week. Anyway, this sounds like a problem that I thought was fixed. Please provide me with your system parameters: OS, bitness, Python version and PyMuPDF version. Are you sure that the memory accumulation is caused by pixmap creation and not by the getPNGdata method?
Hi, OS is Win 10, 64bit, I see the same memory behavior when running the demo pdf viewer (PDFdisplay.py).
ok, thanks for the info. That's pixmap creation then.
I have found the cause for the issue, and I am testing the fix.
Thanks a lot
Just uploaded a fix into branch 1.12.2.
Memory consumption now stays below 400MB and does not gradually increase. I guess some data still stays in memory, as 312MB are still used by the process when it finishes.
Maybe I should be using an example with larger pages than this one? Let me try more documents ...
With larger page sizes (100 A4 pages, complex graphics) I am also staying around 100 MB.
I did some MuPDF store memory analysis: Obviously, their strategy is to keep as much as possible in memory for performance reasons. There is an upper limit for the so-called global context (256 MB), which is specified in There do exist low-level functions to exert some control, too, like deleting some storable memory by percentage value and so on. To date, these are not implemented in PyMuPDF.
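To illustrate the idea of a size-capped store that can also be shrunk by a percentage, here is a minimal toy model in pure Python. It is only a sketch of the general technique: `ToyStore`, `put` and `shrink` are hypothetical names made up for this example, the eviction order (oldest first) and the meaning of `percent` (keep at most that share of the current size) are assumptions, and none of it is MuPDF's or PyMuPDF's actual implementation or API.

```python
from collections import OrderedDict


class ToyStore:
    """Toy size-capped cache (hypothetical; not MuPDF's store)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.size = 0
        self._items = OrderedDict()  # key -> bytes, oldest entry first

    def put(self, key, data):
        # Drop a previous entry under the same key, if any.
        if key in self._items:
            self.size -= len(self._items.pop(key))
        # Evict oldest entries until the new item fits under the cap.
        # An item larger than the cap is still stored once the cache is empty.
        while self._items and self.size + len(data) > self.max_bytes:
            _, old = self._items.popitem(last=False)
            self.size -= len(old)
        self._items[key] = data
        self.size += len(data)

    def shrink(self, percent):
        """Evict oldest entries until at most percent% of the current size remains."""
        target = self.size * percent // 100
        while self._items and self.size > target:
            _, old = self._items.popitem(last=False)
            self.size -= len(old)
        return self.size


store = ToyStore(max_bytes=256)
for page in range(10):
    store.put(page, bytes(100))  # each "page" caches 100 bytes
```

With the cap in place, the cached size stops growing once the limit is reached, no matter how many pages are processed. Calling `store.shrink(0)` after each page corresponds to emptying the cache every time, which trades speed for a lower memory footprint.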
To complete my investigation, I rendered the file mentioned above (100 A4 pages, complex graphics ...) with MuPDF's The mentioned utility has an additional feature: one can empty the storable memory cache after each page has been processed. This keeps memory usage low (below 20 MB in this case), of course at the cost of processing speed (first measurements point to 60% longer runtime).
Since yesterday I've been trying a few things to make sure that we no longer have a memory leak in PyMuPDF.
The second script empties the store before each new pixmap. This keeps memory usage down to the minimum required by the opened document itself, let's say in the ballpark of 10 MB. After closing / deleting the document, memory usage dropped again close to the start value. The second script needed considerably more runtime: my science magazine (the 100-pager) needed 20% longer, and the Adobe manual more than 100%. I would however argue that such a threshold already exists: the aforementioned 256 MB built into PyMuPDF. If you wanted, you could change that value in file Please let me know your reaction.
Hi, I'm getting the same warning and error for a 50 page pdf when I run page.getText() on v1.13.20...
Hi,
if you mean the ICC-related messages, you can just ignore them. If, however, you are getting memory leaks (the actual topic of this issue!), then I am indeed alarmed.
I've tested the script and I'm getting the ICC warning for each page that getText() is called on. I annotate each page subsequently and that doesn't give any warnings. The warnings don't appear for a 2 page document though. Sending data is not possible unfortunately. Since there is no reduction in performance or function, I suspect it may be just the warnings.
Yes, these are warnings. I was a bit frightened at first, but I now understand you picked the wrong issue - your observation has nothing to do with memory leaks obviously ...
fair enough, it was the first and only search result for the warning text and pymupdf :) I'll upgrade and see if the issue is solved, thx!
confirmed, upgrade to 1.14.x solved the warnings, thx!
I'm using pymupdf to loop through all pages in a PDF and extracting pages from it:
Using memory profiler I see that on every loop iteration getPixmap allocates 20MB of data which is not freed. I tried different things like
del pix
or
gc.collect
but the memory usage increases with every run. As I'm handling quite a few pages, my script runs out of memory... getPixmap also shows the following warning/error:
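
For anyone who wants to check whether memory really keeps growing after `del` and `gc.collect()`, the following is a minimal sketch of a per-iteration measurement using only the standard library. `growth_per_iteration`, `leaky_render` and `clean_render` are hypothetical names for this example; the two render functions are stand-ins for a rendering call and do not use PyMuPDF. Note the limitation: `tracemalloc` only sees allocations made through Python's allocator, so memory held by a native library such as MuPDF would not show up here, and a process-level figure (for example from a memory profiler) is needed for that case.

```python
import gc
import tracemalloc


def growth_per_iteration(work, iterations=5):
    """Return Python-level memory growth in bytes after each call to work()."""
    tracemalloc.start()
    gc.collect()
    baseline, _ = tracemalloc.get_traced_memory()
    growth = []
    for _ in range(iterations):
        result = work()
        del result      # drop our own reference, as with `del pix`
        gc.collect()    # collect unreachable reference cycles
        current, _ = tracemalloc.get_traced_memory()
        growth.append(current - baseline)
    tracemalloc.stop()
    return growth


_leaked = []  # simulates a hidden reference that is never released


def leaky_render():
    buf = bytearray(1_000_000)
    _leaked.append(buf)  # the buffer stays alive even after the caller's `del`
    return buf


def clean_render():
    return bytearray(1_000_000)  # freed as soon as the caller drops it


leaky = growth_per_iteration(leaky_render)
clean = growth_per_iteration(clean_render)
```

If the numbers keep rising from one iteration to the next, as they do for `leaky_render`, something still holds a reference, and `del` plus `gc.collect()` in the calling loop cannot help. A flat series, as for `clean_render`, means the Python side is releasing its objects.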