Weasyprint 0.42 seems to hang / freeze on processing large tables #691
I looked into this issue a little more. WeasyPrint was launched to process the page http://www.dip-badajoz.es/bop/boletin_completo.php?FechaSolicitada=2012-04-09 on a Linux server with no CPU load and with its 8 GB of RAM almost unused. Even so, the OS ended up killing WeasyPrint because the system was "out of memory". We then upgraded the system to 16 GB RAM + 4 GB swap and tried again. This time WeasyPrint used all 16 GB of RAM plus 2 GB of swap, and we are still waiting for it to finish. It has already taken more than 12 minutes, so I'll report here whether it finished and how long it took. UPDATE: The process (slowly) consumed the swap partition as well, so it was killed by the kernel :( |
Rendering large tables is known to be slow, and when tables have 23,540 lines (!), it's awfully slow. There are 2 main topics that could help:
I'm currently rendering "2006-08-21", it seems to require less memory than your example does, but it's still really bad. |
I understand the complexity, taking into account that there is even accessibility identification for each cell, but it's a priority for us. Even "mPDF" hung while processing these tables. Although we now split our tables in most cases, we can't guarantee that we won't need to publish similar tables in the future. For our part, we tried to provide all the necessary hardware resources to avoid bothering you with this issue :) but the hardware upgrade wasn't enough to solve the problem :( 16 GB RAM (+4 GB swap) is too much memory to waste. I hope you find a way to solve this problem. Regards |
Yes,
You don't bother me at all, it's really useful to get real-life problems.
I've tried many things, and here's what I've discovered so far:
I'll spend some time trying to find why this |
Thank you @liZe. Our CSS files have evolved over the years to make use of new CSS improvements and features, so I can try to change the " Honestly, we are not worried about how long it takes to generate a PDF, as long as it doesn't take more than 1 hour. Memory consumption is more critical for us, although we can go up to 18-20 GB of RAM if necessary, as long as we know the maximum RAM needed in the worst case (basically, the links included previously have the largest tables). |
Any news on this ....??? :( |
Unfortunately no, but it's in the version 43 milestone, so I won't forget. |
Thank you!!! |
Here's the news! The "problem" is in Fortunately, these values have already been calculated and are cached in the table. I'm writing a patch that should avoid re-calculating these values for each page. |
Rendering the whole document now takes less than 13 minutes on my computer, almost the same as without |
Time is not a problem in my case, because it will be processed on a server, and I bet the process won't take any longer there. I'll try tomorrow (if I'm able to update WeasyPrint .....) |
I upgraded WeasyPrint |
I'm sorry to hear that… I've tried again on my laptop and it worked in 11:34 minutes with 12.7 GB of RAM with Python 3.7 (should be the same with Python 3.6, but may be 30% more with Python 3.5 and older).
|
With the same command, I obtained this:
The process was killed by the system, not by the user, because, as I mentioned before, it literally eats all the RAM and then, slowly, the swap. Could it be related to the cairo version (cairo < 1.15.4)? I'll try updating cairo and test again. |
I checked the "Cairo" libraries. This is everything I have installed related to cairo, and they are the latest stable versions ....
Should I force the system to install an unstable version? |
I really think that the problem is not in external libraries. Your generation is slow because your RAM is full and the swap is slow. It then gets killed because even the swap is full. You can try to generate the document and monitor your server with |
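One way to put a number on the memory use, as a minimal sketch: run the rendering as a child process and read its peak resident memory once it exits. This relies only on Python's standard `subprocess` and `resource` modules (Unix only). The function name `run_and_measure` and the file names in the usage comment are hypothetical.

```python
import resource
import subprocess
import sys


def run_and_measure(command):
    """Run a command; return (exit code, peak RSS in MiB of the largest child)."""
    completed = subprocess.run(command)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return completed.returncode, usage.ru_maxrss / divisor


# Usage (hypothetical file names):
#   code, peak = run_and_measure(["weasyprint", "input.html", "output.pdf"])
#   print("exit code {}, peak memory {:.0f} MiB".format(code, peak))
```

On POSIX systems a negative exit code means the child was terminated by a signal, for example -9 when the kernel kills it for running out of memory.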
As I wrote in this comment, I previously monitored what was happening with htop. This server has all its RAM (16 GiB) available for WeasyPrint (only 400 MiB occupied), and it slowly takes RAM. At "Step 3 - Applying CSS", WeasyPrint has nearly 4 GiB and quickly takes the whole RAM ( |
@RafaelLinux Which version of Python do you use? |
Python 2.7.13 (default, Sep 26 2018, 18:42:22) |
Hmmm… WeasyPrint 43 (with all the optimizations) only works with Python 3. Are you using an older version of WeasyPrint? |
This is what I get asking Weasyprint:
You can see that it always warns me about the "Cairo" version ... I don't know if that's the problem. |
No, it's not. You may get rendering problems, but it's not related to memory use.
Then you're using Python 3.5 😄. If you can use 3.6 or 3.7 instead, you'll improve your memory use a lot (because of the new dict implementation).
You can upgrade to the final 43 version with a simple With Python 3.6+ and WeasyPrint 43, there's no reason not to reach my 12 minutes and 12.7 GB RAM on your server. |
I don't understand why, if I type "python", I get a console that says |
Yes, it's using a version installed in
I don't know how you installed Python 3.5, so I can't help you about how to upgrade 😞. |
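To see which interpreter actually runs a given script, and whether it is at least 3.6, a short check like the following can help. It is only a sketch: the function name `interpreter_report` is hypothetical, and the default minimum of 3.6 is taken from the recommendation above. It deliberately avoids f-strings so that it also runs on Python 3.5.

```python
import sys


def interpreter_report(minimum=(3, 6)):
    """Describe the running interpreter and compare it with a minimum version."""
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "meets_minimum": tuple(sys.version_info[:2]) >= tuple(minimum),
    }


report = interpreter_report()
print("Interpreter: {executable} (Python {version})".format(**report))
print("Python 3.6 or newer: {}".format(report["meets_minimum"]))
```

Running it with `python`, `python3`, and whatever interpreter launches WeasyPrint shows whether they point to the same installation.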
Version 43
----------

Released on 2018-11-09.

Bug fixes:

* `#726 <https://github.com/Kozea/WeasyPrint/issues/726>`_: Make empty strings clear previous values of named strings
* `#729 <https://github.com/Kozea/WeasyPrint/issues/729>`_: Include tools in packaging

This version also includes the changes from unstable rc1 and rc2 versions listed below.

Version 43rc2
-------------

Released on 2018-11-02.

**This version is experimental, don't use it in production. If you find bugs, please report them!**

Bug fixes:

* `#706 <https://github.com/Kozea/WeasyPrint/issues/706>`_: Fix text-indent at the beginning of a page
* `#687 <https://github.com/Kozea/WeasyPrint/issues/687>`_: Allow query strings in file:// URIs
* `#720 <https://github.com/Kozea/WeasyPrint/issues/720>`_: Optimize minimum size calculation of long inline elements
* `#717 <https://github.com/Kozea/WeasyPrint/issues/717>`_: Display <details> tags as blocks
* `#691 <https://github.com/Kozea/WeasyPrint/issues/691>`_: Don't recalculate max content widths when distributing extra space for tables
* `#722 <https://github.com/Kozea/WeasyPrint/issues/722>`_: Fix bookmarks and strings set on images
* `#723 <https://github.com/Kozea/WeasyPrint/issues/723>`_: Warn users when string() is not used in page margin

Version 43rc1
-------------

Released on 2018-10-15.

**This version is experimental, don't use it in production. If you find bugs, please report them!**

Dependencies:

* Python 3.4+ is now needed, Python 2.x is not supported anymore
* Cairo 1.15.4+ is now needed, but 1.10+ should work with missing features (such as links, outlines and metadata)
* Pdfrw is not needed anymore

New features:

* `Beautiful website <https://weasyprint.org>`_
* `#579 <https://github.com/Kozea/WeasyPrint/issues/579>`_: Initial support of flexbox
* `#592 <https://github.com/Kozea/WeasyPrint/pull/592>`_: Support @font-face on Windows
* `#306 <https://github.com/Kozea/WeasyPrint/issues/306>`_: Add a timeout parameter to the URL fetcher functions
* `#594 <https://github.com/Kozea/WeasyPrint/pull/594>`_: Split tests using modern pytest features
* `#599 <https://github.com/Kozea/WeasyPrint/pull/599>`_: Make tests pass on Windows
* `#604 <https://github.com/Kozea/WeasyPrint/pull/604>`_: Handle target counters and target texts
* `#631 <https://github.com/Kozea/WeasyPrint/pull/631>`_: Enable counter-increment and counter-reset in page context
* `#622 <https://github.com/Kozea/WeasyPrint/issues/622>`_: Allow pathlib.Path objects for HTML, CSS and Attachment classes
* `#674 <https://github.com/Kozea/WeasyPrint/issues/674>`_: Add extensive installation instructions for Windows

Bug fixes:

* `#558 <https://github.com/Kozea/WeasyPrint/issues/558>`_: Fix attachments
* `#565 <https://github.com/Kozea/WeasyPrint/issues/565>`_, `#596 <https://github.com/Kozea/WeasyPrint/issues/596>`_, `#539 <https://github.com/Kozea/WeasyPrint/issues/539>`_: Fix many PDF rendering, printing and compatibility problems
* `#614 <https://github.com/Kozea/WeasyPrint/issues/614>`_: Avoid crashes and endless loops caused by a Pango bug
* `#662 <https://github.com/Kozea/WeasyPrint/pull/662>`_: Fix warnings and errors when generating documentation
* `#666 <https://github.com/Kozea/WeasyPrint/issues/666>`_, `#685 <https://github.com/Kozea/WeasyPrint/issues/685>`_: Fix many table layout rendering problems
* `#680 <https://github.com/Kozea/WeasyPrint/pull/680>`_: Don't crash when there's no font available
* `#662 <https://github.com/Kozea/WeasyPrint/pull/662>`_: Fix support of some align values in tables
Well, I need some help at this point, because I have no Python knowledge ... I updated Python again with
Thank you in advance |
No problem, I'm here to help! What's your Linux distribution and its version? (You can get this information with |
Sorry, I should have written that info before.
|
Then your version of Python 3 comes from the official Debian package. There's probably no easy way to get Python 3.6 or 3.7 cleanly installed… According to #70, Python 3.6 helped to save about 25% of memory consumption. Using this approximate value, rendering this page would eat about 17 GB with Python 3.5 on your server, and your 16 GB RAM + 4 GB swap may not be enough. I have to take the time to profile memory consumption, like it was done on #70. |
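The 17 GB figure can be reproduced with a quick calculation. This sketch assumes the 12.7 GB measured with Python 3.7 earlier in the thread as the baseline, and reads "saves about 25%" as the newer interpreter using 75% of what Python 3.5 would need. The function name and that reading are assumptions, not something stated in #70.

```python
def estimate_older_python_memory(measured_gb, saving_fraction):
    """Estimate memory use on the older interpreter.

    If the newer interpreter saves `saving_fraction` of the memory, then
    measured = older * (1 - saving_fraction), so
    older = measured / (1 - saving_fraction).
    """
    if not 0 <= saving_fraction < 1:
        raise ValueError("saving_fraction must be in [0, 1)")
    return measured_gb / (1 - saving_fraction)


# 12.7 GB was measured with Python 3.7; 25% is the approximate saving quoted above.
estimate_gb = estimate_older_python_memory(12.7, 0.25)
print("Estimated memory with Python 3.5: {:.1f} GB".format(estimate_gb))
```

This gives roughly 16.9 GB, in line with the "about 17 GB" mentioned above.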
Thank you liZe. We just upgraded to Debian 9.6, but the Python 3 version remains the same (2.7 according to So, as far as this issue is concerned, since we can't try your patch and it is solved, I won't ask any more about it until we can give your fix a try!!!! Thank you again!! |
Trying WeasyPrint with our larger bulletins, we found some (with the largest tables) that WeasyPrint seems unable to process. These are some of them:
http://www.dip-badajoz.es/bop/boletin_completo.php?FechaSolicitada=2006-08-21
http://www.dip-badajoz.es/bop/boletin_completo.php?FechaSolicitada=2012-04-09
http://www.dip-badajoz.es/bop/boletin_completo.php?FechaSolicitada=2014-09-05
It's not usual, but we have some HTML pages of up to 9 MiB in size. Is there any workaround or parameter to get WeasyPrint to process these pages?
Thank you