Text.Pandoc.Pretty: excessive CPU and memory usage #1785
Comments
Here are the results of profiling. --- Updates Fusing the definition of That being said, I'm not getting as bad performance as you are encountering. Are you compiling with
Thanks for looking into this! I just did
The output file is over 900,000 lines long; surely this indicates another problem? The source file uses lots of nested tables to control the layout. Pandoc doesn't handle this sort of input very well, so I suspect that is the cause.
Yes, this input is definitely totally broken. (HTML emails, yeah.) Thanks for looking into this, @mpickering!
@mpickering, should this really be closed? It doesn't sound like the problem is really resolved.
I think the input is pathological, but there might be something better we can do when there are lots of nested tables.
The profiling report suggests that the problem is entirely in Text.Pandoc.Pretty.
The file at https://gist.github.com/t-8ch/76d21fc1321880f48eb6 triggers excessive resource consumption when converting from HTML to Markdown or plain text.
Converting to native format is reasonably fast (0.5 seconds), but converting to plain takes over 5 GB of RAM and several minutes.
Happens with 704cfc1 (latest master as of now)
Command to reproduce
$ pandoc -f html -t plain bad.html
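To test the nested-table hypothesis from the comments on something smaller than the gist, a synthetic input can help. The sketch below is only an illustration: `nested_tables` and the file name `nested-5.html` are made up for this example and are not part of pandoc, and a single-cell chain of nested tables may or may not reproduce the slowdown seen with the real HTML email.

```shell
#!/bin/sh
# Hypothetical helper (not part of pandoc): print an HTML document whose
# body is a chain of tables nested DEPTH levels deep, one cell per table.
nested_tables() {
    depth=$1
    i=0
    printf '<html><body>\n'
    # Open DEPTH tables, each inside the single cell of the previous one.
    while [ "$i" -lt "$depth" ]; do
        printf '<table><tr><td>\n'
        i=$((i + 1))
    done
    printf 'cell text at depth %s\n' "$depth"
    # Close the tables again, innermost first.
    while [ "$i" -gt 0 ]; do
        printf '</td></tr></table>\n'
        i=$((i - 1))
    done
    printf '</body></html>\n'
}

# Write a small sample; raise the depth gradually while watching memory.
nested_tables 5 > nested-5.html
```

Running `pandoc -f html -t native nested-5.html` and `pandoc -f html -t plain nested-5.html` on files of increasing depth, for example under `time`, should show whether the cost grows with nesting depth and whether it appears only in the plain writer.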