Text.Pandoc.Pretty: excessive CPU and memory usage #1785

Closed
t-8ch opened this issue Dec 5, 2014 · 7 comments

@t-8ch
Contributor

t-8ch commented Dec 5, 2014

The file at https://gist.github.com/t-8ch/76d21fc1321880f48eb6 triggers excessive resource consumption when converting from HTML to markdown or plain.
Converting to the native format is reasonably fast (0.5 seconds), but converting to plain takes over 5 GB of RAM and several minutes.

Happens with 704cfc1 (latest master as of now)

Command to reproduce:

$ pandoc -f html -t plain bad.html

@mpickering mpickering added the bug label Dec 6, 2014
@mpickering
Collaborator

Here are the results of profiling.

--- Updates

Fusing the definition of realLength helps significantly.

That said, I'm not seeing performance as bad as what you describe. Are you compiling with -O2?
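To illustrate what fusing realLength could mean, here is a minimal sketch. It is not the actual pandoc change: the names realLengthNaive and realLengthFused are hypothetical, and charWidth here is a simplified stand-in for the width function in Text.Pandoc.Pretty. The naive version maps each character to a width and then sums the resulting list; the fused version computes the same total in a single strict fold.

```haskell
import Data.List (foldl')

-- Hypothetical, simplified width function. The real one in
-- Text.Pandoc.Pretty distinguishes many more Unicode ranges.
charWidth :: Char -> Int
charWidth c
  | c >= '\x0300' && c <= '\x036F' = 0  -- combining diacritics occupy no column
  | c >= '\x1100' && c <= '\x115F' = 2  -- Hangul Jamo, rendered double width
  | c >= '\x4E00' && c <= '\x9FFF' = 2  -- CJK unified ideographs, double width
  | otherwise                      = 1

-- Unfused: conceptually builds an intermediate list of widths, then sums it.
realLengthNaive :: String -> Int
realLengthNaive = sum . map charWidth

-- Fused: one strict left fold, no intermediate list, and an accumulator
-- that is forced at every step.
realLengthFused :: String -> Int
realLengthFused = foldl' (\acc c -> acc + charWidth c) 0
```

Both functions return the same result for any input; the difference is only in allocation. Whether GHC already fuses the naive form depends on the optimization level, which is one reason the -O2 question matters.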

@t-8ch
Contributor Author

t-8ch commented Dec 6, 2014

Thanks for looking into this!

I just did cabal configure && cabal build

@mpickering
Collaborator

The output file is over 900,000 lines long; surely this indicates another problem?

I think the cause is that the source file uses lots of nested tables to control the layout. Pandoc doesn't handle this sort of input very well.

@t-8ch
Contributor Author

t-8ch commented Dec 7, 2014

Yes, this input is definitely totally broken (HTML emails, yeah).
On my system the OOM killer kicks in even with your optimization.
For now I simply limit the memory size of the Haskell RTS and open those mails in Firefox.

Thanks for looking into this @mpickering!

@jgm
Owner

jgm commented Dec 9, 2014

@mpickering, should this really be closed? It doesn't sound like the problem is really resolved.

@mpickering mpickering reopened this Dec 9, 2014
@mpickering
Collaborator

I think the input is pathological, but there might be something better we can do when there are lots of nested tables.

@jgm
Owner

jgm commented Feb 22, 2017

The profiling report suggests that the problem is entirely in Text.Pandoc.Pretty.

@jgm jgm changed the title from "Markdown writer: excessive CPU and memory usage" to "Text.Pandoc.Pretty: excessive CPU and memory usage" Feb 22, 2017
@jgm jgm added this to the pandoc 2.0 milestone Feb 22, 2017
@jgm jgm closed this as completed in dc9788b Mar 3, 2017