Provide some way of detecting pathological behavior and aborting #5047

Open
jgm opened this issue Nov 5, 2018 · 9 comments

Comments

@jgm
Owner

jgm commented Nov 5, 2018

Currently in pathological cases pandoc will just eat up all system memory. It would be nice to provide a way to avoid this.

It is possible to compile the executable with an option that imposes a fixed constraint on heap size, but any choice here seems arbitrary and could limit use of pandoc on really big systems to convert really big files.

Perhaps there's a way to build a parsec combinator that detects excessive backtracking?

Or maybe there's a way to query heap usage in real time and compare to system limits?

@jgm
Owner Author

jgm commented Nov 6, 2018

Looks like you can query heap usage
http://hackage.haskell.org/package/base-4.12.0.0/docs/GHC-Stats.html
but this only works if the executable is run with +RTS -T
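
A minimal sketch of such a query, assuming base >= 4.10. The helper name `currentLiveBytes` is made up for illustration; `getRTSStatsEnabled`, `getRTSStats`, and the record fields come from GHC.Stats. It returns `Nothing` when the executable was not started with `+RTS -T`, since `getRTSStats` would throw in that case.

```haskell
import Data.Word (Word64)
import GHC.Stats (GCDetails (..), RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- | Hypothetical helper: live heap bytes as recorded at the most recent GC,
-- or Nothing if RTS statistics are not enabled (no +RTS -T).
currentLiveBytes :: IO (Maybe Word64)
currentLiveBytes = do
  enabled <- getRTSStatsEnabled
  if enabled
    then do
      stats <- getRTSStats
      -- 'gc' holds the details of the most recent garbage collection.
      return (Just (gcdetails_live_bytes (gc stats)))
    else return Nothing
```

Note that the figure is only refreshed when a garbage collection runs, so it lags behind the true heap size between collections.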

@jgm
Owner Author

jgm commented Nov 6, 2018

For timeout, there's
http://hackage.haskell.org/package/base-4.12.0.0/docs/System-Timeout.html

timeout :: Int -> IO a -> IO (Maybe a)

where the first parameter is microseconds.

One possibility would be to compute a timeout based on input size.
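
A rough sketch of what a size-based timeout could look like. The constants and the names `timeoutMicros` and `convertWithTimeout` are invented for illustration, and the conversion is modelled as a plain `String -> String` function rather than pandoc's real API.

```haskell
import Control.Exception (evaluate)
import System.Timeout (timeout)

-- | Hypothetical budget: a fixed base plus a per-character allowance,
-- both in microseconds. The numbers are placeholders, not measurements.
timeoutMicros :: Int -> Int
timeoutMicros inputLength = baseMicros + perCharMicros * inputLength
  where
    baseMicros = 5000000
    perCharMicros = 100

-- | Run a pure conversion under a timeout derived from the input length.
-- Returns Nothing if the budget is exceeded.
convertWithTimeout :: (String -> String) -> String -> IO (Maybe String)
convertWithTimeout convert input =
  timeout (timeoutMicros (length input)) $ do
    let out = convert input
    -- Force the whole result inside the timeout; otherwise laziness
    -- would defer the real work until after 'timeout' has returned.
    _ <- evaluate (length out)
    return out
```

One caveat: `timeout` works by throwing an asynchronous exception, which GHC delivers only when the running thread allocates. A runaway parser allocates constantly, so this should fire, but a tight non-allocating loop would not be interrupted.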

@jgm
Owner Author

jgm commented Nov 9, 2018

Making this change to ghc-options in the cabal file

-  ghc-options:   -rtsopts -with-rtsopts=-K16m -Wall -fno-warn-unused-do-bind -threaded
+  ghc-options:   -rtsopts "-with-rtsopts=-K16m -T" -Wall -fno-warn-unused-do-bind -threaded

turns on -T by default, so the executable no longer has to be run with +RTS -T.

Note that the function getRTSStats in GHC.Stats is only available for base >= 4.10 (ghc 8.2.1). But we could always make the feature conditional using CPP.

One way we could do this would be the following. Before doing the pandoc conversion, fork off an IO thread that wakes up every second, checks the allocation stats, and throws a PandocAllocationError if the allocated bytes (or maybe live bytes) exceed a certain limit. After the conversion succeeds, we kill off this thread and exit.
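
A sketch of that watchdog idea, under some assumptions: it checks live bytes rather than allocated bytes, it uses a stand-in exception type `AllocationLimitExceeded` (not an existing pandoc error), and the wrapper name `withLiveBytesLimit` is made up. If RTS stats are not enabled it just runs the action unguarded.

```haskell
import Control.Concurrent (forkIO, killThread, myThreadId, threadDelay)
import Control.Exception (Exception, bracket, throwTo)
import Data.Word (Word64)
import GHC.Stats (GCDetails (..), RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- | Stand-in for the proposed error; carries the observed live bytes.
newtype AllocationLimitExceeded = AllocationLimitExceeded Word64
  deriving Show

instance Exception AllocationLimitExceeded

-- | Run an action while a watchdog thread polls the RTS stats once per
-- second and throws to the calling thread if live bytes exceed the limit.
withLiveBytesLimit :: Word64 -> IO a -> IO a
withLiveBytesLimit limit action = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then action -- no stats available (no +RTS -T), so no guarding
    else do
      caller <- myThreadId
      let watchdog = do
            threadDelay 1000000 -- one second, in microseconds
            stats <- getRTSStats
            let live = gcdetails_live_bytes (gc stats)
            if live > limit
              then throwTo caller (AllocationLimitExceeded live)
              else watchdog
      -- 'bracket' kills the watchdog whether the action succeeds or fails.
      bracket (forkIO watchdog) killThread (const action)
```

Usage would be something like `withLiveBytesLimit (2 * 1024 ^ 3) runConversion`, where `runConversion` is whatever IO action performs the conversion.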

But this is only worth doing if we have some way to query the system memory and set the limit dynamically. Otherwise we might as well just bake in a fixed constraint using the RTS options when compiling. (The problem then is what it should be, given that some systems have a lot of memory...)

@mb21
Collaborator

mb21 commented Nov 10, 2018

If we could query the installed system memory (not clear we can with Haskell in a cross-platform way), we could set the limit to system memory - 1GB or 0.9 * system memory, whichever is greater.
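
For concreteness, the rule could be written as below. This is only a sketch: it assumes the total system memory in bytes is already known, and it reads "1GB" as 1 GiB. The name `memoryLimit` is invented.

```haskell
import Data.Word (Word64)

-- | Hypothetical limit: the greater of (total - 1 GiB) and 90% of total.
-- The subtraction is guarded so small totals cannot underflow Word64.
memoryLimit :: Word64 -> Word64
memoryLimit total = max minusOneGiB ninetyPercent
  where
    oneGiB = 1024 * 1024 * 1024
    minusOneGiB = if total > oneGiB then total - oneGiB else 0
    ninetyPercent = total `div` 10 * 9
```

With this rule the 90% branch wins below 10 GiB of system memory and the "minus 1 GiB" branch wins above it, so a 16 GiB machine would get a 15 GiB limit.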

@mb21
Collaborator

mb21 commented Nov 10, 2018

It appears that on Linux, you could call out to C and use the sysinfo system library. On macOS, only sysctl is available. Then there is Windows...

All things considered, I'm tending more and more towards not solving this issue in pandoc this way. If people have problems, they can just limit the memory of the GHC runtime to whatever they deem sensible on their system. Maybe we should document that somewhere.

@jgm
Owner Author

jgm commented Nov 10, 2018

Yes, I think the memory limits approach is probably not going to work.

In principle, a parser combinator library could provide a configurable try that would limit backtracking.

@mb21
Collaborator

mb21 commented Nov 11, 2018

I haven't looked too deep into parsing, say, markdown. But I understand that it's not possible to parse commonmark without some amount of backtracking? How do the C/JavaScript implementations solve this?

@jgm
Owner Author

jgm commented Nov 11, 2018

The commonmark parsers use various tricks to avoid backtracking at all costs.

@mb21
Collaborator

mb21 commented Nov 15, 2018

For reference, to limit the memory of the Haskell runtime, say to 2048 MB (2GB), depending on your system memory:

pandoc.exe +RTS -M2048m

see https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/runtime_control.html#setting-rts-options-on-the-command-line

jgm added a commit that referenced this issue Jan 8, 2019
* These were added by the RST reader and, for literate Haskell,
  by the Markdown and LaTeX readers.  There is no point to
  this class, and it is not applied consistently by all readers.
  See #5047.

* Reverse order of `literate` and `haskell` classes on code blocks
  when parsing literate Haskell. Better if `haskell` comes first.