Provide some way of detecting pathological behavior and aborting #5047

jgm · 2018-11-05T17:28:51Z

Currently in pathological cases pandoc will just eat up all system memory. It would be nice to provide a way to avoid this.

It is possible to compile the executable with an option that imposes a fixed constraint on heap size, but any choice here seems arbitrary and could limit use of pandoc on really big systems to convert really big files.

Perhaps there's a way to build a parsec combinator that detects excessive backtracking?

Or maybe there's a way to query heap usage in real time and compare to system limits?

jgm · 2018-11-06T03:25:14Z

Looks like you can query heap usage
http://hackage.haskell.org/package/base-4.12.0.0/docs/GHC-Stats.html
but this only works if the executable is run with +RTS -T
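A minimal sketch of such a query, assuming base >= 4.10, where `GHC.Stats` provides `getRTSStats`, `getRTSStatsEnabled` and the `RTSStats`/`GCDetails` records. The helper name `heapUsage` is hypothetical. It checks `getRTSStatsEnabled` first because `getRTSStats` throws when the program was not started with `+RTS -T`.

```haskell
import Data.Word (Word64)
import GHC.Stats (GCDetails (..), RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- | (total bytes allocated so far, live bytes after the last GC),
-- or Nothing when RTS statistics are not enabled (+RTS -T missing).
heapUsage :: IO (Maybe (Word64, Word64))
heapUsage = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then pure Nothing
    else do
      stats <- getRTSStats
      pure (Just (allocated_bytes stats, gcdetails_live_bytes (gc stats)))

main :: IO ()
main = heapUsage >>= maybe (putStrLn "stats unavailable: run with +RTS -T") print
```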

jgm · 2018-11-06T03:30:39Z

For timeout, there's
http://hackage.haskell.org/package/base-4.12.0.0/docs/System-Timeout.html

timeout :: Int -> IO a -> IO (Maybe a)

where the first parameter is microseconds.

One possibility would be to compute a timeout based on input size.
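A minimal sketch of an input-size-based timeout. The policy (5 seconds plus 10 microseconds per input character) and the names `timeoutFor` and `convertWithTimeout` are hypothetical; `force` comes from the deepseq package, which ships with GHC.

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)
import System.Timeout (timeout)

-- Hypothetical policy: a fixed allowance plus a per-character allowance,
-- both in microseconds (the unit 'timeout' expects).
timeoutFor :: Int -> Int
timeoutFor inputLength = baseMicros + perCharMicros * inputLength
  where
    baseMicros = 5 * 1000000 -- 5 seconds
    perCharMicros = 10       -- 10 microseconds per input character

-- Run a pure conversion under a size-scaled timeout. 'force' makes the whole
-- output get computed inside 'timeout' rather than lazily afterwards.
-- Nothing means the conversion was aborted.
convertWithTimeout :: (String -> String) -> String -> IO (Maybe String)
convertWithTimeout convert input =
  timeout (timeoutFor (length input)) (evaluate (force (convert input)))

main :: IO ()
main = convertWithTimeout reverse "hello" >>= print -- Just "olleh"
```

`timeout` can only interrupt pure code at points where it allocates; a runaway parser normally does, but a tight non-allocating loop would not be stopped.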

jgm · 2018-11-09T19:33:24Z

Making this change to the ghc-options in the cabal file

-  ghc-options:   -rtsopts -with-rtsopts=-K16m -Wall -fno-warn-unused-do-bind -threaded
+  ghc-options:   -rtsopts "-with-rtsopts=-K16m -T" -Wall -fno-warn-unused-do-bind -threaded

makes +RTS -T available by default.

Note that the function getRTSStats in GHC.Stats is only available for base >= 4.10 (ghc 8.2.1). But we could always make the feature conditional using CPP.
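The CPP route might look like this. A sketch assuming the `MIN_VERSION_base` macro, which cabal (and GHC 8.0 or later) defines automatically; `allocatedBytes` is a hypothetical name, and on older base it simply reports `Nothing`.

```haskell
{-# LANGUAGE CPP #-}
import Data.Word (Word64)
#if MIN_VERSION_base(4,10,0)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
#endif

-- | Total bytes allocated so far, or Nothing when the figure cannot be
-- obtained (base older than 4.10, or the RTS was started without -T).
allocatedBytes :: IO (Maybe Word64)
#if MIN_VERSION_base(4,10,0)
allocatedBytes = do
  enabled <- getRTSStatsEnabled
  if enabled then Just . allocated_bytes <$> getRTSStats else pure Nothing
#else
allocatedBytes = pure Nothing
#endif

main :: IO ()
main = allocatedBytes >>= print
```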

One way we could do this would be the following. Before doing the pandoc conversion, fork off an IO thread that wakes up every second and checks the allocation stats, and throws a PandocAllocationError if the allocated bytes (or maybe live bytes) exceed a certain limit. After the conversion succeeds, we kill off this thread and exit.
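A sketch of that watchdog, assuming base >= 4.10. `PandocAllocationError` is modelled here as a hypothetical newtype, not pandoc's actual error type, and `withAllocationLimit` is a made-up name. It samples live bytes after the last GC rather than `allocated_bytes`, since the latter is a running total that keeps growing even while memory is being freed.

```haskell
import Control.Concurrent (forkIO, killThread, myThreadId, threadDelay, throwTo)
import Control.Exception (Exception, bracket, evaluate)
import Control.Monad (forever, when)
import Data.Word (Word64)
import GHC.Stats (GCDetails (..), RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Hypothetical stand-in for the error named above; carries the live-byte
-- count that tripped the limit.
newtype PandocAllocationError = PandocAllocationError Word64
  deriving Show

instance Exception PandocAllocationError

-- Run 'action' while a watchdog thread wakes once a second, reads the live
-- bytes recorded at the last GC, and throws PandocAllocationError to the
-- calling thread if they exceed 'limit'. 'bracket' kills the watchdog however
-- 'action' ends. Without +RTS -T there are no stats, so 'action' runs unguarded.
withAllocationLimit :: Word64 -> IO a -> IO a
withAllocationLimit limit action = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then action
    else do
      caller <- myThreadId
      let watchdog = forever $ do
            threadDelay 1000000 -- one second, in microseconds
            live <- gcdetails_live_bytes . gc <$> getRTSStats
            when (live > limit) $
              throwTo caller (PandocAllocationError live)
      bracket (forkIO watchdog) killThread (const action)

main :: IO ()
main = do
  -- 4 GiB is an arbitrary example limit.
  n <- withAllocationLimit (4 * 1024 ^ (3 :: Int)) (evaluate (sum [1 .. 1000000 :: Int]))
  print n -- 500000500000
```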

But this is only worth doing if we have some way to query the system memory and set the limit dynamically. Otherwise we might as well just bake in a fixed constraint using the RTS options when compiling. (The problem then is what it should be, given that some systems have a lot of memory...)

mb21 · 2018-11-10T09:48:07Z

If we could query the installed system memory (not clear we can with Haskell in a cross-platform way), we could set the limit to system memory - 1GB or 0.9 * system memory, whichever is greater.
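The rule itself is simple arithmetic. A sketch, assuming "1GB" means 1 GiB (1024^3 bytes) and that the result would be handed to the RTS as an `-M` flag in megabytes; `heapLimit` and `rtsMaxHeapFlag` are hypothetical names.

```haskell
-- Assumption: "1GB" in the rule is read as 1 GiB.
gib :: Integer
gib = 1024 ^ (3 :: Int)

-- The proposed rule: system memory minus 1 GiB, or 90% of system memory,
-- whichever is greater. Sizes are in bytes.
heapLimit :: Integer -> Integer
heapLimit systemBytes = max (systemBytes - gib) (systemBytes * 9 `div` 10)

-- Render a byte count as an RTS -M flag in whole megabytes (rounded down).
rtsMaxHeapFlag :: Integer -> String
rtsMaxHeapFlag bytes = "-M" ++ show (bytes `div` (1024 * 1024)) ++ "m"

main :: IO ()
main = mapM_ (putStrLn . rtsMaxHeapFlag . heapLimit . (* gib)) [4, 16]
-- prints -M3686m (4 GiB machine) and -M15360m (16 GiB machine)
```

Below 10 GiB of RAM the 90% figure is the larger one; above it, "minus 1 GiB" wins.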

mb21 · 2018-11-10T09:57:04Z

It appears that on Linux, you could call out to C and use the sysinfo system call. On macOS, only sysctl is available. Then there is Windows...
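On Linux there is also a route that avoids the FFI entirely: the kernel reports total RAM on the `MemTotal:` line of `/proc/meminfo`, in kB. The sketch below swaps that in for `sysinfo`; it is Linux-only (`readFile` throws elsewhere), and `parseMemTotal` and `totalMemoryBytes` are hypothetical names.

```haskell
import Data.Char (isDigit)
import Data.Maybe (listToMaybe)

-- Extract MemTotal (reported in kB, i.e. units of 1024 bytes) from the
-- contents of /proc/meminfo and convert it to bytes.
parseMemTotal :: String -> Maybe Integer
parseMemTotal contents =
  listToMaybe
    [ read digits * 1024
    | line <- lines contents
    , ("MemTotal:", rest) <- [splitAt 9 line]
    , let digits = takeWhile isDigit (dropWhile (== ' ') rest)
    , not (null digits)
    ]

totalMemoryBytes :: IO (Maybe Integer)
totalMemoryBytes = parseMemTotal <$> readFile "/proc/meminfo"

main :: IO ()
main = totalMemoryBytes >>= print
```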

All things considered, I'm tending more and more towards not solving this issue in pandoc this way. If people have problems, they can just limit the memory of the GHC runtime to whatever they deem sensible on their system. Maybe we should document that somewhere.

jgm · 2018-11-10T18:23:28Z

Yes, I think the memory limits approach is probably not going to work.

In principle, a parser combinator library could provide a configurable try that would limit backtracking.
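A configurable `try` can be approximated on top of parsec itself. A sketch, assuming parsec 3 and transformers (both ship with GHC); `limitedTry`, `runLimited` and `demo` are hypothetical names. Each failed `limitedTry` spends one unit of a budget, and once the budget is gone the whole parse aborts. The budget lives in a `StateT` under the parser rather than in parsec's user state, because parsec rolls its user state back when an alternative fails, which would silently undo the decrement.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, get, put, runStateT)
import Text.Parsec

-- Parser over StateT Int (remaining budget) on top of Either String (used to
-- abort the whole run). Effects in these lower layers are not undone when a
-- parsec alternative fails.
type P = ParsecT String () (StateT Int (Either String))

-- Like 'try p', but each time 'p' fails one unit of budget is spent;
-- once the budget is exhausted the entire parse is aborted.
limitedTry :: P a -> P a
limitedTry p = try p <|> spend
  where
    spend = do
      remaining <- lift get
      if remaining <= 0
        then lift (lift (Left "backtracking budget exhausted"))
        else lift (put (remaining - 1)) >> parserZero

-- Run a parser with a given budget, flattening both failure modes into Left.
runLimited :: Int -> P a -> String -> Either String a
runLimited budget p input =
  case runStateT (runParserT p () "input" input) budget of
    Left aborted -> Left aborted
    Right (Left parseErr, _) -> Left (show parseErr)
    Right (Right x, _) -> Right x

-- Toy grammar: the first branch reads all the 'a's before discovering it
-- needed a 'c', so on "aaab" it must backtrack to the second branch.
demo :: P String
demo = limitedTry (many1 (char 'a') <* char 'c') <|> (many1 (char 'a') <* char 'b')

main :: IO ()
main = do
  print (runLimited 1 demo "aaab") -- Right "aaa"
  print (runLimited 0 demo "aaab") -- Left "backtracking budget exhausted"
```

This counts every failed attempt, not only those that had consumed input; whether that is the right cost measure for pandoc's readers is left open.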

mb21 · 2018-11-11T08:49:43Z

I haven't looked too deep into parsing, say, markdown. But I understand that it's not possible to parse commonmark without some amount of backtracking? How do the C/JavaScript implementations solve this?

jgm · 2018-11-11T18:07:52Z

The commonmark parsers use various tricks to avoid backtracking at all costs.

mb21 · 2018-11-15T08:05:30Z

For reference, to limit the memory of the Haskell runtime, say to 2048 MB (2GB), depending on your system memory:

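pandoc.exe +RTS -M2048m -RTS

(Without a suffix the -M value is read as bytes, hence the m; -RTS closes the runtime options so the usual pandoc arguments can follow.)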

see https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/runtime_control.html#setting-rts-options-on-the-command-line

See #5047.

* These were added by the RST reader and, for literate Haskell, by the Markdown and LaTeX readers. There is no point to this class, and it is not applied consistently by all readers. See #5047.
* Reverse order of `literate` and `haskell` classes on code blocks when parsing literate Haskell. Better if `haskell` comes first.

mb21 mentioned this issue Nov 5, 2018

Add a "sandboxed mode" that limits IO #5045

Closed

mb21 added the performance label Nov 14, 2018

jgm added a commit that referenced this issue Nov 15, 2018

MANUAL: Under security, added note about +RTS option to limit heap size.

2347bab

See #5047.

hadley mentioned this issue Sep 18, 2020

allow for bookdown hacking? r-lib/downlit#53

Merged
