Provide some way of detecting pathological behavior and aborting #5047
Looks like you can query heap usage.
For timeout, there's `timeout` (from `System.Timeout`), where the first parameter is microseconds. One possibility would be to compute a timeout based on input size.
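A small sketch of both suggestions, assuming GHC's `GHC.Stats` (which only reports real numbers when the program runs with `+RTS -T`) and `timeout` from base; the helper names and the per-character budget are made up for illustration:

```haskell
import GHC.Stats (RTSStats (..), getRTSStats)
import System.Timeout (timeout)

-- Hypothetical helper: allow 100 microseconds of conversion time per input
-- character, capped at 60 seconds. Both numbers are arbitrary placeholders.
convertWithTimeout :: String -> (String -> IO a) -> IO (Maybe a)
convertWithTimeout input convert =
  let micros = min (60 * 1000000) (length input * 100)
  in  timeout micros (convert input)

-- Hypothetical helper: report the peak live heap seen so far.
-- Only meaningful when the program was started with "+RTS -T".
reportHeap :: IO ()
reportHeap = do
  stats <- getRTSStats
  putStrLn ("max live bytes so far: " ++ show (max_live_bytes stats))
```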
One way we could do this would be the following. Before doing the pandoc conversion, fork off an IO thread that wakes up every second and checks the allocation stats, and throws a PandocAllocationError if the allocated bytes (or maybe the live bytes) exceed a certain limit. After the conversion succeeds, we kill off this thread and exit. But this is only worth doing if we have some way to query the system memory and set the limit dynamically. Otherwise we might as well just bake in a fixed constraint using the RTS options when compiling. (The problem then is what that constraint should be, given that some systems have a lot of memory...)
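A minimal sketch of that watchdog idea, again assuming the program runs with `+RTS -T` so GC stats are collected; `PandocAllocationError`, `withAllocationLimit`, and the limit value are hypothetical names, not pandoc's actual API:

```haskell
import Control.Concurrent (forkIO, killThread, myThreadId, threadDelay)
import Control.Exception (Exception, bracket, throwTo)
import Control.Monad (forever, when)
import Data.Word (Word64)
import GHC.Stats (RTSStats (..), getRTSStats)

-- Hypothetical error type; pandoc's real error type has no such constructor.
data PandocAllocationError = PandocAllocationError deriving Show
instance Exception PandocAllocationError

-- Run an action while a watchdog thread wakes up once per second and
-- throws to the caller if the live heap exceeds the given byte limit.
withAllocationLimit :: Word64 -> IO a -> IO a
withAllocationLimit limit action = do
  caller <- myThreadId
  bracket
    (forkIO . forever $ do
       threadDelay 1000000                    -- wake up every second
       stats <- getRTSStats
       when (max_live_bytes stats > limit) $
         throwTo caller PandocAllocationError)
    killThread                                -- kill the watchdog afterwards
    (const action)
```

With something like `withAllocationLimit (2 * 1024 * 1024 * 1024) conversion`, a runaway conversion would be interrupted within about a second of crossing 2 GB of live heap. Checking `max_live_bytes` rather than `allocated_bytes` matches the "live bytes" alternative above; allocated bytes grow monotonically and would trip the limit even for well-behaved conversions.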
If we could query the installed system memory (not clear we can with Haskell in a cross-platform way), we could set the limit accordingly.
It appears that on Linux, you could call out to C and use the sysinfo system library. On macOS there doesn't seem to be an equally direct option.

All things considered, I'm tending more and more towards not solving this issue in pandoc this way. If people have problems, they can just limit the memory of the GHC runtime to whatever they deem sensible on their system. Maybe we should document that somewhere.
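Not the sysinfo(2) binding mentioned above, but a simpler Linux-only sketch of the same idea: read the installed memory from `/proc/meminfo`. The helper name is hypothetical and there is no macOS or Windows fallback here:

```haskell
import Data.Maybe (listToMaybe)

-- Total physical memory in kB as reported by /proc/meminfo (Linux only).
totalMemoryKb :: IO (Maybe Integer)
totalMemoryKb = do
  contents <- readFile "/proc/meminfo"
  pure $ listToMaybe
    [ n
    | line <- lines contents
    , ("MemTotal:" : value : _) <- [words line]
    , (n, "") <- reads value
    ]
```

A heap limit could then be derived from that value, which is exactly the cross-platform gap the comment points out.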
Yes, I think the memory-limits approach is probably not going to work. In principle, a parser combinator library could provide a configurable limit on backtracking.
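One way such a knob could look, sketched with parsec over `IO`: a `try` variant that burns one unit of "fuel" per attempt and aborts the whole parse once the budget is exhausted. `limitedTry` and the budget are invented here, not an existing parsec API:

```haskell
import Control.Monad.Trans.Class (lift)
import Data.IORef (IORef, atomicModifyIORef', newIORef)
import Text.Parsec

-- Like `try`, but every attempt burns one unit of fuel from a shared
-- counter; once the budget is gone the parser fails outright instead of
-- continuing to backtrack.
limitedTry :: IORef Int -> ParsecT String u IO a -> ParsecT String u IO a
limitedTry fuel p = do
  remaining <- lift $ atomicModifyIORef' fuel (\n -> (n - 1, n))
  if remaining <= 0
    then parserFail "backtracking budget exhausted"
    else try p

-- Toy driver: parse runs of "ab" or "a" under an arbitrary budget of 1000.
demo :: String -> IO (Either ParseError [String])
demo input = do
  fuel <- newIORef 1000
  runParserT (many (limitedTry fuel (string "ab") <|> string "a")) () "demo" input
```

Counting attempts rather than actual backtracks over-approximates the work done, but it keeps the accounting cheap and configurable per parse.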
I haven't looked too deeply into parsing, say, markdown, but I understand that it's not possible to parse commonmark without some amount of backtracking? How do the C/JavaScript implementations solve this?
The commonmark parsers use various tricks to avoid backtracking at all costs. |
For reference, to limit the memory of the Haskell runtime to, say, 2048 MB (2 GB), depending on your system memory:
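The code block here appears to have been lost in extraction; presumably it showed GHC's RTS maximum-heap flag, roughly like this (only honored if the pandoc binary was linked with `-rtsopts`; the file names are placeholders):

```
pandoc +RTS -M2048m -RTS input.md -o output.html
```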
* These were added by the RST reader and, for literate Haskell, by the Markdown and LaTeX readers. There is no point to this class, and it is not applied consistently by all readers. See #5047.
* Reverse order of `literate` and `haskell` classes on code blocks when parsing literate Haskell. Better if `haskell` comes first.
Currently, in pathological cases, pandoc will just eat up all system memory. It would be nice to provide a way to avoid this.
It is possible to compile the executable with an option that imposes a fixed constraint on heap size, but any choice here seems arbitrary and could limit use of pandoc on really big systems to convert really big files.
Perhaps there's a way to build a parsec combinator that detects excessive backtracking?
Or maybe there's a way to query heap usage in real time and compare to system limits?