restrict operations according to shell-escape, openin_any, openout_any #2218

xworld21 · 2023-09-23T09:42:40Z

Running LaTeXML on untrusted inputs is dangerous insofar as it will run arbitrary Perl code (by loading .ltxml files) and read or write to arbitrary locations (in different phases: \input in TeX, document() in XSLT, etc).

TeX has a simple security model: -shell-escape (and environment variable shell_escape) controls arbitrary code execution; openin_any, openout_any¹ control whether access is restricted to the current/output directories or extends to the whole filesystem.

Maybe LaTeXML could follow the same model?
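To make the model concrete, here is a rough Python sketch of how the three openin_any/openout_any levels could be checked. It is a simplified reading of the kpathsea documentation, not kpathsea's actual code: the function name `is_open_allowed` and the `output_dir` parameter (standing in for TEXMFOUTPUT) are hypothetical, and the real dotfile rules have exceptions that are not modelled here.

```python
from pathlib import PurePosixPath


def is_open_allowed(filename, policy, output_dir):
    """Approximate the a/r/p levels of openin_any/openout_any.

    Hypothetical helper: output_dir stands in for TEXMFOUTPUT.
    """
    if policy not in ("a", "r", "p"):
        raise ValueError(f"unknown policy: {policy!r}")
    if policy == "a":
        # a: any file may be opened
        return True

    path = PurePosixPath(filename)
    parts = path.parts

    # r and p: refuse dotfiles such as .bashrc or .ssh/config
    # (simplification: every hidden path component is refused)
    if any(p.startswith(".") and p not in (".", "..") for p in parts):
        return False
    if policy == "r":
        return True

    # p: additionally refuse parent-directory traversal ...
    if ".." in parts:
        return False
    # ... and absolute paths outside the output directory
    if path.is_absolute():
        base = PurePosixPath(output_dir)
        return path == base or base in path.parents
    return True
```

With this sketch, `is_open_allowed("../x.tex", "r", "/tmp/out")` is allowed, while the same path under `"p"` is refused.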

¹ Documentation at https://tug.org/texinfohtml/kpathsea.html#Calling-sequence. Values: a (any), p (paranoid), r (restricted), plus some backward-compatibility aliases.


dginev · 2023-09-25T18:06:25Z

Some relevant prior discussion is in #606, which led to the secureio plugin.

Generally it's quite hard to improve the safety profile of latexml with claims about it being "complete", especially in the command-line use cases.

It is a little more manageable to containerize the conversion in e.g. a Docker image (related #1178) and impose restrictions on the source contents being passed in, though the two approaches are not mutually exclusive.

xworld21 · 2023-09-26T18:27:14Z

I think this kind of change is more feasible in light of #2185 - I like to think I have found all the I/O happening in latexml, and adding some hooks to stop with errors when reading or writing outside the boundary set by open(in|out)_any is doable, in principle. But this could just be an intermediate step: first an implementation of -recorder, and once it seems complete, you can bolt on I/O filtering. (Full filtering requires #2053 to also catch I/O from LibXSLT.)
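As an illustration of the two-step plan (record first, filter later), here is a minimal Python sketch. LaTeXML itself is written in Perl, and nothing here reflects its actual internals: the class `IORecorder`, its `opened` method and the `allow` callback are all hypothetical names. The INPUT/OUTPUT line format imitates the .fls file that TeX's -recorder writes.

```python
class IORecorder:
    """Hypothetical hook that every file open would pass through."""

    def __init__(self, allow=None):
        self.log = []       # lines in the style of a -recorder .fls file
        self.allow = allow  # optional filter: (mode, path) -> bool

    def opened(self, mode, path):
        # Step 2 (bolted on later): stop with an error when the
        # filter rejects the access.
        if self.allow is not None and not self.allow(mode, path):
            raise PermissionError(f"{path}: access denied by I/O policy")
        # Step 1: record the access, as -recorder does.
        kind = "INPUT" if mode == "r" else "OUTPUT"
        self.log.append(f"{kind} {path}")


# Recording only: nothing is blocked.
rec = IORecorder()
rec.opened("r", "main.tex")
rec.opened("w", "main.xml")

# Recording plus a toy filter that refuses absolute paths.
strict = IORecorder(allow=lambda mode, path: not path.startswith("/"))
```

The point of the design is that the filter is only as complete as the recorder: any I/O that never reaches the hook (for example from LibXSLT) is neither logged nor blocked.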

For this to make any sense, one also needs some form of -shell-escape to forbid custom .ltxml bindings, i.e. bindings should be loaded from the default locations, but not from . unless specifically requested with --path=. or --shell-escape.
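The binding rule could be sketched like this in Python. This is only an illustration of the proposed policy: `binding_search_dirs` is a hypothetical helper, `--shell-escape` is the option proposed in this issue rather than an existing one, and the search order shown is an assumption.

```python
import os


def binding_search_dirs(default_dirs, extra_paths=(), shell_escape=False):
    """Directories that would be searched for .ltxml bindings.

    The current directory is included only when requested explicitly,
    via --path=. (extra_paths) or the proposed --shell-escape.
    """
    dirs = []
    # Explicit --path entries come first, normalized so "./" equals "."
    for p in extra_paths:
        n = os.path.normpath(p)
        if n not in dirs:
            dirs.append(n)
    # The proposed --shell-escape implicitly trusts the current directory
    if shell_escape and os.curdir not in dirs:
        dirs.append(os.curdir)
    # Default (installed) locations are always searched
    for d in default_dirs:
        n = os.path.normpath(d)
        if n not in dirs:
            dirs.append(n)
    return dirs
```

For example, with a hypothetical default location `/usr/share/latexml`, a plain run would search only that directory, while `extra_paths=["."]` or `shell_escape=True` would also search the current directory.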

If the above is workable, latexml could reach the same safety profile as a normal LaTeX run, which is a familiar thing.

Of course I am making big assumptions about the other tools (dvipng, dvisvgm, Ghostscript, and [shudders] ImageMagick) having a similar safety approach, i.e. not reading from/writing to arbitrary locations when fed dodgy inputs.

dginev added enhancement latexml labels Sep 25, 2023

dginev added this to the Future (if) milestone Sep 25, 2023

xworld21 mentioned this issue Jan 7, 2024

allow stylesheets to read and write files in the site directory #1951

Open
