restrict operations according to shell-escape, openin_any, openout_any #2218

Open
xworld21 opened this issue Sep 23, 2023 · 2 comments

Comments

@xworld21
Contributor

Running LaTeXML on untrusted inputs is dangerous insofar as it will run arbitrary Perl code (by loading .ltxml files) and read from or write to arbitrary locations (in different phases: \input in TeX, document() in XSLT, etc.).

TeX has a simple security model: -shell-escape (and the environment variable shell_escape) controls arbitrary code execution, while openin_any and openout_any[1] control whether file access is restricted to the current/output directories or extends to the whole filesystem.

Maybe LaTeXML could follow the same model?
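For reference, here is a rough Python approximation of the kpathsea check behind openin_any/openout_any, based on my reading of the texmf.cnf comments and kpathsea's tex-file.c. The name `name_ok` and its signature are my own, it handles POSIX paths only, and the real C code differs in some edge cases, so treat it as a sketch of the model rather than a faithful port.

```python
import posixpath
from typing import Optional


def name_ok(fname: str, mode: str, texmfoutput: Optional[str] = None) -> bool:
    """Rough approximation of kpathsea's openin_any/openout_any check.

    POSIX paths only. mode is 'a' (any), 'r' (restricted) or 'p' (paranoid);
    'y'/'1' count as 'a', 'n'/'0' as 'r', and anything else as 'p'.
    """
    choice = mode[:1]
    if choice in ("a", "y", "1"):
        return True  # any file may be opened

    parts = fname.split("/")
    # 'r' and 'p': refuse dot files such as .rhosts or sub/.ssh/key,
    # but keep allowing the "." and ".." navigation components.
    if any(p.startswith(".") and p not in (".", "..") for p in parts):
        return False
    if choice in ("r", "n", "0"):
        return True

    # 'p' only: absolute paths must lie under $TEXMFOUTPUT ...
    if posixpath.isabs(fname):
        if not texmfoutput:
            return False
        if not fname.startswith(texmfoutput.rstrip("/") + "/"):
            return False
    # ... and no path may climb into a parent directory.
    return ".." not in parts
```

For example, `name_ok("../notes.tex", "r")` is true while `name_ok("../notes.tex", "p")` is false. If I recall texmf.cnf correctly, TeX Live ships with openin_any = a and openout_any = p.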

Footnotes

  1. Documentation at https://tug.org/texinfohtml/kpathsea.html#Calling-sequence. Values a (all), p (paranoid), r (restricted), plus some backward compatibility aliases.

@dginev
Collaborator

dginev commented Sep 25, 2023

Some relevant prior discussion is in #606, which led to the secureio plugin.

In general, it is quite hard to improve latexml's safety profile to the point where we could claim it is "complete", especially in the command-line use cases.

It is somewhat more manageable to containerize the conversion, e.g. in a Docker image (related: #1178), and to impose restrictions on the source contents being passed in. That said, the two approaches are not mutually exclusive.
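As a rough illustration of the container route, a wrapper could assemble a `docker run` invocation that mounts the sources read-only, gives the conversion exactly one writable output directory, and cuts off the network. The image name `latexml-sandbox` and the function name are hypothetical; only standard `docker run` flags and latexml's `--destination` option are used.

```python
from pathlib import Path
from typing import List


def sandboxed_latexml_cmd(src_dir: str, out_dir: str, main_tex: str,
                          image: str = "latexml-sandbox") -> List[str]:
    """Build a `docker run` argv; the default image name is hypothetical."""
    src = Path(src_dir).resolve()
    out = Path(out_dir).resolve()
    dest = f"/out/{Path(main_tex).stem}.xml"
    return [
        "docker", "run", "--rm",
        "--network", "none",                     # no network access inside
        "--read-only",                           # read-only container root fs
        "--tmpfs", "/tmp",                       # throwaway scratch space
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # block setuid escalation
        "-v", f"{src}:/src:ro",                  # untrusted sources, read-only
        "-v", f"{out}:/out",                     # the only writable host dir
        "-w", "/src",                            # run from the source dir
        image,
        "latexml", f"--destination={dest}", main_tex,
    ]
```

The result can be passed to `subprocess.run(..., check=True)`. I have not checked whether latexml needs further writable paths under a read-only root filesystem, so `--read-only` may need relaxing in practice.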

@xworld21
Contributor Author

I think this kind of change is more feasible in light of #2185: I like to think I have found all the I/O happening in latexml, so adding hooks that stop with an error when reading or writing outside the boundary set by open(in|out)_any is doable, in principle. But this could come in stages: first an implementation of -recorder, and, once that seems complete, I/O filtering bolted on top. (Full filtering also requires #2053, to catch I/O from LibXSLT.)
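To make the "recorder first, filtering later" idea concrete, here is a Python sketch of a single choke point for all reads and writes. It logs each access in the `PWD`/`INPUT`/`OUTPUT` line style of TeX's `.fls` recorder files and, once read or write roots are configured, refuses paths that resolve outside them. The `IOGuard` class is hypothetical, and its confinement rule (resolved path must lie under an allowed directory) is a simpler stand-in for the kpathsea rules, not LaTeXML's actual design.

```python
import os
from typing import IO, Iterable, List, Optional


class IOGuard:
    """Hypothetical choke point for file I/O: records accesses, optionally filters."""

    def __init__(self, read_roots: Optional[Iterable[str]] = None,
                 write_roots: Optional[Iterable[str]] = None) -> None:
        # None means "no restriction" (recorder only), like openin_any=a.
        self.read_roots = self._norm(read_roots)
        self.write_roots = self._norm(write_roots)
        self.records: List[str] = ["PWD " + os.getcwd()]

    @staticmethod
    def _norm(roots: Optional[Iterable[str]]) -> Optional[List[str]]:
        return None if roots is None else [os.path.realpath(r) for r in roots]

    @staticmethod
    def _inside(path: str, roots: List[str]) -> bool:
        real = os.path.realpath(path)  # resolves symlinks and ".." segments
        return any(os.path.commonpath([real, r]) == r for r in roots)

    def _check(self, path: str, roots: Optional[List[str]], kind: str) -> None:
        if roots is not None and not self._inside(path, roots):
            raise PermissionError(f"{kind} of {path!r} is outside the allowed directories")
        self.records.append(f"{kind} {os.path.realpath(path)}")

    def open_in(self, path: str) -> IO[str]:
        self._check(path, self.read_roots, "INPUT")
        return open(path, encoding="utf-8")

    def open_out(self, path: str) -> IO[str]:
        self._check(path, self.write_roots, "OUTPUT")
        return open(path, "w", encoding="utf-8")

    def fls(self) -> str:
        """Recorder log, one access per line, in the style of TeX's .fls files."""
        return "\n".join(self.records) + "\n"
```

Constructing `IOGuard()` with no roots gives a pure recorder; passing roots turns on filtering without touching the call sites. A real implementation would also need to consider symlinks swapped in between the check and the `open`.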

For this to make any sense, one also needs some form of -shell-escape to forbid custom .ltxml bindings, i.e. bindings should be loaded from the default locations, but not from . unless specifically requested with --path=. or --shell-escape.
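The binding-loading policy proposed above could look roughly like this Python sketch: installed bindings are always searched, explicit `--path=...` entries are honoured, and the current directory is added only under `--shell-escape`. The function names and the example directory `/usr/share/latexml/Package` are made up for illustration; this is not how LaTeXML currently builds its search path.

```python
import os
from typing import List, Sequence


def binding_search_dirs(installed: Sequence[str], user_paths: Sequence[str],
                        shell_escape: bool, cwd: str = ".") -> List[str]:
    """Hypothetical policy for where .ltxml bindings may be loaded from."""
    dirs = list(installed)  # trusted, installed bindings are always searched
    # Explicit --path=... entries (including --path=.) are the user's choice.
    for p in user_paths:
        if p not in dirs:
            dirs.append(p)
    # The document's own directory is trusted only under --shell-escape.
    if shell_escape and cwd not in dirs:
        dirs.append(cwd)
    return dirs


def may_load_binding(path: str, search_dirs: Sequence[str]) -> bool:
    """Allow a .ltxml file only if it resolves directly into a search dir."""
    real_dir = os.path.dirname(os.path.realpath(path))
    return any(real_dir == os.path.realpath(d) for d in search_dirs)
```

With the default `shell_escape=False` and no `--path=.`, a `foo.sty.ltxml` sitting next to an untrusted document is simply never considered, which mirrors how TeX refuses `\write18` without `-shell-escape`.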

If the above is workable, latexml could reach the same safety profile as a normal LaTeX run, which is a familiar thing.

Of course, I am making big assumptions about the other tools (dvipng, dvisvgm, Ghostscript, and [shudders] ImageMagick) having a similar safety approach, i.e. not reading from or writing to arbitrary locations when fed dodgy inputs.
