Investigate options to make internal usage of eval redundant or optional #1019
Typo in …?
Generation of WebAssembly code would be pretty neat and presumably result in very fast evaluation. I think it would also guarantee that mathjs does not allow injection of malicious code when parsing expressions.
@harrysarson, good point. wasm is designed to allow proper sandboxing via … It would only work on Node >= 7 and probably not in browsers for a while. It might also allow access to vector instructions via wasm loops.
oops, fixed that.
@harrysarson I'm not sure: does it protect against running arbitrary code when you've found access to …?
Wouldn't WebAssembly require writing native code (like C, C++ or Rust)? I don't think WebAssembly is a viable option for now because of limited compatibility, but if emscripten is used to create WebAssembly from native code, the same code can be used to create asm.js for the time being, so this might be a possible direction to go.
@josdejong It seems like you are using WebAssembly and asm.js interchangeably, but they are quite different. asm.js is (from my understanding) just a subset of regular JavaScript with some type annotations, so that browsers that support it can convert it into efficient machine code. WebAssembly is (again from my understanding) an API and instruction set that maps very closely to actual processor instruction sets, so it can be compiled to native instructions before the code is run.
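To make the "subset of JavaScript with type annotations" point concrete, here is a minimal asm.js module sketch (the module and function names are made up for illustration). The `x = +x` and `+(...)` coercions are how asm.js declares a value to be a double; an engine without asm.js support just runs this as ordinary JavaScript and gets the same result.

```js
// A tiny asm.js module. The "use asm" directive asks engines that support
// asm.js to validate and ahead-of-time compile it; other engines simply
// run it as plain JavaScript, so the result is the same either way.
function SquareModule(stdlib, foreign, heap) {
  "use asm";
  function square(x) {
    x = +x;          // type annotation: x is a double
    return +(x * x); // type annotation: the return value is a double
  }
  return { square: square };
}

const mod = SquareModule(globalThis, {}, new ArrayBuffer(0x10000));
console.log(mod.square(3)); // 9
```

By contrast, WebAssembly is a separate binary format that has to be produced by a compiler and loaded through the `WebAssembly` API, rather than being valid JavaScript source.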
@FSMaxB, WebAssembly would require compiling an AST to a stack-based binary instruction set. My understanding is that asm.js was a proposal that Mozilla put forward as what they would like instead of NaCl. That proposal showed that there was broad support for something like asm.js but without the JavaScript parser overhead. Most of the people who wrote the specs for either NaCl or asm.js are now behind the WebAssembly spec, and this is the first such proposal to get support from all major browser vendors.
🤣 ... well... that's illustrative of how much I know about it
My understanding is that the validator disallows any call that doesn't provably produce a float, and the return type of the …
I just did a quick and dirty experiment removing compiling of an expression via:
to basically:
First results look awesome:
The compiler without …
Wow! Is …
It's basically something like this: the expression parser parses input into an expression tree or AST, which contains nodes like OperatorNode, FunctionNode, ConstantNode, SymbolNode, etc. For example a FunctionNode contains properties like … Without compilation, you could evaluate such a node like this (simplified):

```js
FunctionNode.prototype.eval = function (scope) {
  // Look the function up in the scope first, then fall back to math
  if (this.fn in scope) {
    return scope[this.fn].apply(null, this.args.map((arg) => arg.eval(scope)));
  }
  else {
    return math[this.fn].apply(null, this.args.map((arg) => arg.eval(scope)));
  }
}
```

This works, but for every evaluation a lot of work has to be done (if statements, iterating over arguments, calling eval on child nodes everywhere, ...). To make this faster, mathjs first compiles the tree into an optimized JavaScript function, something like this:

```js
FunctionNode.prototype.compile = function (math) {
  // Emit JavaScript source as a string; `scope` and `math` are only
  // resolved later, when the generated code runs
  var name = JSON.stringify(this.fn)
  var args = this.args.map(arg => arg.compile(math)).join(', ')
  return '(' + name + ' in scope ? scope[' + name + '] : math[' + name + '])(' + args + ')'
}
// which generates code that (simplified) looks like:
// '("sum" in scope ? scope["sum"] : math["sum"])(2, 3, 4)'
// This output is then wrapped and compiled into JS like:
var compiled = {
  // new Function only sees the global scope, so math is passed in as a parameter
  expr: new Function('math', 'scope', 'return ' + tree.compile(math))
}
// usage: compiled.expr(math, scope)
```

When I wrote this compiling step, it resulted in performance improvements on the order of a factor of 10. Now I'm experimenting with returning a function instead of returning a string that will be evaluated as new JavaScript:

```js
FunctionNode.prototype.compile = function (math) {
  var fn = this.fn // for example 'sum'
  var args = this.args.map(arg => arg.compile(math))
  return function (scope) {
    return (fn in scope ? scope[fn] : math[fn]).apply(null, args.map(arg => arg(scope)))
  }
}
// note that we can easily write an optimized version for functions
// with 1 or 2 arguments for example.
```

Apparently this new solution has similar (or even slightly better) performance than the eval version. That seems counterintuitive, because an evaluation again involves more function calls and looping over arrays, which are relatively expensive operations. So... it looks like today's JavaScript engines are just extremely good at optimizing a cascading chain of these little nested functions.
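For anyone who wants to play with the two strategies side by side, here is a self-contained sketch. The node classes below are hypothetical, stripped-down stand-ins (not mathjs's real ConstantNode, SymbolNode or FunctionNode), and `math` is a two-function placeholder. It contrasts tree-walking evaluation with closure compilation, including the specialized 1- and 2-argument closures mentioned in the comment above.

```js
// Hypothetical stand-in for the math namespace
const math = {
  add: (a, b) => a + b,
  sum: (...xs) => xs.reduce((acc, x) => acc + x, 0)
};

class ConstantNode {
  constructor(value) { this.value = value; }
  evaluate(scope) { return this.value; }
  compile(math) {
    const value = this.value;
    return (scope) => value; // nothing left to do at evaluation time
  }
}

class SymbolNode {
  constructor(name) { this.name = name; }
  evaluate(scope) { return scope[this.name]; }
  compile(math) {
    const name = this.name;
    return (scope) => scope[name];
  }
}

class FunctionNode {
  constructor(fn, args) { this.fn = fn; this.args = args; }

  // Tree walking: re-inspects the node and all its children on every call
  evaluate(scope) {
    const f = this.fn in scope ? scope[this.fn] : math[this.fn];
    return f(...this.args.map((arg) => arg.evaluate(scope)));
  }

  // Closure compilation: children are compiled once, up front
  compile(math) {
    const fn = this.fn;
    const args = this.args.map((arg) => arg.compile(math));
    const resolve = (scope) => (fn in scope ? scope[fn] : math[fn]);
    // Specialized closures for common arities avoid building an array
    if (args.length === 1) {
      const [a] = args;
      return (scope) => resolve(scope)(a(scope));
    }
    if (args.length === 2) {
      const [a, b] = args;
      return (scope) => resolve(scope)(a(scope), b(scope));
    }
    return (scope) => resolve(scope)(...args.map((arg) => arg(scope)));
  }
}

// sum(2, 3, x)
const tree = new FunctionNode('sum', [
  new ConstantNode(2), new ConstantNode(3), new SymbolNode('x')
]);
const compiled = tree.compile(math);
console.log(tree.evaluate({ x: 4 })); // 9
console.log(compiled({ x: 4 }));      // 9
```

The compiled closure still honours a function of the same name defined in the scope, just like the string-generated `"sum" in scope ? ... : ...` code.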
Why not go back to the original approach with …?
The example I posted is quite simplified: some nodes have a lot of logic that you can execute once in a compile step, because it doesn't change when executing with a different scope. See for example AssignmentNode, which has quite some logic. If you do as much work as possible beforehand in a compile step, and keep only the parts that depend on values in the scope dynamic, evaluation becomes much faster.
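A minimal sketch of that split, assuming the assigned value has already been compiled to a closure `(scope) => value`. `compileAssignment` and its validation rules are hypothetical illustrations, not mathjs's AssignmentNode: name checks run once when compiling, and only the scope-dependent work stays in the returned closure.

```js
// Hypothetical list of names that must never be assigned to
const FORBIDDEN = new Set(['__proto__', 'constructor', 'prototype']);

// Hypothetical compile step for an assignment like `name = value`
function compileAssignment(name, compiledValue) {
  // Validation happens once, at compile time, not on every evaluation
  if (typeof name !== 'string' || !/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
    throw new SyntaxError('Invalid symbol name: ' + String(name));
  }
  if (FORBIDDEN.has(name)) {
    throw new Error('No access to property "' + name + '"');
  }
  // Only the work that depends on the scope is left in the closure
  return function evalAssignment(scope) {
    const value = compiledValue(scope);
    scope[name] = value;
    return value;
  };
}

const assignY = compileAssignment('y', (scope) => scope.x * 2);
const scope = { x: 21 };
console.log(assignY(scope)); // 42
console.log(scope.y);        // 42
```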
I've completely rewritten the … This is a breaking change, since the … Branch: https://github.com/josdejong/mathjs/tree/compile-without-eval
Note that this doesn't make security a non-issue for the parser at all, but it makes some categories of exploits impossible. It also makes the code easier to read and understand, and makes debugging easier (good stack traces).
Is the code below where identifiers in untrusted inputs get resolved to lvalues and rvalues? mathjs/lib/expression/node/SymbolNode.js Lines 130 to 135 in 1192bb6
If so, there might be a few tweaks to SymbolNode that might prevent custom node implementations from naively trusting the name.
Yes, that's one of them. Basically, many exploits boil down to getting access to
What tweaks do you mean exactly?
This sounds like RCE:

```js
// Let x be any value not in
// (null, undefined, Object.create(null)).
var x = {},
    // If the attacker can control three strings
    a = 'constructor',
    b = 'constructor',
    s = 'console.log(s)';
// and trick code into doing two property lookups
// they control, a call with a string they control,
// and one more call with any argument
x[a][b](s)();
// then they can cause any side-effect achievable
// solely via objects reachable from the global scope.
// This includes full access to any exported module APIs,
// all declarations in the current module, and access
// to builtin modules like child_process, fs, and net.
```
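To see why those two lookups are enough, the chain can be checked one step at a time. This sketch uses a harmless function body in place of attacker-controlled code; everything in it is standard JavaScript behaviour.

```js
const x = {};
const a = 'constructor';
const b = 'constructor';

console.log(x[a] === Object);      // true: plain objects inherit .constructor
console.log(x[a][b] === Function); // true: Object is a function, so its .constructor is Function

// The first lookup only succeeds via the prototype chain; an own-property
// check would have stopped the chain right here.
console.log(Object.prototype.hasOwnProperty.call(x, a)); // false

const f = x[a][b]('return 6 * 7'); // Function(body) compiles a new function from a string
console.log(f());                  // 42
```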
Well, if the problem is untrusted identifiers reaching the square bracket operator, then …

```js
class UntrustedIdentifier {
  constructor (identifier) {
    // Store the raw name in a non-writable, non-configurable own property
    Object.defineProperty(
      this, 'id',
      { configurable: false, writable: false, value: String(identifier) })
  }
  toString () {
    // Coercion to string returns something that aids debugging and is
    // unlikely to match a real property, so obj[untrustedId] won't
    // accidentally reach the named property.
    // Searches for UntrustedIdentifier should direct devs to this file.
    return '(UntrustedIdentifier ' + JSON.stringify(this.id) + ')'
  }
  /** Convenience to read an own property but not anything from a prototype. */
  readFrom (obj) {
    return Object.prototype.hasOwnProperty.call(obj, this.id) ? obj[this.id] : undefined
  }
}
```
Thanks Mike, I like that idea! That will prevent accidentally using an unsafe property or method.
@ThomasBrierley what do you think of Mike's idea?
I think I might be showing my ignorance of mathjs here, so keep that in mind for my comments below. My understanding of the methods exported by
Would every operator implementer have to make this consideration, or would it be applied in one place? Sorry, maybe you can enlighten me here... Basically, my thought is that if this requires multiple implementers to be mindful of security in multiple places, it potentially opens up lots of holes from the inevitable little mistakes. On the other hand, if it's only applied in one place, then extra care only has to be taken in one place. [EDIT] Reading backwards up this thread :P so currently the property guards are called from the various node classes and don't have to be considered by the implementer of each operator? (Which I think is a good thing, from a security point of view at least.) Would it be possible to use that identifier wrapper in a centralised manner as well? I think there might be some differences here, though I'm not very familiar with the compiler: currently we only check properties of objects, but an identifier could just be some variable, unless identifiers are also just checked as properties of the scope? Sorry, lots of questions; I'm trying to catch up with this thread.
@ThomasBrierley currently these security functions like
Ahh, well that certainly does sound better. The natural interface for my new implementation is … Also, "obj" is currently mandatory... would that pose a problem for some identifiers when referencing a global variable of some kind?
Sounds good. Maybe a single method with three values for
I don't expect so: "global" variables are defined in the
That sounds fine. I can see that from the perspective of actually getting, setting and calling, they do have quite different signatures... However, from the perspective of purely checking whether it's safe, it is useful to have a single function with a mode, for the sake of keeping the rules organised. I'll carry on with the refactor as planned but expose a new
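One way the "single check function with a mode" idea could look is sketched below. `isSafeAccess`, `safeGet` and the blocked-name list are hypothetical names and rules chosen for illustration; they are not mathjs's actual security API. The point is that the get, set and call helpers keep their different signatures but share one place where the rules live.

```js
// Hypothetical set of names that are never safe to touch
const BLOCKED_NAMES = new Set([
  'constructor', '__proto__', 'prototype',
  '__defineGetter__', '__defineSetter__', '__lookupGetter__', '__lookupSetter__'
]);

// mode is one of 'get', 'set' or 'call'
function isSafeAccess(obj, name, mode) {
  if (obj === null || typeof obj !== 'object') return false;
  if (typeof name !== 'string' || BLOCKED_NAMES.has(name)) return false;
  switch (mode) {
    case 'get':
      // Only own properties, never anything inherited from a prototype
      return Object.prototype.hasOwnProperty.call(obj, name);
    case 'set':
      // Refuse to shadow built-in Object.prototype members such as toString
      return !(name in Object.prototype);
    case 'call':
      // Only callable own properties; inherited methods are rejected here
      return Object.prototype.hasOwnProperty.call(obj, name) &&
        typeof obj[name] === 'function';
    default:
      throw new TypeError('Unknown mode: ' + mode);
  }
}

// Hypothetical getter built on the shared rules
function safeGet(obj, name) {
  if (!isSafeAccess(obj, name, 'get')) {
    throw new Error('No access to property "' + name + '"');
  }
  return obj[name];
}
```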
Cool, I'm curious how this approach will work out!
I will close this issue now: mathjs v4 is completely … Until a month ago I hadn't dared to dream about getting this far. @mikesamuel thanks for triggering this whole discussion. I'm afraid you will have to rewrite the comments on the usage of eval in mathjs in your Security Roadmap in the past perfect 😬 See #682 (comment) and the HISTORY.md file of v4 for details.
@josdejong Awesome. Updated the roadmap.
mathjs has its own expression parser, which first parses an expression into an AST and then compiles it into high-performance JavaScript code using eval. Using eval is great for performance, but it's also a security risk. In some restricted environments eval is blocked for security reasons, which means that you can't use math.js everywhere.
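In browsers, such a block typically comes from a Content Security Policy without `'unsafe-eval'`, under which `new Function` throws. A sketch of detecting that at runtime and falling back, assuming a compiler that can produce either a source string or an equivalent closure; `canCompileStrings` and `makeEvaluator` are hypothetical helpers, not part of mathjs.

```js
// Returns false when string compilation is blocked (e.g. by CSP)
function canCompileStrings() {
  try {
    return new Function('return 1')() === 1;
  } catch (err) {
    return false;
  }
}

// Hypothetical helper: pick a strategy once, when the library initializes
function makeEvaluator(source, closure) {
  if (canCompileStrings()) {
    // `scope` is the only parameter the generated source may refer to
    return new Function('scope', 'return ' + source);
  }
  return closure;
}

const evaluate = makeEvaluator('scope.x + 1', (scope) => scope.x + 1);
console.log(evaluate({ x: 41 })); // 42
```

The catch is the one raised in the issue: both paths have to be generated from the same compiler logic, otherwise the eval and non-eval versions can drift apart.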
I would love to do some experimenting in different directions (though I don't expect to have time for that anytime soon myself):
- … eval anymore. For example, small functions will get inlined by the JS engine, so we may not need to do that ourselves.
- … eval remains critical in order to achieve good performance, it would be awesome if we could somehow make this optional: have two packages of mathjs, a slow (safer) one without eval, and a fast one using eval. I'm not sure how we could do that without duplicating the logic in the compiler, though.
- … vm.runInContext turned out to be insecure, there are other initiatives that could be interesting to look into: frozen realms, asm.js, ADSafe.

There is a small benchmark file here, which can be used to see whether changes increase or decrease performance.
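As a rough companion to that benchmark file, here is a minimal timing sketch comparing a `new Function` evaluator with an equivalent hand-written closure for the same expression. It relies on the global `performance.now()` (available in browsers and recent Node versions); the timings depend heavily on engine and machine, so they are a starting point rather than a result.

```js
const viaEval = new Function('scope', 'return scope.x * scope.x + 2 * scope.x + 1');
const viaClosure = (scope) => scope.x * scope.x + 2 * scope.x + 1;

function time(label, fn, iterations = 1e6) {
  let checksum = 0; // consume results so the loop is not optimized away
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    checksum += fn({ x: i % 10 });
  }
  const ms = performance.now() - start;
  console.log(label + ': ' + ms.toFixed(1) + ' ms (checksum ' + checksum + ')');
  return checksum;
}

time('new Function', viaEval);
time('closure', viaClosure);
```

Matching checksums confirm that both variants computed the same values, so any timing difference comes from the evaluation strategy rather than from different work.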
Just to give you a feel for what sort of code the expression parser generates right now:
I should note that there is another place where eval is used internally: in typed-function, but let's keep that for another discussion.