-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The binary format doesn't have to directly mirror normal JS. #37
Comments
Thanks for the proposals! Let me try and sort it a bit. Kinda language extensions
While each of these proposals makes sense individually, I really feel that they are mostly orthogonal to Binary AST. Recall that each addition to Binary AST will make its adoption more complicated by browsers and VMs. For this reason, we wish to keep it as minimal as possible, which means sticking exactly to the JS language. Once Binary AST is a standard, we'll be happy to rediscuss each of these proposals, though :) Kinda optimizations
That definitely makes sense.
Already the case :)
Encoders are, of course, free to make such transformations, especially if they are optimizations. I don't think we want to hardcode them into Binary AST, though.
I don't understand that one, could you elaborate?
Already the case :)
I like the idea. We'll need to think more about it. Kinda metadata
Definitely an objective. Not on the top of our list yet, but we're working on it.
Indeed, we're thinking of a built-in metadata field. It shouldn't be hard to add, as long as it doesn't have semantics. I suspect that most toolchains will strip it, though. |
For switches, how about having a table at the beginning that has Exampleswitch (expr) {
case 'Oranges':
console.log('Oranges are $0.59 a pound.');
break;
case 'Mangoes':
case 'Papayas':
console.log('Mangoes and papayas are $2.79 a pound.');
// expected output: "Mangoes and papayas are $2.79 a pound."
break;
default:
console.log('Sorry, we are out of ' + expr + '.');
} (source) Binary AST: Header:
Body: console.log('Oranges are $0.59 a pound.');
break;
console.log('Mangoes and papayas are $2.79 a pound.');
break;
console.log('Sorry, we are out of ' + expr + '.'); |
@j-f1 Why not. What's the expected benefit? |
It would allow the engine to skip over unused code and only parse the statements used in the current condition. Additionally, the engine would be able to determine if the |
@Yoric Of course, some of those points and thoughts were more or less very forward-looking, and not exactly what I'd expect would have a chance in making it into an MVP. (Like, for example, the synthetic anonymous locals.)
Think similar to WebAssembly and how its blocks work, just optionally a little heavier (if you need to introduce locals in them) and with unconditionally only one return value. That's pretty much what I had in mind. WebAssembly itself is expression-based, it uses blocks with To expand on my concrete example: An engine wouldn't need to do much work to infer that the result of a particular * In reality, an expression-based syntax means we wouldn't need to support |
We're thinking about something kind-of like this, but if we do it, this will complicate the specifications of |
@Yoric did a great job separating out the @isiahmeadows' language feature ideas from other suggestions. One of the highest, if not the highest constraints in this design space is to not bifurcate the language. To that end, I take the existence of an "obvious" bijection (modulo whitespace and whatnot) between source JS and binary AST to be a hard constraint. That is, binary AST will not add any language features. What's expressible in source JS must be expressible in binary AST. Like @Yoric said, the biggest reason is adoption, both for developers and for web browsers. Please do not take binary AST to be our trying to design a better JS. It is very much just the JS language with a more performant transport format. I also saw mentions to bytecode. There will never be any sort of bytecode involved with binary AST. |
@syg I'm in high support of keeping it bijective. My hard constraint with all the "language feature" ideas I proposed were that every single one of them had to be trivially desugared. The only exception being synthetic locals used in a scope accessible to a direct
I did sometimes conflate the binary AST representation with the concept of "bytecode", but I know it's technically incorrect - bytecode is a sequence of instructions, not a graph of nodes. I did make one suggestion here that would've actually made it bytecode in the technical sense, but it was kind of a thing thrown at the wall:
I was already neutral on it to start, just throwing it out there for the sake of being thrown out there. I'm also not a big fan of the idea, since bytecode doesn't always end up smaller than a basic AST, and it's not always the most intuitive. And the more I think about this, the more I'm leaning against it. With this, all of a sudden you need a decompiler to debug the code, and we shouldn't be making browsers ship a glorified decompiler. This isn't WebAssembly, and even that one has chosen to make their bytecode a borderline AST format, just one that's mostly in reverse Polish notation for the node types. (Their text format lays this pretty bare.) |
@isiahmeadows:
As you correctly point out, engines already do these sorts of simple translations, often at parse time. What's the proposed benefit to having the compiler do them? Keeping in mind that there's a real cost, which is adding new node types and defining and implementing their semantics. |
@Yoric I forgot to clarify that the proposed global "metadata" field was supposed to not have any real semantics attached. It's for tools, not engines. |
Slightly less data over the wire, and engines don't need to enlist their heuristics to figure them out. Not all of them are especially justifiable, but some of them could be.
|
@isiahmeadows
In what way? Just tested: (function(){'use strict'; const undefined = 3; return undefined; })(); returns 3. |
|
@ljharb @erights I meant it's a pseudo-literal in that it has those special semantics - it's not a simple identifier in strict mode. It evaluates to:
In sloppy mode, it's just a simple global binding lookup, so it's of course not a special syntactic construct. |
Where is this special case in the spec? Is this special case observable? |
I edited it because I thought I knew how it worked, but apparently I was mistaken on particulars. 🙂 I thought it was a special case, but it's just a combination of descriptors + etc. For what it's worth, UglifyJS only reduces |
I'm not convinced it's required to marry the binary format too tightly to the JS source, and there are things you could do to make it simpler. It's not like the binary format is meant to be human-readable, just machine-readable. There are also idioms in compiled-to-JS code (e.g. from Elm, CoffeeScript, and Babel) that could also stand to benefit from a binary format that that's at least aware of some of their needs.
Here's a few of the ideas I have to throw at the wall. Apologies if this comes across as a bit rambly.
Adding a constant for
undefined
void 0
(what UglifyJS replaces strict-modeundefined
with) and similar.undefined
variable overvoid 0
for code size reasons.Allowing synthetic, anonymous locals that aren't viewable by direct
eval
orwith
Adding built-in source map support.
Adding a built-in description/metadata field.
Adding combined type-checking operators for
typeof x === "number"
, etc.typeof
use cases, and engines already desugar these 100% of the time. (It's an obvious optimization.)Adding combined coersion operators for
x | 0
/~~x
,!!x
,x + ""
, etc.x | 0
or at least act as a synonym.Separating the pragmas from the source/function body.
Separating strings from the code, instead storing debug string/name references as
{int offset; int length}
pairs and putting their values consecutively in a string table before any real code references.Reducing logical "and"/"or" to corresponding if/else variants:
test && other
↔(tmp = test) ? tmp : other
,test || other
↔(tmp = test) ? other : test
.Using breakable blocks with an expression-based AST instead of a statement-based one.
let x = foo(); try { x = foo(); } catch (e) { if (!(e instanceof SafeError)) throw e }
, where it could be simplified to something roughly likelet x = try { foo() } catch (e) { e instanceof SafeError ? undefined : throw e }
. Edit: Missed a backtick 😄Requiring fallthrough to be explicit in
switch
statements.switch
would become much smaller when encoded.Declaring locals before script/module/function body.
Declaring imports/exports before module data table.
Encoding the bytecode as a hybrid register/stack machine.
The text was updated successfully, but these errors were encountered: