-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiple fixes #10
base: main
Are you sure you want to change the base?
Multiple fixes #10
Conversation
Fixed keyed property usage
Fixed Construct with r0-r0 range usage.
String encoding fixer.
Fixed strings encoding output.
Replaced type checking for a reg value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for all these fixes!
I left a few review comments, would be nice if we can figure out how to solve them
Simplify/fix_strings_encoding.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also tried something similar but I think this needs a v8 patch
https://github.com/v8/v8/blob/0e75d85d8e3467a536bca01d89d8a180a8bcffca/src/objects/string.cc#L725-L726
The issue is that v8 passes an utf-16 number to \\x%02x
, so the output can be \x00
to \xffff
But when parsing we can't tell where the code point ends and the next one starts (e.g. \x7f34
="缒"
vs \x7f34
="\x7f"+"34"
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to JS syntax, there is only one interpretation of \xAB
, and only one interpretation of \u{AABB}
- so there are no \xAABB
and no \u{AB}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately the disassembler output doesn't adhere to js syntax and won't contain \u{AABB}
you can try compiling/decompiling the following script:
x = "\u{345}6"; // length 2
y = "\u{3456}"; // length 1
62 S> 0x1a1c74e41246 @ 0 : 13 00 LdaConstant [0]
64 E> 0x1a1c74e41248 @ 2 : 23 01 00 StaGlobal [1], [0]
78 S> 0x1a1c74e4124b @ 5 : 13 02 LdaConstant [2]
80 E> 0x1a1c74e4124d @ 7 : 23 03 02 StaGlobal [3], [2]
0x1a1c74e41250 @ 10 : 0e LdaUndefined
95 S> 0x1a1c74e41251 @ 11 : a9 Return
Constant pool (size = 4)
0x1a1c74e41259: [FixedArray] in OldSpace
- map: 0x06c319d80211 <Map(FIXED_ARRAY_TYPE)>
- length: 4
0: 0x1a1c74e41289 <String[2]: u#\x3456>
1: 0x06c319d838c1 <String[1]: #x>
2: 0x1a1c74e412a1 <String[1]: u#\x3456>
3: 0x06c319d838d9 <String[1]: #y>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the disasm. Try to compile an umlaut and any unicode symbol, not its hex/u encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still getting a similar result.. Could you please send an example how it looks for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so what do you think we should do with that? Only fixing the disasm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it would be ideal if the disasm outputs valid utf-8/utf-16 if possible, or \u{HHHH}
escapes which can be parsed with your python function
Unicode is pretty hard to get right (surrogate pairs, code points > 0xffff, etc).. could also merge the rest of this PR now and do it in another one if you want
Fixed keyed property usage.
Fixed Construct zero-range usage.
Fixing strings encoding.