Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple fixes #10

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

Multiple fixes #10

wants to merge 9 commits into from

Conversation

lab313ru
Copy link

@lab313ru lab313ru commented Dec 12, 2024

Fixed keyed property usage.
Fixed Construct zero-range usage.
Fixing strings encoding.

Fixed keyed property usage
Fixed Construct with r0-r0 range usage.
String encoding fixer.
Fixed strings encoding output.
@lab313ru lab313ru changed the title Update translate_table.py Multiple fixes Dec 12, 2024
Copy link
Owner

@j4k0xb j4k0xb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for all these fixes!
I left a few review comments, would be nice if we can figure out how to solve them

Copy link
Owner

@j4k0xb j4k0xb Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also tried something similar but I think this needs a v8 patch

https://github.com/v8/v8/blob/0e75d85d8e3467a536bca01d89d8a180a8bcffca/src/objects/string.cc#L725-L726
The issue is that v8 passes an utf-16 number to \\x%02x, so the output can be \x00 to \xffff
But when parsing we can't tell where the code point ends and the next one starts (e.g. \x7f34="缒" vs \x7f34="\x7f"+"34")

Copy link
Author

@lab313ru lab313ru Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to JS syntax, there is only one interpretation of \xAB, and only one interpretation of \u{AABB} - so there are no \xAABB and no \u{AB}.

Copy link
Owner

@j4k0xb j4k0xb Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately the disassembler output doesn't adhere to js syntax and won't contain \u{AABB}
you can try compiling/decompiling the following script:

x = "\u{345}6"; // length 2
y = "\u{3456}"; // length 1
   62 S> 0x1a1c74e41246 @    0 : 13 00             LdaConstant [0]
   64 E> 0x1a1c74e41248 @    2 : 23 01 00          StaGlobal [1], [0]
   78 S> 0x1a1c74e4124b @    5 : 13 02             LdaConstant [2]
   80 E> 0x1a1c74e4124d @    7 : 23 03 02          StaGlobal [3], [2]
         0x1a1c74e41250 @   10 : 0e                LdaUndefined
   95 S> 0x1a1c74e41251 @   11 : a9                Return
Constant pool (size = 4)
0x1a1c74e41259: [FixedArray] in OldSpace
 - map: 0x06c319d80211 <Map(FIXED_ARRAY_TYPE)>
 - length: 4
           0: 0x1a1c74e41289 <String[2]: u#\x3456>
           1: 0x06c319d838c1 <String[1]: #x>
           2: 0x1a1c74e412a1 <String[1]: u#\x3456>
           3: 0x06c319d838d9 <String[1]: #y>

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the disasm. Try to compile an umlaut and any unicode symbol, not its hex/u encoding.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still getting a similar result.. Could you please send an example how it looks for you?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You right, There are three possible types of hex encoding:
image

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so what do you think we should do with that? Only fixing the disasm?

Copy link
Owner

@j4k0xb j4k0xb Dec 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it would be ideal if the disasm outputs valid utf-8/utf-16 if possible, or \u{HHHH} escapes which can be parsed with your python function

Unicode is pretty hard to get right (surrogate pairs, code points > 0xffff, etc).. could also merge the rest of this PR now and do it in another one if you want

@j4k0xb j4k0xb linked an issue Dec 13, 2024 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CreateArrayLiteral translator bug
2 participants