-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP/proof of concept: Unmangle MAST strings #607
base: main
Are you sure you want to change the base?
Conversation
nqp::setmethcache($buf, nqp::hash('new', method () {nqp::create($buf)})); | ||
$buf; | ||
} | ||
my $handle-mangled-strings := 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a debugging thing? I don't understand the if
just after it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is. It's also so it's easy to see how this proof of concept was achieved when viewing the pull request.
I'm missing the overall goal of this PR? Bytecode files store strings in a couple of different ways, in order to reduce memory use and startup decoding time (because if you know it's just in the latin-1 range, a whole load of things simply cannot happen). |
I don't understand why there's anything to fix. To me this looks like it's removing an optimization. |
@jnthn if there is a good reason for it, then it can stay. But let me try and explain in more detail what I currently understand. Some of this may be incorrect, so feel free to correct/expand on this.
It is not clear to me why we should be storing non-utf8 valid strings. The one that is of concern inside nqp is '$¢'. I am guessing rakudo also goes through this path. It is my opinion we should only be storing/decoding strings as utf8. This would mean If the reason for this optimization is to avoid using the full utf-8 encoder/decoder, I think it would make sense to change this pull request so we use the ASCII decoder/encoder on ASCII strings only. I hope this makes it a bit more clear my intentions here. |
aef8971
to
9aca235
Compare
This is done in multiple steps:
This is incomplete, but is a proof of concept for fixing us having both utf8 and latin-1 encoded lexicals.
5. Disable handling of mangled lexical strings