-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Discussion: [position, codePoint] pairs #1
Comments
This seems like it may potentially be at odds with the efficiency goals — if you don’t want to call
|
Small objects are usually not a problem for modern engines, especially with a known static shape like here.
No, that is not the same. What you're suggesting is just indexing of codepoints as if they are a flat array, and that is already pretty easily achievable with some extra |
Ah, you’re right, I took "position" above to mean position among codepoints, not position by code units. This still seems easy to achieve with an
Though I can see how this is not so ergonomic. |
Yes, that's exactly what we want to avoid. Manual comparisons of code points can already give iteration over entire string without |
Rather than returning an array of the form
|
But that wasn't a concern for all the other These existing precedents are also the reason why I thought |
Array destructuring uses the array iterator, so doing for (const [position, codePoints] of string.fooBar()) {
[position, codePoints]
} Is like doing an implicit Object destructuring doesn’t have that overhead. |
@mathiasbynens I'm sure that the intermediate object and usage of the iteration protocol can be optimised away given a smart enough JS engine. Still, the usability point around confusing ordering and future-proofing concern remains. Additionally, this interface allows a type system to catch more errors. @RReverser I don't think the inconsistency with other |
True, but not having to optimize it away in the first place is infinitely better :) |
@bakkot would be interested in having an |
If a surrogate appears at all in the iterator results, it was either lone or out of order, meaning for (const { position, codePoint } of string.codePointEntries()) {
let isValid = codePoint >= 0xD800 && codePoint < 0xE000;
} Sorry for bringing out the slippery slope argument, but adding this paves the way for including other Unicode properties (ID_Continue, Script, etc.) in the iterator result, which I think is inappropriate. |
@michaelficarra’s comment reflects my personal opinion, as I told @bakkot offline. Initially @bakkot wanted the iterator to throw when a lone code point was reached, which I disagreed with even more strongly. Especially when writing a parser, you’d want to get to the lone surrogate instead of erroring out. |
If we’re going for an object anyway, we could even add the string representation of the code point to the result. |
I was rather referring to inconsistency with all new entries iterators - |
TL;DR - personally I like the object approach for readability / extensibility reasons, but personal preferences aside, I'm seriously concerned about introducing an inconsistent API when so many examples of the other one already exist. |
In the existing "tuple" examples, member 0 is a "key". Since that isn’t quite what position means, I don’t think it’s inconsistent. In fact, usage of Edit: I had highlighted Set as an exception to the "key" case, but I guess it depends on how you look at it. |
I came to this proposal and was immediately surprised by the HOWEVER, I changed my mind upon realizing that (per #1 (comment)) the position here is not an index, but instead a position. Given that, I think the current design is more appropriate. In other words, I'd expect that if I see I would suggest documenting this design choice in the readme for future people like me who come here with similar questions. |
In some cases it would be useful to know the current position within the string as you go through the codepoints (for example, to store it and use later for slicing or error reporting).
To achieve this,
.codePoints()
iterator could yield pairs[position, codePoint]
instead of justcodePoint
.The obvious downside is that this would be inconsistent with the default chars iterator.
On the other hand, with regular chars iterator, this can be done in a more or less obvious manner already (you can sum up
char.length
as you go through the string), and, if not, we can add an extra method to yield[position, char]
pairs too in future if required.Thoughts?
The text was updated successfully, but these errors were encountered: