Rewrite anythingButString in terms of a lookahead/consume any character #61

Open
francisrstokes opened this issue Nov 11, 2022 · 0 comments

Comments

@francisrstokes
Owner

Right now anythingButString is implemented in a very non-ideal way (see #58). The plan is to replace the existing function, and potentially add one more.

anythingButString('aeiou') will produce output like:

// non-capturing group containing a negative lookahead for the exact string, followed by any character repeated inputString.length times
/(?:(?!aeiou).{5})/
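
As a rough illustration of how such a pattern could be generated from an arbitrary input, here is a minimal sketch in plain JavaScript. The helper names `escapeRegex` and `anythingButStringSource` are hypothetical and are not part of the library; the sketch also assumes the input should be matched literally, so regex syntax characters are escaped first.

```javascript
// Hypothetical helpers for illustration only; not the library's implementation.

// Escape characters that have special meaning in a regular expression.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build the pattern source: a negative lookahead for the literal string,
// followed by exactly str.length arbitrary characters.
function anythingButStringSource(str) {
  return `(?:(?!${escapeRegex(str)}).{${str.length}})`;
}

const source = anythingButStringSource('aeiou');
console.log(source); // (?:(?!aeiou).{5})

// Anchored so the whole input has to be consumed by the group.
const re = new RegExp(`^${source}$`);
console.log(re.test('hello')); // true: five characters, not "aeiou"
console.log(re.test('aeiou')); // false: rejected by the lookahead
console.log(re.test('hi'));    // false: fewer than five characters
```

The lookahead only rejects a position where the forbidden string begins; the `.{5}` that follows is what actually consumes input, which is why the count has to come from somewhere.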

This implementation will only work predictably for ASCII-type strings, because length counts UTF-16 code units rather than characters. On top of that, the same Unicode text can be represented by several distinct code point sequences (for example precomposed versus combining forms), because JavaScript strings are not normalised.
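
To make the problem concrete, the following sketch uses only standard JavaScript string and RegExp behaviour. The sample characters are arbitrary choices for illustration.

```javascript
// A character outside the Basic Multilingual Plane takes two UTF-16 code units.
const emoji = '\u{1F600}';
console.log(emoji.length);       // 2 (code units)
console.log([...emoji].length);  // 1 (code points)

// The same visible character in two encodings: precomposed vs. combining.
const composed = '\u00e9';       // "é" as a single code point
const decomposed = 'e\u0301';    // "e" followed by a combining acute accent
console.log(composed === decomposed);                  // false
console.log(composed.length, decomposed.length);       // 1 2
console.log(decomposed.normalize('NFC') === composed); // true

// Without the `u` flag, `.` consumes one code unit; with it, one code point.
console.log(/^.$/.test(emoji));  // false
console.log(/^.$/u.test(emoji)); // true
```

So a `.{n}` derived from `length` can consume a different amount of text than the lookahead was checking against, depending on both the input encoding and the regex flags.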

To provide an API that can also deal with Unicode strings, something like anythingButStringUnicode(inputString, numCharactersToMatch) could be added. In this case, the user would be expected to provide the actual number of characters to match after the lookahead. This is fraught in itself because of normalisation, and because whatever string you want to match in place may not have the same number of code points as the input anyway.
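
For discussion purposes, one possible shape of that function at the regex level is sketched below. This is an assumption about how it might work, not a settled design: the name `anythingButStringUnicodeSource` is hypothetical, and the sketch assumes the resulting pattern is compiled with the `u` flag so that `.` consumes whole code points.

```javascript
// Hypothetical sketch; the caller supplies the number of characters to consume.
function anythingButStringUnicodeSource(inputString, numCharactersToMatch) {
  // Escape regex syntax characters so the input is matched literally.
  const escaped = inputString.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return `(?:(?!${escaped}).{${numCharactersToMatch}})`;
}

const forbidden = '\u{1F600}\u{1F601}'; // two code points, four code units
const src = anythingButStringUnicodeSource(forbidden, 2);

// Assumption: the `u` flag is set, so `.{2}` means two code points.
const unicodeRe = new RegExp(`^${src}$`, 'u');
console.log(unicodeRe.test('ab'));                 // true
console.log(unicodeRe.test(forbidden));            // false
console.log(unicodeRe.test('\u{1F601}\u{1F600}')); // true

// Without the `u` flag the same source counts code units instead,
// so the reversed pair (four code units) no longer fits in `.{2}`.
const plainRe = new RegExp(`^${src}$`);
console.log(plainRe.test('\u{1F601}\u{1F600}'));   // false
```

Even in this form the count is only correct for one normalisation of the text, which is the confusion described above.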

I imagine that this API would still cause confusion for users, both those looking explicitly to match Unicode strings and those who assume they should use this version of the function because why wouldn't you use Unicode? In that case, it may be better to skip it altogether and let the user build the equivalent manually with the group/assertAhead/anyChar/exactly APIs. Even then, it might still be worth adding an anyDataUnit as a low-level API for Unicode matching.
