Fuzz.string only generates ascii strings #201
Comments
I would suggest having a Fuzz.string and a Fuzz.utf8String or the like. Imagine a bug where the Hebrew string חה somehow becomes הח. If you are not familiar with Hebrew, that could be very confusing to debug, since two letters that look pretty similar have swapped positions. If we are going to do UTF8/UTF16, we want to make sure we do it really well. I assume similar problems could happen with a number of scripts, but I happen to have a Hebrew keyboard handy.
I would like the default to be unicode. In JavaScript, the main things to worry about are characters that don't fit inside a single UTF-16 code unit, such as emoji, as well as combining characters (and maybe normalization for equality testing). I think ascii, emoji and some European characters should be enough, without being too hard to debug.
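To make those cases concrete, here is a small TypeScript sketch of the three things mentioned: code units versus code points, combining characters, and normalization before comparing. The helper names (`codeUnitLength`, `codePointLength`, `equalNormalized`) are made up for illustration and are not part of elm-test. Choosing NFC is also an assumption, since any single normalization form would do for this comparison.

```typescript
// Length in UTF-16 code units, which is what String.prototype.length reports.
function codeUnitLength(s: string): number {
  return s.length;
}

// Length in Unicode code points; iterating a string walks code points,
// so a surrogate pair counts once.
function codePointLength(s: string): number {
  return Array.from(s).length;
}

// Equality after canonical normalization (NFC), so composed and decomposed
// spellings of the same character compare equal.
function equalNormalized(a: string, b: string): boolean {
  return a.normalize("NFC") === b.normalize("NFC");
}

const emoji = "\u{1F63F}";    // 😿, outside the Basic Multilingual Plane
const composed = "\u00E9";    // é as one code point
const decomposed = "e\u0301"; // e followed by COMBINING ACUTE ACCENT

console.log(codeUnitLength(emoji));                 // 2 (a surrogate pair)
console.log(codePointLength(emoji));                // 1
console.log(composed === decomposed);               // false
console.log(equalNormalized(composed, decomposed)); // true
```

A fuzzer that never produces strings like these cannot catch code that slices an emoji in half or compares unnormalized text.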
Sounds good. We will probably eventually want a way to specify a character set, so that someone who wants Hebrew/Greek/Arabic/Russian/Hindi etc. will be able to have them.
I don't think we should let the user specify which character classes or character sets to use. That's one huge rabbit hole, which could take tens of thousands of lines of code to implement in pure Elm. There are ranges of code points that can be used to select a code plane, but if you want whitespace, you'll have to manually list out the 8 different characters, and if you want mathematical characters, there's another set of ranges to use, and so on. For example, here are the code points of the Swedish alphabet: https://www.iana.org/domains/idn-tables/tables/se_sv-se_1.0.html Since this is only for testing, I say we try to pick a subset which is easy to use when testing, but which covers "all" the special cases of Unicode.
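As a rough illustration of the "pick a small fixed subset" idea, here is a TypeScript sketch that draws characters from a handful of hard-coded code point ranges. Everything in it is an assumption for illustration: the particular ranges, the names `RANGES`, `makeRng` and `randomString`, and the simple seeded generator. It is not how elm-test generates strings.

```typescript
// Inclusive code point ranges. This selection is an illustrative assumption,
// not a list taken from elm-test.
type Range = { name: string; lo: number; hi: number };

const RANGES: Range[] = [
  { name: "printable ASCII", lo: 0x20, hi: 0x7e },
  { name: "Latin-1 Supplement, upper half", lo: 0xc0, hi: 0xff },
  { name: "Combining Diacritical Marks", lo: 0x0300, hi: 0x036f },
  { name: "Emoticons", lo: 0x1f600, hi: 0x1f64f },
];

// Small seeded linear congruential generator so runs are reproducible.
// Returns floats in [0, 1). Math.imul keeps the multiply within 32 bits.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 0x100000000;
  };
}

// Build a string of `length` code points: pick a range uniformly,
// then a code point uniformly inside that range.
function randomString(rng: () => number, length: number): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    const range = RANGES[Math.floor(rng() * RANGES.length)];
    const codePoint = range.lo + Math.floor(rng() * (range.hi - range.lo + 1));
    out += String.fromCodePoint(codePoint);
  }
  return out;
}

console.log(randomString(makeRng(1), 12));
```

Four ranges already cover plain text, accented letters, combining marks and characters outside the Basic Multilingual Plane, without anything like a full character-class implementation.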
Since it sounds like |
As mentioned in #198 and #200, the Fuzz.string fuzzer only generates ascii characters in the range 32-126, which covers A-Za-z0-9, some whitespace and some special characters. It should generate any kind of string to make sure the code works with more characters. Even English-only users are impacted, as emoji aren't in ascii 😿.

I think we should do a breaking change and make Fuzz.string generate characters from all of unicode. This will probably fail some test suites that previously only tested ascii strings, but that's a good thing, right?
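For a sense of how narrow the 32-126 range is, here is a short TypeScript sketch that enumerates it and checks a few strings against it. The names `printableAscii` and `isPrintableAscii` are hypothetical helpers for illustration only.

```typescript
// All characters with code points 32 through 126, in order.
const printableAscii: string = Array.from({ length: 126 - 32 + 1 }, (_, i) =>
  String.fromCharCode(32 + i)
).join("");

// True when every code point of `s` lies in the range 32-126.
function isPrintableAscii(s: string): boolean {
  return Array.from(s).every((ch) => {
    const cp = ch.codePointAt(0)!;
    return cp >= 32 && cp <= 126;
  });
}

console.log(printableAscii.length);             // 95
console.log(isPrintableAscii("Hello, world!")); // true
console.log(isPrintableAscii("\u{1F63F}"));     // false (😿 is U+1F63F)
console.log(isPrintableAscii("\u00E9"));        // false (é is U+00E9)
console.log(isPrintableAscii("\t"));            // false (tab is code point 9)
```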
The full unicode solution is however blocked while we wait for a new release of elm-lang/core. The bug has been fixed, but it's not released yet.