-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define string sorting by "code unit order" #55
Comments
I don't want to add a circular reference here: I commented on url#199 to the effect that 'code unit' requires that the character encoding (probably UTF-16 if working in terms of the DOM) be specified. If we're working in terms of UTF-8 byte strings, code point order would be better. Otherwise |
We should also fix byte sorting for whatwg/fetch#454. |
This should probably do something like https://en.wikipedia.org/wiki/Comparison_sort, but not sure yet. See also advice from @domenic in #74. |
I'm thinking it should be something like this: To sort JavaScript strings perform a comparison sort using the following three-way comparison operation, given A and B:
This will get a little repetitive though to do for byte sequences, JavaScript strings, and strings. |
For the W3C spec IndexedDb, I suggested some text that may come in handy in annotating this:
|
Personally I think we should be able to just define a "less than" operation on two strings. We don't need the 0/1/-1. Mathematically at least only less than is necessary (plus an implicit "equals" concept). I would also define "code unit comparison", and then say that a "sort" operation takes a comparison. I would not tie together the definition of sort and the specific comparison we use. We should probably also define ascending vs. descending sort (and say ascending is the default), although we could wait until we have a definite consumer for that. I like @aphillips's text particularly for pointing out the advantages and caveats. As for the repetition, this kind of thing makes me wonder about our implicit concept of "sequence" that we are using for both strings + byte sequences. It's kind of a list even, but I'm not sure I'd want to go that far in the layering... Still, it would be nice to generalize the sorting to work on all three sequence types. |
Perhaps they can be lists with immutable size/length. Even if you just define "less than", you'd still have most of the steps I outlined above, no? Apart from the first step that is. |
Hmmm, yeah, I guess that's true. |
Note that if we settle #91 on mutable treating these as list-likes seems likely. We already have places that use "append" on strings, although they sometimes get passed a code point and sometimes a string (which would more argue for "extend"/"concat" if we're being picky). |
(CSS Typed OM uses "code point order" for some things.) |
Background: whatwg/url#199
We want to update specs to be unambiguous about this, so I think infra is a good place to define it. It should include some examples (similar to the URLSearchParams WPTs).
The text was updated successfully, but these errors were encountered: