Change definition of string to remove "Printable" and discourage use … #485
…of control characters. Signed-off-by: Tim Bray <timbray@amazon.com>
BTW I'd be happy to change SHOULD NOT to MUST NOT.
Great. How about TAB?
Oh yeah, U+0009, you're right.
This is definitely an improvement, thanks! A previously-ambiguous example is: 🏃♀️, which seems handled pretty clearly. That glyph is:
It's not clear that either ZWJ or VS16 is a "printable character", even though the resulting glyph is printable. So, thanks for writing this up more precisely! Nits incoming:
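To make the point about ZWJ and VS16 concrete, here is a minimal Python sketch that lists the code points behind that glyph. It assumes the emoji is the standard "woman running" ZWJ sequence and spells it with escapes so the invisible characters cannot be lost; `describe` is a hypothetical helper name, not something from the spec.

```python
import unicodedata

# Assumption: the glyph is the standard "woman running" ZWJ sequence.
# Escapes are used so the joiner and variation selector survive copy/paste.
runner = "\U0001F3C3\u200D\u2640\uFE0F"

def describe(text):
    """Return (code point, general category, name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

for code_point, category, name in describe(runner):
    print(code_point, category, name)
```

The second entry is U+200D ZERO WIDTH JOINER, whose general category is `Cf` (format). It is not a "printable character" by most definitions, which is why a definition written in terms of allowed Unicode characters is easier to apply than one written in terms of printability.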
Thanks for addressing #483 👍 Assuming we adopt #484 and don't enforce these rules on
That would certainly be fairly conventional. A variety of protocols exclude certain character classes. I'm particularly in favor of excluding C0 controls for a variety of reasons.
I think we should stay away from encoding and keep our discussion strictly in terms of Unicode characters. Every transport I know of has well-established practices for turning them into bits on the wire and we shouldn't get in the way.
Well, in the data payload you obviously need them. Other than that I can't see why.
I'm in general in favor of excluding trash characters.
Just a personal preference, but I'd prefer SHOULD NOT. I'm not a fan of being too parental. In general I think people who care about interop will "do the right thing" or people will stop using them. But because there are times when people need to do things that are not best for interop, but are necessary for their case, I like to give people the option to do so. That's why I think SHOULD NOT in this case is better: it gives a strong best practice but also an "out".
HTTP headers are defined in US-ASCII, so I think the HTTP binary transport needs to define a way to take a Unicode string value and map it to an ASCII string: https://tools.ietf.org/html/rfc7230#section-3.2.6 On the other hand, it seems like this is something which is HTTP's fault, and we should make HTTP handle it rather than doing so generically. I put together #488 for this.
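As an illustration of what such a mapping could look like, here is a minimal Python sketch that percent-encodes the UTF-8 bytes of a string so the result is pure ASCII. This is only one possible approach: the helper names and the choice of percent-encoding are assumptions for illustration, and nothing here is taken from #488.

```python
from urllib.parse import quote, unquote

def to_header_value(value: str) -> str:
    """Map a Unicode string to an ASCII-only string (hypothetical helper).

    Assumption: percent-encoding of UTF-8 bytes is used as the mapping.
    """
    # safe=" " keeps spaces readable; "%", control characters and
    # non-ASCII characters are all escaped, so decoding is unambiguous.
    return quote(value, safe=" ")

def from_header_value(header: str) -> str:
    """Reverse the mapping (hypothetical helper)."""
    return unquote(header)

encoded = to_header_value("Grüße\n")
print(encoded)
print(from_header_value(encoded) == "Grüße\n")
```

Because the escape character itself is encoded, the mapping round-trips, and a stray newline can never reach the header line unescaped.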
Minor comments, overall LGTM 👍
spec.md
Outdated
meaning and predictably cause interoperability problems.
- `String` - Sequence of allowable Unicode characters. The following characters
are disallowed:
- the "control characters" in the ranges U+0000-U+001F and U+007F-009F (both
Either U+0000-U+001F and U+007F-U+009F
or U+0000-001F and U+007F-009F
😉
I had to look at the comment for an endless time to spot the fix but I did eventually …
spec.md
Outdated
are disallowed:
- the "control characters" in the ranges U+0000-U+001F and U+007F-009F (both
ranges inclusive), since most have no agreed-on meaning, and some, such as
U+000A (newline), are not usable in contexts such as HTTP headers.
Indentation (both lines should start with 4 spaces)
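For readers who want to check values against the control-character rule quoted in the spec.md excerpts above, here is a minimal Python sketch. It covers only the two ranges shown in the excerpt (U+0000-U+001F and U+007F-U+009F, both inclusive); the function names are hypothetical, and any other restrictions the final text may contain are not modeled.

```python
# Ranges copied from the spec.md excerpt; both are inclusive.
DISALLOWED_RANGES = ((0x0000, 0x001F), (0x007F, 0x009F))

def find_disallowed(value):
    """Return (index, code point) pairs for each control character in value."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(value)
        if any(lo <= ord(ch) <= hi for lo, hi in DISALLOWED_RANGES)
    ]

def is_allowed_string(value):
    """True when value contains none of the disallowed control characters."""
    return not find_disallowed(value)

print(is_allowed_string("plain text"))
print(find_disallowed("line one\nline two"))
```

Reporting the offending positions, rather than only a boolean, makes it easier for a producer to see which character triggered the rule, for example a stray U+000A in a value destined for an HTTP header.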
Approved on the 8/22 call. However, @timbray there's a merge conflict - can you resolve that? Then I can merge it.
replaced by #490