-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add "replacement" as a label for the replacement encoding #70
Comments
/cc @cdumez who I noticed recently updated WebKit's encoding labels in WebKit/WebKit@f203fd0 I assume it would be a bad option to rename "replacement" to something like, say, "csiso2022kr"? |
@domenic Despite having done the update in WebKit recently, I am actually not familiar with this area enough to comment. I know that we do have the "replacement" label for the replacement encoding in WebKit though. |
Are you sure? Testing indicates otherwise. |
That would be pretty confusing, so I think that's a bad option. |
@hsivonen: I wrote the patch very recently so you would need to try a WebKit nightly build. |
@cdumez someone did try a nightly. It appears that WebKit (per spec and other browsers) does not alias "replacement" to the behavior of the replacement encoding. |
Oh my bad, there was some last minute review feedback on my patch and it appears it killed the replacement alias. Sorry about the bad information. |
We added replacement in https://www.w3.org/Bugs/Public/show_bug.cgi?id=21057. I don't think we really discussed adding it as a label before. I'm certainly open to adding it as it does simplify the system a little bit, but also not a whole lot. @jungshik thoughts? |
FWIW, I concur with the OP - frequent special cases in the code and tests, but sad about adding it to the web just to make implementations cleaner. So no vote either way. |
I'm going to close this since everyone is on the fence. Feel free to reopen though if you feel strongly since it doesn't seem like it would be a hard sell. |
I'm going to reopen this to add "replacement" as a label as the special cases apparently continue to cause problems (at least in Gecko while integrating encoding-rs) for no real gain. As nobody objected and @hsivonen now favors this approach I hope that is acceptable, but I'll leave some time for feedback just in case. |
I'm surprised this is such a big deal in code as there's nothing in the standard (or any standards that use this standard) to my knowledge that trips over this. Nevertheless, I created the PR, review appreciated. Note that before landing it I should probably:
|
Shouldn't we get another implementation interested before landing? |
I considered @inexorabletash's reply above as such, but happy to wait for something more explicit. |
It involves deleting code on our side so I'm okay with the change. |
FWIW, I put up a Blink change: https://chromium-review.googlesource.com/c/559973/ - I'll wait for test updates to hit WPT and roll into Blink, though. Sanity check: we would now expect an HTML file with |
@inexorabletash yeah, that wouldn't be any different from it saying iso-2022-kr or some such. Note that it would have to appear within the first 1024 bytes. @hsivonen do you want to review the change? |
Bugs:
To my knowledge Edge hasn't made an effort yet so I'm not including them for now. |
https://developer.microsoft.com/en-us/microsoft-edge/platform/status/encodingstandard/ lists them as "In Development" so they might appreciate a bug. |
FYI, Blink change landed. Worth noting: @annevk's WPT changes didn't trip any failures when rolled into Blink's CI since the tests currently exercise the encoding labels via the API, and replacement encodings already threw just like unknown labels. We've got blink-specific tests that use XHR and data: URLs to verify various encodings and specifically that the replacement ones yield U+FFFD. I was lazy and just added "replacement" to the list. We should probably tidy up and upstream those (among other fun cases like UTF-7). |
The 'replacement' encoding originated as a spec concept to prevent security attacks via problematic encodings by recognizing the label but not decoding the stream. It was initially specified as the only encoding where the name wasn't one of the labels, requiring special cases in all implementations. Based on more implementer feedback we'd like to remove the special case. Delete the special case code in Blink too. See also: whatwg/encoding#70 Bug: 744405 Change-Id: Ia15ccef1a9d7f35c23af4509a5a9758cbefc2087 Reviewed-on: https://chromium-review.googlesource.com/559973 Reviewed-by: Kent Tamura <tkent@chromium.org> Commit-Queue: Joshua Bell <jsbell@chromium.org> Cr-Commit-Position: refs/heads/master@{#487288}
@inexorabletash yeah, that would be great. Filed https://developer.microsoft.com/en-us/microsoft-edge/platform/issues/12808940/ against Edge. |
Implements whatwg/encoding#70. Closes #22.
I've written "The name of an encoding is also one of its labels, except in the case of the replacement encoding whose name is not one of its labels." inconveniently many times when either documenting code that implements the Encoding Standard or when otherwise explaining the concepts.
Also, I've written code that does something like "if input string is 'replacement', don't run get an encoding, otherwise, run get an encoding" when working with interfaces that were designed (four links there, but GitHub's styling makes it unobvious) before there was clarity of what strings are labels and what strings are names used where an enum value or a reference to a singleton object representing an encoding would be more appropriate software design (when those interfaces potentially have callers in add-ons that I can't fix).
All this would become simpler if get an encoding for the name of an encoding always returned the encoding itself. However, it's kinda sad to expose another Web-exposed label just to make implementing and explaining stuff easier, so I'm not sure if I should request this.
But at least this deserves some discussion.
The text was updated successfully, but these errors were encountered: