Add "replacement" as a label for the replacement encoding #70

Closed
hsivonen opened this issue Aug 22, 2016 · 21 comments

Comments

@hsivonen
Member

I've written "The name of an encoding is also one of its labels, except in the case of the replacement encoding whose name is not one of its labels." inconveniently many times when either documenting code that implements the Encoding Standard or when otherwise explaining the concepts.

Also, I've written code that does something like "if the input string is 'replacement', don't run get an encoding; otherwise, run get an encoding" when working with interfaces that were designed (four links there, but GitHub's styling makes it unobvious) before it was clear which strings are labels and which strings are names. Those interfaces use name strings where an enum value or a reference to a singleton object representing an encoding would be more appropriate software design, and they potentially have callers in add-ons that I can't fix.

All this would become simpler if get an encoding for the name of an encoding always returned the encoding itself. However, it's kinda sad to expose another Web-exposed label just to make implementing and explaining stuff easier, so I'm not sure if I should request this.

But at least this deserves some discussion.
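
The special case described in the comment above can be sketched roughly as follows. This is a minimal Python illustration, not code from Gecko or any other engine: the label tables are a small subset of the Encoding Standard's table, and `encoding_from_name_or_label` is a hypothetical stand-in for the legacy interfaces that accept either a name or a label.

```python
# ASCII whitespace per the Infra Standard: TAB, LF, FF, CR, SPACE.
ASCII_WHITESPACE = "\t\n\x0c\r "
# ASCII-only lowercasing, so non-ASCII look-alike characters are never folded.
ASCII_LOWER = str.maketrans("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                            "abcdefghijklmnopqrstuvwxyz")

# Illustrative subset of the label table at the time the issue was filed:
# "replacement" is a name but not a label.
LABELS_BEFORE = {
    "utf-8": "UTF-8",
    "utf8": "UTF-8",
    "unicode-1-1-utf-8": "UTF-8",
    "csiso2022kr": "replacement",
    "hz-gb-2312": "replacement",
    "iso-2022-cn": "replacement",
    "iso-2022-cn-ext": "replacement",
    "iso-2022-kr": "replacement",
}

# The proposed change: the name becomes one of the labels.
LABELS_AFTER = {**LABELS_BEFORE, "replacement": "replacement"}


def get_an_encoding(label, labels):
    """Return the encoding name for a label, or None for failure."""
    key = label.strip(ASCII_WHITESPACE).translate(ASCII_LOWER)
    return labels.get(key)


def encoding_from_name_or_label(value, labels):
    """Hypothetical legacy entry point that may be handed a name or a label.

    With LABELS_BEFORE the explicit check is required; with LABELS_AFTER
    it is redundant and the function collapses to get_an_encoding().
    """
    if value == "replacement":  # the special case under discussion
        return "replacement"
    return get_an_encoding(value, labels)
```

With `LABELS_AFTER`, `get_an_encoding` applied to any encoding name in the table returns that same encoding, which is the property the issue asks for.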

@domenic
Member

domenic commented Aug 22, 2016

/cc @cdumez who I noticed recently updated WebKit's encoding labels in WebKit/WebKit@f203fd0

I assume it would be a bad option to rename "replacement" to something like, say, "csiso2022kr"?

@cdumez

cdumez commented Aug 22, 2016

@domenic Despite having done the update in WebKit recently, I am actually not familiar enough with this area to comment. I do know that we have the "replacement" label for the replacement encoding in WebKit, though.

@hsivonen
Member Author

> I know that we do have the "replacement" label for the replacement encoding in WebKit though.

Are you sure? Testing indicates otherwise.

@hsivonen
Member Author

> I assume it would be a bad option to rename "replacement" to something like, say, "csiso2022kr"?

That would be pretty confusing, so I think that's a bad option.

@cdumez

cdumez commented Aug 24, 2016

@hsivonen: I wrote the patch very recently so you would need to try a WebKit nightly build.

@annevk
Member

annevk commented Aug 24, 2016

@cdumez someone did try a nightly. It appears that WebKit (per spec and other browsers) does not alias "replacement" to the behavior of the replacement encoding.

@cdumez

cdumez commented Aug 24, 2016

Oh, my bad: there was some last-minute review feedback on my patch, and it appears it killed the replacement alias. Sorry about the bad information.

@annevk
Member

annevk commented Nov 16, 2016

We added replacement in https://www.w3.org/Bugs/Public/show_bug.cgi?id=21057. I don't think we really discussed adding it as a label before. I'm certainly open to adding it as it does simplify the system a little bit, but also not a whole lot.

@jungshik thoughts?

@inexorabletash
Member

FWIW, I concur with the OP - frequent special cases in the code and tests, but sad about adding it to the web just to make implementations cleaner. So no vote either way.

@annevk
Member

annevk commented Mar 19, 2017

I'm going to close this since everyone is on the fence. Feel free to reopen though if you feel strongly since it doesn't seem like it would be a hard sell.

annevk closed this as completed Mar 19, 2017
@annevk
Member

annevk commented Jun 28, 2017

I'm going to reopen this to add "replacement" as a label as the special cases apparently continue to cause problems (at least in Gecko while integrating encoding-rs) for no real gain.

As nobody objected and @hsivonen now favors this approach, I hope that is acceptable, but I'll leave some time for feedback just in case.

annevk reopened this Jun 28, 2017
annevk changed the title from "Consider adding "replacement" as a label for the replacement encoding" to "Add "replacement" as a label for the replacement encoding" Jun 28, 2017
@annevk
Member

annevk commented Jul 4, 2017

I'm surprised this is such a big deal in code as there's nothing in the standard (or any standards that use this standard) to my knowledge that trips over this.

Nevertheless, I created the PR, review appreciated. Note that before landing it I should probably:

  1. Update tests.
  2. File bugs against Firefox, WebKit, and Chromium. Not sure about Edge as they haven't made an effort to comply thus far I think.

@domenic
Member

domenic commented Jul 4, 2017

Shouldn't we get another implementation interested before landing?

@annevk
Member

annevk commented Jul 4, 2017

I considered @inexorabletash's reply above as such, but happy to wait for something more explicit.

@inexorabletash
Member

It involves deleting code on our side so I'm okay with the change.

@inexorabletash
Member

FWIW, I put up a Blink change: https://chromium-review.googlesource.com/c/559973/ - I'll wait for test updates to hit WPT and roll into Blink, though.

Sanity check: we would now expect an HTML file with <meta http-equiv="content-type" content="text/html; charset=replacement"> to render as � yes?

@annevk
Member

annevk commented Jul 14, 2017

@inexorabletash yeah, that wouldn't be any different from it saying iso-2022-kr or some such. Note that it would have to appear within the first 1024 bytes.

@hsivonen do you want to review the change?
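
The expected rendering discussed above can be illustrated with a rough Python sketch. It rests on several assumptions: `sniff_meta_charset` is a naive regular-expression stand-in for the HTML prescan (the real algorithm is considerably more involved), the UTF-8 fallback is chosen only to keep the example short, and all function names are hypothetical. The part taken from the discussion is that the declaration must appear within the first 1024 bytes and that a replacement label yields U+FFFD instead of decoded content.

```python
import re

# Only this many leading bytes are examined for a <meta> charset declaration.
PRESCAN_LIMIT = 1024

# Labels of the replacement encoding, including the newly added "replacement".
REPLACEMENT_LABELS = {
    "csiso2022kr", "hz-gb-2312", "iso-2022-cn",
    "iso-2022-cn-ext", "iso-2022-kr", "replacement",
}

# Naive stand-in for the prescan: find "charset=<label>" anywhere in the window.
CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?([a-z0-9_-]+)", re.IGNORECASE)


def sniff_meta_charset(document: bytes):
    """Return the lowercased charset label found in the first 1024 bytes, if any."""
    match = CHARSET_RE.search(document[:PRESCAN_LIMIT])
    return match.group(1).decode("ascii").lower() if match else None


def replacement_decode(data: bytes) -> str:
    """Replacement decoder: any non-empty input yields a single U+FFFD."""
    return "\ufffd" if data else ""


def decode_document(document: bytes) -> str:
    """Decode a document according to its sniffed charset label."""
    label = sniff_meta_charset(document)
    if label in REPLACEMENT_LABELS:
        return replacement_decode(document)
    # Assumption for this sketch: everything else is treated as UTF-8.
    return document.decode("utf-8", errors="replace")
```

Under these assumptions, a page that declares `charset=replacement` near the top decodes to a single U+FFFD, exactly as one declaring `iso-2022-kr` does, while the same declaration placed after 2000 bytes of padding is not seen at all.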

@annevk
Member

annevk commented Jul 17, 2017

@domenic
Member

domenic commented Jul 17, 2017

https://developer.microsoft.com/en-us/microsoft-edge/platform/status/encodingstandard/ lists them as "In Development" so they might appreciate a bug.

@inexorabletash
Member

inexorabletash commented Jul 18, 2017

FYI, Blink change landed.

Worth noting: @annevk's WPT changes didn't trip any failures when rolled into Blink's CI since the tests currently exercise the encoding labels via the API, and replacement encodings already threw just like unknown labels.

We've got Blink-specific tests that use XHR and data: URLs to verify various encodings, and specifically that the replacement ones yield U+FFFD. I was lazy and just added "replacement" to the list. We should probably tidy up and upstream those (among other fun cases like UTF-7).
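
Why API-based tests could not observe the change can be shown with a small Python model of the `TextDecoder` constructor check. This is a sketch under stated assumptions, not browser code: `RangeError` here is a stand-in for the JavaScript exception, the label tables are a tiny illustrative subset, and `text_decoder_encoding` and `throws` are hypothetical helper names.

```python
class RangeError(ValueError):
    """Stand-in for the JavaScript RangeError thrown by the constructor."""


ASCII_LOWER = str.maketrans("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                            "abcdefghijklmnopqrstuvwxyz")

# Tiny illustrative label tables, before and after the spec change.
LABELS_BEFORE = {
    "utf-8": "UTF-8",
    "utf8": "UTF-8",
    "iso-2022-kr": "replacement",
    "hz-gb-2312": "replacement",
}
LABELS_AFTER = {**LABELS_BEFORE, "replacement": "replacement"}


def text_decoder_encoding(label: str, labels: dict) -> str:
    """Model of the constructor check: failure and replacement both throw."""
    encoding = labels.get(label.strip("\t\n\x0c\r ").translate(ASCII_LOWER))
    if encoding is None or encoding == "replacement":
        raise RangeError(f"unsupported label: {label!r}")
    return encoding


def throws(label: str, labels: dict) -> bool:
    """Report whether constructing a decoder for the label would throw."""
    try:
        text_decoder_encoding(label, labels)
    except RangeError:
        return True
    return False
```

In this model, `throws("replacement", LABELS_BEFORE)` and `throws("replacement", LABELS_AFTER)` are both true: before the change the label is unknown, and afterwards it maps to the replacement encoding, which the constructor also rejects. Only tests that go through document or resource decoding can tell the two apart.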

MXEBot pushed a commit to mirror/chromium that referenced this issue Jul 18, 2017
The 'replacement' encoding originated as a spec concept to prevent
security attacks via problematic encodings by recognizing the
label but not decoding the stream.

It was initially specified as the only encoding where the name
wasn't one of the labels, requiring special cases in all implementations.
Based on more implementer feedback we'd like to remove the special
case. Delete the special case code in Blink too.

See also: whatwg/encoding#70

Bug: 744405

Change-Id: Ia15ccef1a9d7f35c23af4509a5a9758cbefc2087
Reviewed-on: https://chromium-review.googlesource.com/559973
Reviewed-by: Kent Tamura <tkent@chromium.org>
Commit-Queue: Joshua Bell <jsbell@chromium.org>
Cr-Commit-Position: refs/heads/master@{#487288}
@annevk
Member

annevk commented Jul 18, 2017

@inexorabletash yeah, that would be great.

Filed https://developer.microsoft.com/en-us/microsoft-edge/platform/issues/12808940/ against Edge.

hsivonen added a commit to hsivonen/encoding_rs that referenced this issue Jul 31, 2017
ricea pushed a commit to ricea/encoding that referenced this issue Nov 16, 2017
ricea pushed a commit to ricea/encoding that referenced this issue Nov 16, 2017