Add a legacy encoding for Azeri #7212
CC @whatwg/i18n |
ced doesn't have windows-1254 higher than windows-1252 for the .az TLD, but it's unclear how self-referential those statistics are. That is, could the windows-1252 result arise from ced not having been well trained on Azeri? macOS doesn't support setting Azeri as the primary language, so what Safari does at this time is moot. |
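For readers less familiar with why the choice between windows-1252 and windows-1254 matters here, the following is a minimal sketch (not taken from ced, chardetng, or any browser) that decodes the same bytes both ways. It uses Python's built-in `cp1252` and `cp1254` codecs; the helper name `decode_both` and the sample string are illustrative assumptions.

```python
# Six byte values in the 0xD0-0xFE range where windows-1252 and
# windows-1254 assign different letters.
DIFFERING_BYTES = bytes([0xD0, 0xDD, 0xDE, 0xF0, 0xFD, 0xFE])


def decode_both(data: bytes) -> tuple[str, str]:
    """Return (windows-1252 decoding, windows-1254 decoding) of data.

    Hypothetical helper for illustration only.
    """
    return data.decode("cp1252"), data.decode("cp1254")


if __name__ == "__main__":
    for value in DIFFERING_BYTES:
        western, turkish = decode_both(bytes([value]))
        print(f"0x{value:02X}: windows-1252={western!r} windows-1254={turkish!r}")

    # A sample string containing dotless i, stored as windows-1254 bytes.
    sample = "qızıl".encode("cp1254")
    print(decode_both(sample))  # ('qýzýl', 'qızıl')
```

With the wrong fallback, unlabeled legacy content that was written as windows-1254 shows ý, þ, and ð where ı, ş, and ğ were intended, which is the failure the fallback choice is meant to avoid.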
@mfreed7 is this something you can look into for Chromium? Or find someone? 😊 |
Safari looks at the UI language as the spec currently suggests. (But Azeri cannot be set as the kind of language that Safari looks at.) Firefox looks at the TLD. (The UI locale has no effect even for file: URLs.) In #7205, I'm handwavingly connecting ccTLDs to this language list. If you put
in Chrome appears not to be sensitive to the TLD in the ASCII-only case. However, Chrome is sensitive to the TLD as a non-ASCII tie breaker: see https://hsivonen.tr/test/moz/fallback-encoding-non-ascii.htm with the above I guess the practical implementation question for Chrome is whether Chrome is interested in tweaking the ced TLD entry for .az and the ced language entry for Azeri to give more weight to windows-1254 than windows-1252. Answering that probably depends on how the values for those entries were chosen in the first place. |
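To make the TLD-based approach discussed above concrete, here is a rough sketch of a TLD-to-fallback lookup. It is not the chardetng or ced implementation: the table `TLD_FALLBACK`, the default, and the function name `fallback_encoding_for_url` are all assumptions for illustration, and real detectors weigh the TLD together with the page's bytes instead of using it as a plain lookup.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny table. The .az entry reflects the change
# proposed in this PR; .tr is included because Turkish already falls back to
# windows-1254. Everything else falls back to windows-1252 in this sketch.
TLD_FALLBACK = {
    "tr": "windows-1254",
    "az": "windows-1254",
}
DEFAULT_FALLBACK = "windows-1252"


def fallback_encoding_for_url(url: str) -> str:
    """Pick a legacy fallback encoding from the URL's top-level domain.

    Illustrative only: URLs without a host (for example file: URLs)
    get the default.
    """
    host = (urlsplit(url).hostname or "").rstrip(".")
    tld = host.rsplit(".", 1)[-1].lower()
    return TLD_FALLBACK.get(tld, DEFAULT_FALLBACK)


if __name__ == "__main__":
    for url in (
        "https://example.az/page.html",
        "https://hsivonen.tr/test/moz/fallback-encoding-non-ascii.htm",
        "https://example.com/",
        "file:///tmp/page.html",
    ):
        print(url, "->", fallback_encoding_for_url(url))
```

A UI-language-based approach, as described for Safari, would key the same kind of table on the user's locale instead of the host name.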
The relevant code for Firefox is within chardetng: |
So I generally support improving Azeri support. As for the particulars of what it would take to change ced to accomplish this, I'm not the expert. I'll reach out to our ced experts and ask them to take a look here.
Or, if you're on a browser that supports scroll-to-text-fragment, just click here. |
I finally reached some folks who are familiar with |
Let's merge this!
@hsivonen, I assume this isn't really web-platform-testable? (Maybe it would be if they added fake .az domains, but that's probably a big can of worms.)
@mfreed7 also requested filing a Chromium bug, so if you could do that and link from the OP that'd be appreciated. Probably a WebKit one wouldn't hurt either.
Upon re-reading your earlier comment, I guess WebKit doesn't use the TLD and only uses the UI language, and doesn't let you set the UI language to Azeri. So a bug would not be useful to them, IIUC. Never mind. |
(See WHATWG Working Mode: Changes for more details.)
Search for "Azerbaijan" in https://hsivonen.fi/chardetng/ for the story.
/parsing.html ( diff )