3.4.0 Regresses Certain Characters #206

Closed
wingman-jr-addon opened this issue Aug 9, 2024 · 2 comments

Comments

@wingman-jr-addon
Owner

LinkedIn on 3.3.6:
image
LinkedIn on 3.4.0:
image

Note that the dot character no longer displays correctly.
This seems similar to #199, but that specific case does not seem to have regressed.

@wingman-jr-addon
Owner Author

OK, I think I figured out what triggered this. LinkedIn declares itself as an HTML5 document but does not set a character encoding via charset, meta, etc. In that case I believe the encoding is generally chosen from the locale plus heuristics, which I believe would usually fall back to ISO-8859-1/Windows-1252. However, that fallback decodes the text incorrectly and causes mojibake. Catching that specific scenario and only temporarily falling back to UTF-8 on a per-chunk basis looks like it resolves the issue.

Test code is on branch https://github.com/wingman-jr-addon/wingman_jr/tree/fallback-to-utf8
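
For anyone following along, here is a minimal sketch of the idea described above. It is not the code on that branch. The helper name `decodeChunk`, its parameters, and the choice of a middle dot (U+00B7) as the sample character are all assumptions made for illustration. It uses only the standard `TextDecoder` API: when no charset was declared, it tries strict UTF-8 for that chunk and otherwise uses Windows-1252.

```typescript
// Hypothetical helper for illustration; not taken from the wingman_jr code base.
function decodeChunk(bytes: Uint8Array, declaredCharset: string | null): string {
  if (declaredCharset !== null) {
    // The page declared an encoding, so honor it.
    return new TextDecoder(declaredCharset).decode(bytes);
  }
  try {
    // No declared encoding: try strict UTF-8 for this chunk only.
    // With fatal: true, invalid byte sequences throw instead of becoming U+FFFD.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (_err) {
    // The chunk is not valid UTF-8, so use the legacy default instead.
    return new TextDecoder("windows-1252").decode(bytes);
  }
}

// Assumed sample: U+00B7 MIDDLE DOT encoded as UTF-8 is the two bytes 0xC2 0xB7.
const middleDot = new Uint8Array([0xc2, 0xb7]);

// Decoding those bytes as Windows-1252 yields two characters (mojibake).
const asLegacy = new TextDecoder("windows-1252").decode(middleDot);

// With no declared charset, the sketch decodes them as a single middle dot.
const asFallback = decodeChunk(middleDot, null);

console.log(JSON.stringify(asLegacy), JSON.stringify(asFallback));
```

One limitation of this sketch: a multi-byte UTF-8 character split across two chunks would fail the strict decode and take the Windows-1252 path, so a real implementation would need to handle chunk boundaries.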

@wingman-jr-addon
Owner Author

Partially fixed by #207, at least well enough for now.
