Encoding aliases not supported when fetching webpages. #2648

Closed
hanubeki opened this issue Jan 9, 2023 · 5 comments
Labels: area: http, bug

Comments

@hanubeki

hanubeki commented Jan 9, 2023

Issue Summary

Follow-up to #1858 and #2015.

Lemmy detects and decodes the encoding when fetching articles, but it does not fully support encoding aliases such as Shift_JIS.
Example: https://www.itmedia.co.jp/news/spv/2212/16/news161.html

rust-encoding supports Shift_JIS as an alias of Windows-31J.
https://github.com/lifthrasiir/rust-encoding/blob/4e79c35ab6a351881a86dbff565c4db0085cc113/src/codec/japanese.rs#L454

The current implementation compares the charset only against name, not against whatwg_name, which is where shift_jis appears.

if let Some(encoding_ref) = encodings().iter().find(|e| e.name() == charset) {

As a result, Lemmy currently fails to decode Japanese pages that use Shift_JIS.
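
A possible fix could look roughly like the sketch below. This is not Lemmy's actual code: EncodingInfo, ENCODINGS, and find_encoding are hypothetical stand-ins that use only the standard library, and the table is a small illustrative subset of what rust-encoding exposes through name() and whatwg_name(). The idea is to accept a charset if it matches either field, ignoring ASCII case, since pages often declare Shift_JIS in mixed case.

```rust
/// Hypothetical stand-in for an encoding entry. The real crate exposes
/// similar information through `name()` and `whatwg_name()`.
#[derive(Debug, PartialEq)]
pub struct EncodingInfo {
    pub name: &'static str,
    pub whatwg_name: Option<&'static str>,
}

/// Small illustrative table (an assumption, not the crate's full list).
pub const ENCODINGS: &[EncodingInfo] = &[
    EncodingInfo { name: "utf-8", whatwg_name: Some("utf-8") },
    EncodingInfo { name: "windows-31j", whatwg_name: Some("shift_jis") },
    EncodingInfo { name: "euc-jp", whatwg_name: Some("euc-jp") },
];

/// Look up an encoding by its primary name or its WHATWG alias.
/// The comparison ignores ASCII case and surrounding whitespace.
pub fn find_encoding(charset: &str) -> Option<&'static EncodingInfo> {
    let wanted = charset.trim();
    ENCODINGS.iter().find(|e| {
        e.name.eq_ignore_ascii_case(wanted)
            || e.whatwg_name
                .map_or(false, |alias| alias.eq_ignore_ascii_case(wanted))
    })
}
```

With this lookup, a page declaring Shift_JIS would resolve to the windows-31j entry instead of falling through as unknown.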

Steps to Reproduce

  1. Create a post linking to a page encoded in Shift_JIS.
  2. The page title and content are not decoded correctly.

Technical details

  • I don't host a Lemmy instance; I am reporting this as a user of one.
@hanubeki hanubeki added the bug label Jan 9, 2023
@dessalines
Member

Feel free to make a PR; this seems like an easy fix.

@hanubeki
Author

I have never written any Rust code, which is why I opened an issue instead of a PR.

@dessalines
Member

Stale issue; it can be re-opened if someone wants to take it on.

@dessalines dessalines closed this as not planned Oct 18, 2023
@Nutomic
Member

Nutomic commented Oct 19, 2023

I don't think it makes sense to mark an issue like this as "stale" when it can still be reproduced and still needs to be fixed.

@Nutomic Nutomic reopened this Oct 19, 2023
@Nutomic
Member

Nutomic commented Sep 20, 2024

This appears to be fixed; the preview now shows Japanese characters.

[Image: Screenshot_20240920_121826]

@Nutomic Nutomic closed this as completed Sep 20, 2024