Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pravda.com.ua feed title has weird font/charset glyphs when manually followed #26687

Closed
stephendonner opened this issue Nov 11, 2022 · 4 comments · Fixed by brave/brave-core#15961

Comments

@stephendonner
Copy link

stephendonner commented Nov 11, 2022

Description

pravda.com.ua feed title has weird font/charset glyphs when manually followed

Steps to Reproduce

  1. install 1.47.63
  2. launch Brave
  3. set brave://flags/#brave-news-subscribe-button to Enabled
  4. click Relaunch
  5. opt-in to Brave News on the new-tab page
  6. click Customize
  7. input https://www.pravda.com.ua/rss/
  8. click on Get feeds from...
  9. wait
  10. look at the result
  11. follow the source
  12. open your Brave News feed
  13. look at the article's title

Actual result:

image

Expected result:

Same as the headline on the actual page

original article Brave News feed
Screenshot 2022-11-10 at 15 35 30 Screenshot 2022-11-10 at 15 35 33

Reproduces how often:

100%

Brave version (brave://version info)

Brave 1.47.63 Chromium: 107.0.5304.110 (Official Build) nightly (64-bit)
Revision 2a558545ab7e6fb8177002bf44d4fc1717cb2998-refs/branch-heads/5304@{#1202}
OS Windows 10 Version 22H2 (Build 19045.2251)

Version/Channel Information:

  • Can you reproduce this issue with the current release? N/A
  • Can you reproduce this issue with the beta channel? Yes
  • Can you reproduce this issue with the nightly channel? Yes

cc @mattmcalister @MadhaviSeelam @petemill @fallaciousreasoning

@rebron rebron added priority/P4 Planned work. We expect to get to it "soon". release-notes/exclude QA/Yes labels Nov 11, 2022
@fallaciousreasoning
Copy link

@petemill They use a non-utf8 encoding (charset=windows-1251 ) which I suspect is the root cause of this issue

@fallaciousreasoning
Copy link

Interestingly, this was actually being caused by a double decode. Previously, I had fixed a crash where we were passing non-UTF8 bytes to ::rust::String. I solved this by converting the string to UTF8 first. However, the feed_rs library we're using already decodes the byte data we were passing it using the encoding from the xml header (<?xml version="1.0" encoding="windows-1251"?>).

The solution here, is to not pass a string across the FFI interface, and instead pass a slice of bytes. As a bonus, I think this means we no longer make a copy of the string before passing it across the FFI boundary.

@LaurenWags
Copy link
Member

Requires 1.46.122 or higher to test.

@stephendonner
Copy link
Author

Verified PASSED using

Brave 1.46.122 Chromium: 108.0.5359.48 (Official Build) beta (x86_64)
Revision 18ceeca0d99318e70c00d2e04d88aa55488b5c63-refs/branch-heads/5359@{#854}
OS macOS Version 11.7.1 (Build 20G918)

Steps:

  1. installed 1.46.122
  2. launched Brave
  3. set brave://flags/#brave-news-subscribe-button to Enabled
  4. clicked Relaunch
  5. opted-in to Brave News on the new-tab page
  6. clicked Customize
  7. input https://www.pravda.com.ua/rss/
  8. clicked on Get feeds from...
  9. waited
  10. looked at the result
  11. followed the source
  12. opened my Brave News feed
  13. looked at the article's title

Confirmed articles and data from www.pravda.com.ua are in the correct character set/encoding

example example example
Screen Shot 2022-11-21 at 8 35 01 PM Screen Shot 2022-11-21 at 8 35 54 PM Screen Shot 2022-11-21 at 8 36 00 PM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants