Improper handling of national characters #359

Open
ArturRuta opened this issue Feb 9, 2025 · 1 comment

@ArturRuta

Note: This issue originally arose in wallabag, but initial research showed that it very likely comes from graby, which wallabag uses.

The problem is that, on occasion, the national characters in an article's text are not shown as expected; instead, they are replaced by strange combinations of garbage characters.

The issue does not affect all articles: some correctly show the national characters while others don't.
The behavior is consistent in the sense that, for a given article, the result is always good or always bad.
Curiously enough, the article's title is always handled properly (at least in the samples I tried), while the content may or may not be transcribed correctly.

  • One sample of bad behavior comes from this URL: error example
    • You will find that the article title is handled properly even though it contains national characters. For example, it contains the word: más
    • On the other hand, the article contents are not handled properly. Very early in the text you can see, for example, that the word automóvil is rendered with garbage characters in place of the accented ó.
  • One sample of proper behavior comes from this URL: correct sample
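The report does not identify the cause. One common explanation for this kind of symptom, offered here only as an assumption, is "mojibake": text that is actually UTF-8 gets decoded as a single-byte charset such as ISO-8859-1 (Latin-1), so each accented letter turns into two unrelated characters. The Python sketch below reproduces that effect on words from this report and shows that it can be reversed. The helper names `mojibake` and `repair` are hypothetical and are not part of graby or wallabag.

```python
# Hypothesis only: the garbage characters may be UTF-8 bytes decoded with the
# wrong (single-byte) charset. Latin-1 is used because it maps every byte to a
# character, so the round trip below can never raise a decode error.

def mojibake(text: str, wrong_charset: str = "latin-1") -> str:
    """Encode the text correctly as UTF-8, then decode it with the wrong charset."""
    return text.encode("utf-8").decode(wrong_charset)


def repair(garbled: str, wrong_charset: str = "latin-1") -> str:
    """Undo the mistake: re-encode with the wrong charset, then decode as UTF-8."""
    return garbled.encode(wrong_charset).decode("utf-8")


if __name__ == "__main__":
    for word in ("automóvil", "más"):
        garbled = mojibake(word)
        print(word, "->", garbled, "->", repair(garbled))
    # automóvil -> automÃ³vil -> automóvil
    # más -> mÃ¡s -> más
```

If the stored content shows patterns like `Ã³` or `Ã¡` in place of `ó` or `á`, that would support this hypothesis; other charset mix-ups produce different garbage.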

When digging into this topic I found the following:

  • Wallabag stores the article's content with the wrong characters, so the problem is not how the result is displayed; it lies in the way the source is processed.
  • Another application, f43.me, which also relies on graby, shows the same kind of misbehavior.
  • Finally, I managed to enable graby logs in wallabag (see the attached file), and while my understanding is limited, the garbage characters seem to show up there.

I cannot run standalone tests of graby, as that falls well beyond my capabilities, but if the issue is found and fixed I would be very glad to test any wallabag or f43.me release that includes the fix.

By the way, the wallabag version is 2.6.10, which is the latest one at the time of writing, but I could not find which graby version is bundled inside it.

Any help will be greatly appreciated.
Best regards.

graby.log

@ArturRuta
Author

Unfortunately, this was a deal breaker for me.
Since no fix was available, I had to move to hoarder, which doesn't have this issue.
Anyway, I'm open to helping test and verify a solution if and when one is developed.
