[bug] Encoding.LATIN1 returns wrong text with Polish letters #558
"ą" and "ł" actually cannot be encoded in Latin-1 / ISO-8859-1 (see https://en.wikipedia.org/wiki/ISO/IEC_8859-1). I don't know which encoding MP3Tag uses there, and I could not reproduce the exact outcome. The logical choices for Polish letters would be ISO-8859-2 or, on Windows, maybe Windows-1250. But these would give:
So the result is a bit different from yours. Either way, neither of them is Latin-1. Is there a specific reason you can't use a Unicode encoding for these files?
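The mismatch can be reproduced with Python's standard codecs alone. This is a minimal sketch, assuming a tagger wrote the title's bytes in an 8-bit Polish code page while the frame declared Latin-1; `misread_as_latin1` is a hypothetical helper name. The Windows-1250 (`cp1250`) case yields `Uciekaj¹ca ska³a`, the same text mutagen returns later in this thread, which suggests Windows-1250 rather than ISO-8859-2.

```python
def misread_as_latin1(text: str, codec: str) -> str:
    """Encode `text` with a legacy code page, then decode those bytes as Latin-1.

    This mimics a tagger writing 8-bit bytes while the ID3 frame declares Latin-1.
    """
    return text.encode(codec).decode("latin-1")


title = "Uciekająca skała"

# Latin-1 has no code point for "ą" or "ł", so a strict encode fails.
try:
    title.encode("latin-1")
    unencodable = None
except UnicodeEncodeError as err:
    unencodable = err.object[err.start]  # first offending character: "ą"

# The two Polish code pages discussed above, read back as if they were Latin-1.
misread = {codec: misread_as_latin1(title, codec) for codec in ("iso-8859-2", "cp1250")}
# misread["iso-8859-2"] == "Uciekaj±ca ska³a"
# misread["cp1250"]     == "Uciekaj¹ca ska³a"
```

The same helper turns "Roman Felczyński" into "Roman Felczyñski" with `cp1250`, matching the TPE1 value reported below.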
I tried decoding and encoding myself before posting, but I got some errors. I don't know how to load the tags correctly or how to fix them. I have a database of a few TB, and a lot of the files have this problem. MP3tag reads them correctly. I thought maybe something in tags = ID3(mp3, v2_version=3) is not correct. Or can I fix it somehow afterwards? The Windows MP3 details view also shows the proper title and album name with Polish letters.

edit: is this the same problem as #354? I found that this is not Latin-1 but Windows-1250. But I have no clue how to detect it for the rest of the files, because it only affects id3v1 files with Latin-1 encoding. How can I check whether TIT2 is encoded as Latin-1?

edit2: maybe this code:
But I still think Mutagen could handle this better; MP3tag does.
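One way to answer "how can I check whether TIT2 is encoded as Latin-1" is to look at each frame's `encoding` attribute. The sketch below assumes that mutagen's `Encoding` members behave as ints, so `Encoding.LATIN1` compares equal to `0` (the repr `<Encoding.LATIN1: 0>` in this thread shows that value). `latin1_text_frames` and `scan_file` are hypothetical helper names; only `ID3(path, v2_version=3)` is taken from the thread.

```python
# Value of mutagen.id3.Encoding.LATIN1; Encoding members are int-like,
# so a frame whose encoding is Encoding.LATIN1 compares equal to 0.
LATIN1 = 0


def latin1_text_frames(tags) -> dict:
    """Return {frame key: list of texts} for text frames that declare Latin-1.

    `tags` can be any mapping of key -> frame, e.g. a mutagen.id3.ID3 object.
    Frames without an `encoding` or `text` attribute (PRIV, POPM, APIC, ...)
    are skipped.
    """
    found = {}
    for key, frame in tags.items():
        if getattr(frame, "encoding", None) == LATIN1 and hasattr(frame, "text"):
            found[key] = list(frame.text)
    return found


def scan_file(path: str) -> dict:
    """Load one file with mutagen and report its Latin-1 text frames."""
    from mutagen.id3 import ID3  # local import: the helper above needs no mutagen

    return latin1_text_frames(ID3(path, v2_version=3))
```

For the file whose full tags object is dumped later in this thread, TIT2, TCON and TPE1 would be reported, while the PRIV and POPM frames are ignored. A declared Latin-1 encoding alone does not prove the text is wrong; it only marks the frames worth inspecting.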
It's not the same problem: ID3v2 stores the text encoding of each frame in the file, and in your case that declared encoding is likely wrong.
mutagen currently doesn't second-guess encodings. For starters, though, we could add something to the docs with some examples.
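Since mutagen leaves that decision to the caller, a script has to do the second-guessing itself. This is a minimal sketch, assuming the text was decoded as Latin-1 from bytes written in one of the two code pages discussed above; `redecode`, `guess_repair` and the letter-count scoring are hypothetical choices for illustration, not a mutagen API. It re-decodes the text with each candidate and keeps whichever result contains the most Polish letters that Latin-1 lacks.

```python
from __future__ import annotations

# Polish letters Latin-1 cannot represent ("ó"/"Ó" already exist in Latin-1).
POLISH_EXTRA = set("ąćęłńśźżĄĆĘŁŃŚŹŻ")

# Hypothetical candidate list: the two code pages mentioned in this thread.
CANDIDATES = ("cp1250", "iso-8859-2")


def redecode(text: str, codec: str) -> str | None:
    """Reinterpret text that was decoded as Latin-1 using `codec` instead.

    Returns None if the text cannot have come from Latin-1 decoding
    (it contains characters above U+00FF) or the bytes are invalid in `codec`.
    """
    try:
        return text.encode("latin-1").decode(codec)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None


def guess_repair(text: str) -> tuple[str, str | None]:
    """Return (best_text, codec_used); codec_used is None if text is left alone."""
    best, best_codec, best_score = text, None, 0
    for codec in CANDIDATES:
        candidate = redecode(text, codec)
        if candidate is None:
            continue
        score = sum(ch in POLISH_EXTRA for ch in candidate)
        if score > best_score:  # strictly better only, so ties keep the earlier codec
            best, best_codec, best_score = candidate, codec, score
    return best, best_codec
```

For example, `guess_repair("Uciekaj¹ca ska³a")` gives `("Uciekająca skała", "cp1250")`. Counting letters is only a heuristic: ISO-8859-2 and Windows-1250 agree on "ł" and "ń", so short strings may tie, and genuinely Latin-1 text such as "Niño" would wrongly become "Nińo". That false-positive risk is why the output should be reviewed before anything is written back.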
Maybe I'm wrong; I'm not very good at coding. Could you help me with this?
After running:
from mutagen.id3 import ID3

tags = ID3(mp3, v2_version=3)  # mp3 is the path to the file
print(tags.getall("TIT2"))
I got this:
[TIT2(encoding=<Encoding.LATIN1: 0>, text=['Uciekaj¹ca ska³a'])]
In the MP3tag program, everything looks fine (I see the Polish characters: Uciekająca skała).
It is ID3v2.3 (tag types: ID3v1, ID3v2.3).
'TPE1': TPE1(encoding=<Encoding.LATIN1: 0>, text=['Roman Felczyñski']) is also broken. I have many files like this and no idea how to fix them. Thank you for your help.
Full tags object:
{'TIT2': TIT2(encoding=<Encoding.LATIN1: 0>, text=['Uciekaj¹ca ska³a']),
 'PRIV:WM/MediaClassPrimaryID:¼}`Ñ#ãâK\x86¡H¤*(D\x1e': PRIV(owner='WM/MediaClassPrimaryID', data=b'\xbc}`\xd1#\xe3\xe2K\x86\xa1H\xa4*(D\x1e'),
 'PRIV:WM/MediaClassSecondaryID:\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00': PRIV(owner='WM/MediaClassSecondaryID', data=b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'),
 'TCON': TCON(encoding=<Encoding.LATIN1: 0>, text=['Przygodowy']),
 'POPM:Windows Media Player 9 Series': POPM(email='Windows Media Player 9 Series', rating=255),
 'TPE1': TPE1(encoding=<Encoding.LATIN1: 0>, text=['Roman Felczyñski'])}
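For a large collection, the check and the repair can be combined into one pass over the files. This is a sketch under several assumptions: the broken frames hold Windows-1250 bytes mis-declared as Latin-1 (the `cp1250` default), a repair is only accepted when it produces Polish letters that Latin-1 lacks, and repaired frames are re-saved as UTF-16 because ID3v2.3 has no UTF-8 encoding (UTF-8 was added in ID3v2.4). `repair_text`, `fix_file` and `fix_tree` are hypothetical names; `dry_run=True` is the default so nothing is written until the report has been reviewed. By default mutagen's `save()` also updates an existing ID3v1 tag, so try this on copies of a few files first.

```python
from __future__ import annotations

from pathlib import Path

# Polish letters that Latin-1 cannot represent ("ó"/"Ó" exist in Latin-1 already).
POLISH_EXTRA = set("ąćęłńśźżĄĆĘŁŃŚŹŻ")


def repair_text(text: str, codec: str = "cp1250") -> str:
    """Re-decode Latin-1 mojibake as `codec`; keep `text` unless Polish letters appear."""
    try:
        fixed = text.encode("latin-1").decode(codec)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # already real Unicode, or bytes that are invalid in `codec`
    return fixed if any(ch in POLISH_EXTRA for ch in fixed) else text


def fix_file(path: str | Path, codec: str = "cp1250", dry_run: bool = True) -> dict:
    """Repair the Latin-1 text frames of one MP3 and return {key: (old, new)}.

    Requires mutagen. With dry_run=True (the default) nothing is written.
    """
    from mutagen.id3 import ID3, ID3NoHeaderError, Encoding

    try:
        tags = ID3(str(path), v2_version=3)
    except ID3NoHeaderError:
        return {}  # file has no ID3v2 tag at all
    changes = {}
    for key, frame in tags.items():
        if getattr(frame, "encoding", None) != Encoding.LATIN1 or not hasattr(frame, "text"):
            continue
        old = list(frame.text)
        # Only plain strings are re-decoded; other values (e.g. timestamps) pass through.
        new = [repair_text(t, codec) if isinstance(t, str) else t for t in old]
        if new != old:
            changes[key] = (old, new)
            frame.text = new
            frame.encoding = Encoding.UTF16  # ID3v2.3 only allows Latin-1 or UTF-16
    if changes and not dry_run:
        tags.save(str(path), v2_version=3)
    return changes


def fix_tree(root: str | Path, **kwargs) -> dict:
    """Run fix_file on every *.mp3 under `root`; return {path: changes} for changed files."""
    report = {}
    # The glob pattern is case-sensitive on POSIX systems, so *.MP3 files are not matched.
    for mp3 in sorted(Path(root).rglob("*.mp3")):
        changes = fix_file(mp3, **kwargs)
        if changes:
            report[str(mp3)] = changes
    return report
```

A typical run would first call `fix_tree("/path/to/music")` (a placeholder path) to get a dry-run report of old and new values, and only after checking it call `fix_tree("/path/to/music", dry_run=False)` to write the changes.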