Transliterate Song artist and title to ASCII for search #795
And as you can see in the failed CI build, no iconvenc on Windows. |
Do you think this is a problem? The plan is to also transliterate the search query, so you should be able to find Mötley Crüe by actually entering Mötley etc. While this isn't optimal, as you can no longer pinpoint an actual 'ö' etc. by entering it, I think it's worth the overall improvement.
I will add the iconv.dll for Windows later |
The result of this conversion also depends on the system settings. With LANG=de_DE.utf8 I get ö -> oe, but with LANG=en_US.utf8 I get ö -> o. Please build the iconv.dll in a way that it does not depend on any other DLLs except maybe those provided by Windows (msvcrt.dll). And it is not just the DLL that is missing. You need to provide the Free Pascal unit. |
Oh, I didn't know. I guess that might not be a bad thing though, and as long as both Title/Artist and search query use the same conversion... will see if we can also force the locale setting through the Pascal interface
Sorry, in more detail: I plan to change to |
Turns out Also, I can't tell why AppVeyor failed... or rather, didn't even try, on the last commit... |
a4d228c to 03877cd
I added my own build of the DLL including debug data and license text, squashed your commits and made three tiny fixes ( Btw., GNU libiconv transliterates Mötley Crüe to M"otley Cr"ue regardless of the system's language settings. People searching for Motley Crue will never find the band's songs. |
Is https://github.com/anyascii/anyascii maybe better? |
I don't see any difference for that first one? Pushing your changes though.
Looks closer to what we want, but doesn't have a Pascal implementation, right? I don't really feel like getting into this with my current knowledge of Pascal, but if you really prefer this I'll take a look. Otherwise I'd rather go with what we have now, except...
Ugh, that sucks. I looked for ways to pass a locale setting to iconv, but couldn't find one. Windows does have a |
615f252 to 1c4e701
Addendum:
Just found
And that does not cover everything, e.g. "Pokémon" is transliterated to "pok'emon"... so can't really go with this unless we're fine with purging all quotation marks, apostrophes etc. from titles, artists, search queries etc. |
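To make that trade-off concrete, here is a minimal Python sketch (not part of this PR, which is Pascal). It assumes the only marks to purge are the two seen in this thread, `"` and `'`; the name `purge_marks` is hypothetical.

```python
# Marks that GNU libiconv was observed to insert in this thread
# (M"otley Cr"ue, pok'emon). Assumption: no other marks need handling.
_MARKS = "\"'"

# Translation table that deletes every character listed in _MARKS.
_PURGE_TABLE = str.maketrans("", "", _MARKS)


def purge_marks(text: str) -> str:
    """Remove quotation marks and apostrophes from an ASCII string."""
    return text.translate(_PURGE_TABLE)


if __name__ == "__main__":
    # Transliteration residue is cleaned up as intended...
    print(purge_marks('M"otley Cr"ue'))
    print(purge_marks("pok'emon"))
    # ...but genuine apostrophes in titles are lost as well.
    print(purge_marks("Don't Stop Me Now"))
```

The last call shows the cost: a legitimate apostrophe is removed too, so titles, artists and search queries would all have to go through the same purge to stay comparable.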
You added iconvenc a second time at the end of the list of used units so that it was also requested on Windows.
I wrote a small program to convert the arrays of anyascii.c to pascal and converted the tiny anyascii function by hand. There is also ICU for transliteration and I expect it to do a better job than anyascii, but the library is huge. Anyascii adds "only" half a megabyte. |
Then why does the force push have no diff on that file? ^^
Oh that's cool! Should I close this PR then, or how would you like to continue? |
@s09bQ5 What's your suggestion on how to proceed here? |
I added anyascii to a branch in my repo: s09bQ5/USDX@de03e83. It is not yet used, just compiled and linked. @DeinAlptraum, if you want, you can add your work on top of that. |
Will take a look, but I am very busy this month so not sure when I'll get around to that. |
6e405fc to e93ab4e
Changed from iconv to use your AnyASCII bindings instead @s09bQ5, and that seems to work like a charm as far as I've tested. Thanks for your work! Just one minor thing: compiling anyascii takes more than 3 minutes on my laptop on Windows. Linux takes only a few seconds, and the CI also doesn't seem to take significantly longer than before, but just FYI... |
Looks good!
Can you check if that is still the case when converting the |
Changed it to CRLF, but that didn't make a (notable) difference. |
I won't have time to check on the Windows compile process (I use the |
I also just tested compiling via MSYS2 on Windows, which takes about half as long... so "only" about 90 seconds for AnyASCII |
I was finally able to look at this. If I apply the patch I get the following output:
I suspect these correspond to As for the compilation time issue: it only takes a few seconds on my Windows computer, and it only compiles it once anyway. It's probably some compiler optimization thing somewhere? |
@barbeque-squared okay, that's great! |
Transliterate all song text fields for search (Title, Artist, Edition etc.) as well as search queries themselves.
This allows matching e.g. 'Ż' by typing 'z', 'ł' with 'l', or the typographer's apostrophe ’ with the straight one '.
The one downside is that we break everything down into ASCII: if you e.g. search for 'ö', you will see all results that match the normal 'o'.
The transliteration uses the GNU utility `iconv` (`iconv -f UTF-8 -t ASCII//TRANSLIT`). I've added the `iconvenc` unit to the repository with some modifications to make it compatible with Windows, as well as the `iconv.dll` for Windows.
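The matching idea described here, transliterating both the song fields and the query before comparing, can be sketched in Python as follows. This is only an illustration, not the PR's Pascal code. It swaps in Unicode NFKD decomposition plus a small hand-written fallback table as a stand-in for iconv/AnyASCII, and the names `to_ascii`, `search_key`, `matches` and `_FALLBACK` are hypothetical.

```python
import unicodedata

# Hypothetical fallback table for characters that NFKD does not decompose
# into an ASCII base letter. A real transliterator covers far more.
_FALLBACK = {
    "ł": "l", "Ł": "L",
    "’": "'", "‘": "'",
    "“": '"', "”": '"',
}


def to_ascii(text: str) -> str:
    """Reduce text to ASCII; characters with no known mapping are dropped."""
    out = []
    for ch in text:
        if ch in _FALLBACK:
            out.append(_FALLBACK[ch])
            continue
        # NFKD splits e.g. 'ö' into 'o' + U+0308; keep only the ASCII parts.
        decomposed = unicodedata.normalize("NFKD", ch)
        out.append("".join(c for c in decomposed if ord(c) < 128))
    return "".join(out)


def search_key(text: str) -> str:
    """Key used on both sides of the comparison (assumes case-insensitive search)."""
    return to_ascii(text).lower()


def matches(query: str, *fields: str) -> bool:
    """True if the transliterated query occurs in any transliterated field."""
    key = search_key(query)
    return any(key in search_key(field) for field in fields)


if __name__ == "__main__":
    print(matches("motley", "Mötley Crüe", "Kickstart My Heart"))
    print(matches("Mötley", "Mötley Crüe"))
    print(matches("don't", "Don’t Stop"))
    print(matches("ö", "Foo"))  # the downside: 'ö' also matches plain 'o'
```

Because both sides pass through the same `search_key`, the exact choice of transliteration matters less than applying it consistently, which is the point made earlier in the thread.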