Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Song folders with non-ascii characters in their name are ignored without error on Windows #837

Closed
DeinAlptraum opened this issue May 5, 2024 · 14 comments · Fixed by #904

Comments

@DeinAlptraum
Copy link
Contributor

Actual behaviour

Songs with non-ascii characters in their folder name are not picked up by USDX on my Windows install.
E.g. usdb ID 28636 "Die Schlümpfe - Kleine grüne Geister" does not show up in-game for me, and also does not leave an error in the error log. Replacing the 'ü's by 'u's in just the folder name now produces the following error in the log:

ERROR:  Failed to open D:\karaoke\songs-copy\Die Schlumpfe - Kleine grune Geister\Die Schlumpfe - Kleine grune Geister.txt
ERROR:  AnalyseFile failed for "D:\karaoke\songs-copy\Die Schlumpfe - Kleine grune Geister\Die Schlumpfe - Kleine grune Geister.txt".

Note how the paths show even the txt file's names with 'u's in stead of 'ü's - although I haven't renamed these!

In other cases it just displays question marks, i.e. after renaming "Żywiołak - Studzianki" to "Zywiolak - Studzianki" I get:

ERROR:  Failed to open D:\karaoke\songs-checked\Zywiolak - Studzianki\?ywio?ak - Studzianki.txt
ERROR:  AnalyseFile failed for "D:\karaoke\songs-checked\Zywiolak - Studzianki\?ywio?ak - Studzianki.txt".

I also tried to let the directory traversal step print all the files it finds, here is part of the results:

scr

"The Fox" and "Teenage Dirtbag" show what it looks like when it works, i.e. displaying the folder itself, but also all its contents. For the others, it just displays the folder, i.e. it doesn't find any files inside (before renaming the folders). Also notice how the Wolfgang Petry song is apprently somehow transliterated - the actual folder name is "Wir sind die Größten", not "Wir sind die Grosten".

If I rename both the folders and the files within to remove any non-ascii characters, they show up in-game and can be played without any problems.

This doesn't seem to affect everyone on Windows, other people have reported being completely fine. I can also play the exact same files (shared folder on a dual boot) without trouble on my Linux install.

Expected behaviour

The songs can be played without any folder/file renaming, and even if they can't, an error can be found in the log.

Steps to reproduce

Install USDX, then put any song with non-ascii characters in its name into the song folder. Start the game, see that the song doesn't show up.

Details

Provide some additional information:

  • USDX version: 2024.3
  • Operating System + version: Windows 11
@bohning
Copy link
Collaborator

bohning commented May 5, 2024

I can only add that the problem doesn't occur on Mac either, so it's really Windows only.

@RumovZ
Copy link

RumovZ commented Jun 22, 2024

I can confirm this. I've tested with Let 3 - Mama ŠČ (ESC 2023 Croatia) on 2024.5, Windows 10.

@barbeque-squared
Copy link
Member

I cannot reproduce it the way @DeinAlptraum describes it.
I can reproduce it with what @RumovZ commented. Specifically (at least on my system) it's the Č character that's causing the issue. I can also confirm that it does not cause an issue on Linux with exactly that same character.

The weird Š, eszet, o-umlaut, various dashes, none of those cause any issues for me, it's that C that's triggering it for me.
You can just take any working txt (+accompanying files), add a Č somewhere in the txt filename, and tada it gives an error on AnalyseFile (I assume because it can't find the file).

NB: it's possible this character doesn't trigger it on German systems, but should hopefully fail more consistent on English systems.

What I find really weird is that that S and C are in the same Unicode block, but it's totally fine with the S but not the C.

@bohning
Copy link
Collaborator

bohning commented Jul 13, 2024

@s09bQ5 If you are on Windows, could you maybe look into this issue?

@s09bQ5
Copy link
Collaborator

s09bQ5 commented Jul 14, 2024

No Windows here.

@qamil95
Copy link

qamil95 commented Sep 3, 2024

I have exactly the same issue, with song "O-Zone - Dragostea din teï" and the last letter in name
The problem is with file and folder name, no matter which one, if ï letter appears in any of these names, song is not displayed. Language Windows is Polish

@DeinAlptraum
Copy link
Contributor Author

DeinAlptraum commented Sep 14, 2024

I just git bisected this down to ecca52e, this seems to be the first broken commit.

Edit: the bug disappears when re-adding the LazUTF8 unit in UPath.pas

@barbeque-squared
Copy link
Member

Edit: the bug disappears when re-adding the LazUTF8 unit in UPath.pas

This is a good find! However, LazUTF8 is not available (at least not last I checked) when the msys/make method is used on Windows. And aren't special characters working properly in a ton of other places like lyrics or from the ini translations?

@DeinAlptraum
Copy link
Contributor Author

aren't special characters working properly in a ton of other places like lyrics or from the ini translations?

Didn't test this explicitly, but judging from how I haven't heard of any problems there before, I would say yes. I guess it's just the SysUtils filesystem functions that can't deal with this properly...

I tried looking up what exactly LazUTF8 does, but there doesn't seem to exist any proper documentation for the module at all, and the fact that it seems to do something already when just included makes this sort of intractable

@barbeque-squared
Copy link
Member

Ideally we can find a solution that doesn't require Laz. I gave up on the Lazarus editor about two seconds after opening it, and I couldn't get any of the Laz stuff to work through the MSYS method anyway. It's entirely possible that in my commit I'm just calling the wrong conversion.

I am however okay with just copy-pasting the required parts of Laz somewhere in USDX. Unfortunately, I have no idea which parts that are.

@s09bQ5
Copy link
Collaborator

s09bQ5 commented Sep 20, 2024

Unfortunately, I have no idea which parts that are.

Build once with and once without that unit and then compare the list of undefined symbols in the object files?

@bohning
Copy link
Collaborator

bohning commented Sep 20, 2024

Not knowing enough about LazUTF8 and filename encoding on Windows, I tried to ask ChatGPT of what's going on and how it could be resolved (using code snippets from the changes in UFilesystem.pas):

The issue you're encountering is related to handling non-ASCII file names and directories across platforms, particularly with the change from FileExistsUTF8() (from LazUTF8Classes) to SysUtils.FileExists().

In the original version, FileExistsUTF8() was being used, which is UTF-8 aware and correctly handles non-ASCII file names. When you switched to SysUtils.FileExists(), it only works with the system's default encoding (which may not be UTF-8, especially on Windows), leading to issues with non-ASCII characters.

To fix this issue while keeping it cross-platform and avoiding Lazarus-specific dependencies, you need to ensure that the file paths are converted and handled using Unicode-aware functions. Here’s how you can resolve this:

  • Use Unicode-aware file operations: On Windows, you should use the WideString versions of the file operations (FileExistsW, DeleteFileW, GetFileAttributesW, etc.) to properly handle non-ASCII paths. These functions are available in the Windows unit.
  • Ensure proper conversion to WideString: The ToWide() method that you're using should return a WideString that correctly handles UTF-16 encoded strings, which are required for Windows API functions.
  • Handle cross-platform support: For non-Windows platforms, continue using the regular SysUtils.FileExists(), which should work with UTF-8 paths on platforms like Linux and macOS.

Here’s an updated version of your code that should address the issue:

uses
  Classes,
  SysUtils,
  {$IFDEF MSWINDOWS}
  Windows,
  {$ENDIF}
  UPath;

function TFileSystemImpl.FileExists(const Name: IPath): boolean;
begin
  {$IFDEF MSWINDOWS}
  // Use WideString version on Windows to handle non-ASCII paths
  Result := Windows.FileExistsW(PWideChar(Name.ToWide()));
  {$ELSE}
  // On other platforms, use SysUtils.FileExists which handles UTF-8
  Result := SysUtils.FileExists(Name.ToUTF8());
  {$ENDIF}
end;

function TFileSystemImpl.FileGetAttr(const FileName: IPath): Cardinal;
begin
  {$IFDEF MSWINDOWS}
  // Use WideString version on Windows
  Result := Windows.GetFileAttributesW(PWideChar(FileName.ToWide()));
  {$ELSE}
  // On other platforms, use SysUtils version
  Result := FileGetAttrUTF8(FileName.ToUTF8());
  {$ENDIF}
end;

Key Points:

  • Windows-Specific Handling: On Windows, the code now uses FileExistsW() and GetFileAttributesW() to handle WideString paths, ensuring that non-ASCII paths are correctly processed.
  • Cross-Platform Compatibility: On non-Windows platforms, it falls back to using SysUtils.FileExists() and FileGetAttrUTF8(), which work with UTF-8.
    This should resolve your issues with non-ASCII directories and files while keeping the code cross-platform and free from Lazarus-specific dependencies.

@ferby09
Copy link

ferby09 commented Sep 30, 2024

#709 (comment)
This worked for me

@barbeque-squared
Copy link
Member

After an extensive "print a lot of things" session I eventually found that in Line 45 of UFilesystem.pas
https://github.com/UltraStar-Deluxe/USDX/blob/master/src/base/UFilesystem.pas#L45
if you replace
TSytemSearchRec = TSearchRec;
with
TSytemSearchRec = TUnicodeSearchRec;

then on Windows a lot of things suddenly start working. still doesn't print the special characters to the console (didn't check the logfile) but it's at least loading files. Thank you @DeinAlptraum for confirming this.

If nobody beats me to it, I'll figure out tomorrow if Linux also eats this (and if not we just type-guard it) and open a PR. Long-term we should probably start getting rid of some of the RawByteString stuff, but unless some of these functions are now unused the priority is on getting a release with this fix out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants