[Qt4] Files and directories with UTF-8 special characters in the name not read correctly #61

Closed
slodki opened this issue May 31, 2017 · 16 comments

Comments

@slodki
Contributor

slodki commented May 31, 2017

version: 1.3
system: Ubuntu zesty
package: 1.3-1~zesty (amd64)

$ ls -lR
.:
razem 112
-rw-rw-r-- 1 slodki slodki 51200 maj 31 19:51 file3_ą
-rw-rw-r-- 1 slodki slodki 51200 maj 31 19:51 file4
drwxrwxr-x 2 slodki slodki  4096 maj 31 19:51 krowa
drwxrwxr-x 2 slodki slodki  4096 maj 31 19:51 żółw

./krowa:
razem 52
-rw-rw-r-- 1 slodki slodki 51200 maj 31 19:51 file1

./żółw:
razem 52
-rw-rw-r-- 1 slodki slodki 51200 maj 31 19:51 file2
$ locale
LANG=pl_PL.UTF-8
LANGUAGE=
LC_CTYPE="pl_PL.UTF-8"
LC_NUMERIC="pl_PL.UTF-8"
LC_TIME="pl_PL.UTF-8"
LC_COLLATE="pl_PL.UTF-8"
LC_MONETARY="pl_PL.UTF-8"
LC_MESSAGES="pl_PL.UTF-8"
LC_PAPER="pl_PL.UTF-8"
LC_NAME="pl_PL.UTF-8"
LC_ADDRESS="pl_PL.UTF-8"
LC_TELEPHONE="pl_PL.UTF-8"
LC_MEASUREMENT="pl_PL.UTF-8"
LC_IDENTIFICATION="pl_PL.UTF-8"
LC_ALL=

All files and directories with Polish (or Cyrillic) characters are skipped/ignored in QDirStat:
[Screenshot: qdirstat-utf8]

Works without problems in all other KDE/Qt/CLI apps:
[Screenshot: dolphin-utf8]

@slodki
Contributor Author

slodki commented May 31, 2017

Log:

2017-05-31 20:16:25.019 [25062] <Info>    Logger.cpp:138 openLogFile():  -- Log Start --
2017-05-31 20:16:25.062 [25062] <Debug>   TreemapView.cpp:46 TreemapView():  
2017-05-31 20:16:25.082 [25062] <Debug>   DirTreeModel.cpp:544 sort():  Sorting by NameCol ascending
2017-05-31 20:16:25.082 [25062] <Debug>   TreemapView.cpp:117 setSelectionModel():  
2017-05-31 20:16:25.083 [25062] <Info>    Cleanup.cpp:415 desktopSpecificApps():  Detected desktop "KDE"
2017-05-31 20:16:25.083 [25062] <Info>    Cleanup.cpp:468 desktopSpecificApps():  %filemanager => "konqueror --profile filemanagement"
2017-05-31 20:16:25.083 [25062] <Info>    Cleanup.cpp:468 desktopSpecificApps():  %terminal => "konsole --workdir %d"
2017-05-31 20:16:25.084 [25062] <Debug>   DebugHelpers.cpp:133 dumpExcludeRules():  <ExcludeRule ".snapshot">
2017-05-31 20:16:25.084 [25062] <Info>    MainWindow.cpp:792 toggleVerboseSelection():  Verbose selection is now off. Change this with Shift-F7.
2017-05-31 20:16:25.099 [25062] <Info>    DirTree.cpp:95 startReading():     url: "/tmp/aaa"
2017-05-31 20:16:25.099 [25062] <Info>    DirTree.cpp:98 startReading():  device: /dev/mapper/ssd-root
2017-05-31 20:16:25.099 [25062] <Debug>   DirTreeModel.cpp:544 sort():  Sorting by NameCol ascending
2017-05-31 20:16:25.100 [25062] <Debug>   DirReadJob.cpp:333 stat():  url: "/tmp/aaa"
2017-05-31 20:16:25.100 [25062] <Debug>   MainWindow.cpp:582 expandTreeToLevel():  Expanding tree to level 1
2017-05-31 20:16:25.110 [25062] <WARNING> DirReadJob.cpp:281 startReading():  lstat(/tmp/aaa/file3_Ä) failed: Nie ma takiego pliku ani katalogu                                                                                                           
2017-05-31 20:16:25.110 [25062] <WARNING> DirReadJob.cpp:281 startReading():  lstat(/tmp/aaa/żóÅw) failed: Nie ma takiego pliku ani katalogu                                                                                                            
2017-05-31 20:16:25.135 [25062] <WARNING> [Qt] QFileSystemWatcher: failed to add paths: /home/slodki/.config/ibus/bus
2017-05-31 20:16:25.135 [25062] <WARNING> [Qt] Bus::open: Can not get ibus-daemon's address. 
2017-05-31 20:16:25.135 [25062] <Verbose> [Qt] IBusInputContext::createInputContext: no connection to ibus-daemon 
2017-05-31 20:16:25.147 [25062] <Info>    MainWindow.cpp:456 readingFinished():  
2017-05-31 20:16:25.151 [25062] <Debug>   DirTreeModel.cpp:544 sort():  Sorting by PercentNumCol descending
2017-05-31 20:16:25.151 [25062] <Debug>   MainWindow.cpp:439 idleDisplay():  No current branch - expanding tree to level 1
2017-05-31 20:16:25.151 [25062] <Debug>   MainWindow.cpp:582 expandTreeToLevel():  Expanding tree to level 1
2017-05-31 20:16:25.300 [25062] <Debug>   MainWindow.cpp:582 expandTreeToLevel():  Expanding tree to level 1
2017-05-31 20:16:41.719 [25062] <Debug>   TreemapView.cpp:173 writeSettings():  
2017-05-31 20:16:41.734 [25062] <Info>    Logger.cpp:79 ~Logger():  -- Log End --

@slodki
Contributor Author

slodki commented May 31, 2017

Maybe a fromUtf8() call is missing? Or would fromLocal8Bit() be better, since it uses the locale?

Using QFile::encodeName() is another approach.

@shundhammer
Owner

No, it's there:

https://github.com/shundhammer/qdirstat/blob/master/src/DirReadJob.cpp#L179

What filesystem type is that? Are there any special mount options?

@shundhammer shundhammer changed the title UTF-8 not supported in file/dir names UTF-8 characters not displayed correctly in file/dir names May 31, 2017
@shundhammer
Owner

See also issue #19:

[Screenshot]

It works in general, for both files and directories with UTF-8 special characters. Something must be different in your setup.

@shundhammer
Owner

[Screenshot: qdirstat-utf9]

@slodki
Contributor Author

slodki commented May 31, 2017

You call toUtf8() on the names you pass as input parameters to libc functions, but not fromUtf8() when storing the names returned by libc in a QString.
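The asymmetry can be illustrated without Qt. This is a minimal, hypothetical sketch: decodeUtf8() stands in for QString::fromUtf8() and encodeUtf8() for QString::toUtf8(); both names are made up for this example, and the decoder assumes well-formed UTF-8 input. When a name read from the directory is decoded with the same encoding that is later used to encode it for lstat(), the bytes survive the round trip unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for QString::fromUtf8(): turns the raw bytes of a
// file name (as returned by readdir()) into Unicode code points.
// Sketch only: assumes well-formed UTF-8 and performs no validation.
std::u32string decodeUtf8(const std::string& bytes) {
    std::u32string out;
    for (std::size_t i = 0; i < bytes.size();) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        int extra;  // number of continuation bytes that follow
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        ++i;
        for (int k = 0; k < extra && i < bytes.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}

// Hypothetical stand-in for QString::toUtf8(): produces the bytes that
// would be handed to lstat().
std::string encodeUtf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        assert(cp <= 0x10FFFF);  // outside the Unicode range
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, decodeUtf8("file3_\xC4\x85") yields 7 code points ending in U+0105, and encodeUtf8() turns them back into exactly the original 8 bytes, so lstat() would find the file.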

The problem doesn't depend on the filesystem type: all of them (ext4, cifs, nfs) are mounted with UTF-8 support and work fine with other Qt/CLI apps.

BTW: when a directory name with UTF-8 characters is given as the starting parameter, that name works correctly. Log file for qdirstat żółw:

2017-05-31 20:49:06.165 [25576] <Info>    Logger.cpp:138 openLogFile():  -- Log Start --
2017-05-31 20:49:06.212 [25576] <Debug>   TreemapView.cpp:46 TreemapView():  
2017-05-31 20:49:06.237 [25576] <Debug>   DirTreeModel.cpp:544 sort():  Sorting by NameCol ascending
2017-05-31 20:49:06.237 [25576] <Debug>   TreemapView.cpp:117 setSelectionModel():  
2017-05-31 20:49:06.238 [25576] <Info>    Cleanup.cpp:415 desktopSpecificApps():  Detected desktop "KDE"
2017-05-31 20:49:06.238 [25576] <Info>    Cleanup.cpp:468 desktopSpecificApps():  %filemanager => "konqueror --profile filemanagement"
2017-05-31 20:49:06.238 [25576] <Info>    Cleanup.cpp:468 desktopSpecificApps():  %terminal => "konsole --workdir %d"
2017-05-31 20:49:06.239 [25576] <Debug>   DebugHelpers.cpp:133 dumpExcludeRules():  <ExcludeRule ".snapshot">
2017-05-31 20:49:06.239 [25576] <Info>    MainWindow.cpp:792 toggleVerboseSelection():  Verbose selection is now off. Change this with Shift-F7.
2017-05-31 20:49:06.255 [25576] <Info>    DirTree.cpp:95 startReading():     url: "/tmp/aaa/żółw"
2017-05-31 20:49:06.255 [25576] <Info>    DirTree.cpp:98 startReading():  device: /dev/mapper/ssd-root
2017-05-31 20:49:06.255 [25576] <Debug>   DirTreeModel.cpp:544 sort():  Sorting by NameCol ascending
2017-05-31 20:49:06.256 [25576] <Debug>   DirReadJob.cpp:333 stat():  url: "/tmp/aaa/żółw"
2017-05-31 20:49:06.256 [25576] <Debug>   MainWindow.cpp:582 expandTreeToLevel():  Expanding tree to level 1
2017-05-31 20:49:06.271 [25576] <WARNING> [Qt] QFileSystemWatcher: failed to add paths: /home/slodki/.config/ibus/bus
2017-05-31 20:49:06.271 [25576] <WARNING> [Qt] Bus::open: Can not get ibus-daemon's address. 
2017-05-31 20:49:06.271 [25576] <Verbose> [Qt] IBusInputContext::createInputContext: no connection to ibus-daemon 
2017-05-31 20:49:06.271 [25576] <Info>    MainWindow.cpp:456 readingFinished():  
2017-05-31 20:49:06.275 [25576] <Debug>   DirTreeModel.cpp:544 sort():  Sorting by PercentNumCol descending
2017-05-31 20:49:06.275 [25576] <Debug>   MainWindow.cpp:439 idleDisplay():  No current branch - expanding tree to level 1
2017-05-31 20:49:06.275 [25576] <Debug>   MainWindow.cpp:582 expandTreeToLevel():  Expanding tree to level 1
2017-05-31 20:49:06.456 [25576] <Debug>   MainWindow.cpp:582 expandTreeToLevel():  Expanding tree to level 1
2017-05-31 20:49:27.469 [25576] <Debug>   TreemapView.cpp:173 writeSettings():  
2017-05-31 20:49:27.473 [25576] <Info>    Logger.cpp:79 ~Logger():  -- Log End --

@slodki
Contributor Author

slodki commented May 31, 2017

The issue is about files and directories being skipped/ignored, not about how they are displayed. As you can see, the directory with the affected characters is treated as a file, and all such files are reported as empty (0 bytes).

@slodki
Contributor Author

slodki commented May 31, 2017

The Polish letter ą is supported by my configuration with a UTF-8 locale:

$ ls -l $'/tmp/aaa/file3_\u0105'
-rw-rw-r-- 1 slodki slodki 51200 maj 31 19:51 /tmp/aaa/file3_ą

$ strace ls -l $'/tmp/aaa/file3_\u0105' |& grep lstat.*file3
lstat("/tmp/aaa/file3_\304\205", {st_mode=S_IFREG|0664, st_size=51200, ...}) = 0

As you can see, everything works when the 2-byte encoding 0xC4 0x85 is used.

But QDirStat tries to use 4 bytes, 0xC3 0x84 0xC2 0x85:

$ strace qdirstat . |& grep lstat.*file3
lstat("/tmp/aaa/file3_\303\204\302\205", 0x7ffcb9021a60) = -1 ENOENT (No such file or directory)
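The 4-byte sequence can be reproduced with a small sketch, assuming the name read from disk was turned into a QString by a Latin-1 style conversion (one code point per byte) and then converted back with toUtf8(). decodeLatin1() and encodeUtf8Latin1Range() are hypothetical helpers written for this illustration, not QDirStat or Qt functions.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a Latin-1 style QString conversion:
// every byte becomes one code point with the same numeric value,
// so the two UTF-8 bytes of "ą" become two separate characters.
std::u32string decodeLatin1(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes)
        out.push_back(static_cast<char32_t>(b));
    return out;
}

// Hypothetical stand-in for QString::toUtf8(), limited to code points
// U+0000..U+00FF, which is all decodeLatin1() can produce.
std::string encodeUtf8Latin1Range(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        assert(cp <= 0xFF);
        if (cp < 0x80) {
            out += static_cast<char>(cp);                  // 0xxxxxxx
        } else {
            out += static_cast<char>(0xC0 | (cp >> 6));    // 110xxxxx
            out += static_cast<char>(0x80 | (cp & 0x3F));  // 10xxxxxx
        }
    }
    return out;
}
```

Feeding the on-disk bytes file3_ + 0xC4 0x85 through both helpers yields file3_ + 0xC3 0x84 0xC2 0x85, the same bytes that appear in the failing lstat() call above: each non-ASCII byte grows into two.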

@slodki
Contributor Author

slodki commented May 31, 2017

And as you can see, 0xC3 0x84 is the UTF-8 encoding of U+00C4 (Ä), i.e. of the single byte 0xC4 interpreted as a Latin-1 character (it is not an ASCII character).

In my opinion, QDirStat is doing something like toUtf8(toUtf8(QString)). When toUtf8() is not used (as with command-line parameters), there is no error.
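The byte analysis can also be checked in the opposite direction. In this hypothetical sketch, undoDoubleEncoding() reverses one round of "Latin-1 decode, then UTF-8 encode": each 2-byte sequence starting with 0xC2 or 0xC3 is collapsed back into the single byte it encodes, so 0xC3 0x84 becomes 0xC4 and 0xC2 0x85 becomes 0x85. It only confirms the arithmetic; it is not meant as a workaround.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: reverses one round of "Latin-1 decode, then UTF-8
// encode". Code points U+0000..U+00FF are encoded either as one byte
// (< 0x80) or as a 2-byte sequence with lead byte 0xC2 or 0xC3; anything
// else means the input was not produced this way, so nullopt is returned.
std::optional<std::string> undoDoubleEncoding(const std::string& utf8) {
    std::string original;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        if (lead < 0x80) {
            original += static_cast<char>(lead);
            continue;
        }
        if ((lead != 0xC2 && lead != 0xC3) || i + 1 >= utf8.size())
            return std::nullopt;
        unsigned char cont = static_cast<unsigned char>(utf8[++i]);
        if ((cont & 0xC0) != 0x80)  // must be a continuation byte 10xxxxxx
            return std::nullopt;
        // 110000xx 10yyyyyy -> xxyyyyyy, e.g. 0xC3 0x84 -> 0xC4
        original += static_cast<char>(((lead & 0x03) << 6) | (cont & 0x3F));
    }
    // Every output byte consumed at least one input byte.
    assert(original.size() <= utf8.size());
    return original;
}
```

Applied to the bytes from the failing lstat() call, it returns file3_ + 0xC4 0x85, the correctly encoded name; applied to the correct name it returns nullopt, because 0xC4 is not a lead byte this transformation can produce.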

@shundhammer
Owner

You are right, of course. Added that missing fromUtf8() call.

One thing I don't understand at all, though, is why it ever worked for me. I also have a UTF-8 environment (de_DE.utf8), and as you can see from the screenshots, not only did it display those files and directories, it also correctly displayed their metadata (size etc.).

OTOH the Qt docs explicitly say that it uses fromUtf8() by default when constructing a QString from a const char *:

http://doc.qt.io/qt-5/qstring.html#initializing-a-string

@shundhammer
Owner

Does it work for you with commit e4cf683?

@shundhammer
Owner

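Just a thought: Are you using QDirStat built with Qt 4.x? In Qt 4.x, the QString constructor from const char * used fromAscii() rather than fromUtf8(), and fromAscii() behaves like fromLatin1() unless a codec has been set with QTextCodec::setCodecForCStrings(). That might explain the different behaviour.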

I am not 100% sure, but I think NHellFire's PPA builds QDirStat against Qt 4.

@shundhammer
Owner

Bingo. I just downloaded and unpacked it, and voila:

[sh @ nazgul] ~/tmp 25 % ldd nhellfire-ppa/usr/bin/qdirstat | grep -i qt
    libQtGui.so.4 => /usr/lib/x86_64-linux-gnu/libQtGui.so.4 (0x00007f1accfc2000)
    libQtCore.so.4 => /usr/lib/x86_64-linux-gnu/libQtCore.so.4 (0x00007f1accadd000)

So if you are using that version of QDirStat, that would perfectly explain the discrepancy between your and my results.

But in any case, explicitly converting with fromUtf8() rather than relying on implicit behaviour is definitely the more reliable approach.

@shundhammer shundhammer changed the title UTF-8 characters not displayed correctly in file/dir names Files and directories with UTF-8 special characters in the name not read correctly May 31, 2017
@shundhammer shundhammer changed the title Files and directories with UTF-8 special characters in the name not read correctly [Qt4] Files and directories with UTF-8 special characters in the name not read correctly May 31, 2017
@shundhammer
Owner

Please reopen if commit e4cf683 did not fix the problem for you.

@slodki
Contributor Author

slodki commented Jun 1, 2017

I've tested it with Qt 4 and Qt 5, and both work OK now.

@shundhammer
Owner

Thanks for confirming this!


2 participants