utf8 filenames (with umlauts in filename) may appear broken inside container with nfs_mount_enabled #2048

Closed
rfay opened this issue Jan 25, 2020 · 7 comments

Comments

@rfay
Member

rfay commented Jan 25, 2020

Describe the bug

A user reports that when using nfs_mount_enabled: true on Windows, filenames that contain umlauts are not shown correctly inside the container, at least by the ls command.

For example, on a Windows or macOS host, in a project directory:

  • touch Mädchen.txt
  • ls -l M* shows it correctly on the host
  • ddev ssh and ls -l /var/www/html/M*

Inside the container, ls -l reports '/var/www/html/M'$'\303\244''dchen.txt'
Interestingly, cp M<tab> autocompletes to Mädchen.txt, which suggests that bash (or bash completion) handles utf8 while ls does not. echo M* also shows the correct filename, so bash reports it correctly and ls does not.

However, the filename shown by ls is byte-for-byte the correct utf8 filename (\303\244 is the octal form of the two bytes that encode ä); it is just displayed in escaped form.
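
As a quick check of the byte-for-byte claim, the name can be rebuilt from the octal escapes that ls printed and dumped as hex. This is only a minimal sketch; hex_bytes is a hypothetical helper name, not something ddev or coreutils provides.

```shell
# Hypothetical helper (not part of ddev): print a string's bytes as hex.
hex_bytes() {
  # od dumps the bytes; xargs joins the output into one space-separated line.
  printf '%s' "$1" | od -An -v -tx1 | xargs
}

# Rebuild the name from the octal escapes reported by ls (\303\244).
escaped_name=$(printf 'M\303\244dchen.txt')

# 0xc3 0xa4 is the two-byte UTF-8 encoding of U+00E4 (ä).
hex_bytes "$escaped_name"
# prints: 4d c3 a4 64 63 68 65 6e 2e 74 78 74
```

If the bytes on disk match this, the file name itself is intact and only the way ls displays it differs.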

PHP 7.2 and 7.3 also demonstrate correct behavior inside the container, using this script (readfile.php):

<?php

$umlautfile = "Mädchen.txt";
// fopen() returns false on failure, so compare strictly.
$handle = fopen($umlautfile, "r");
if ($handle === false) {
	print "Failed to open file $umlautfile";
} else {
	print "Successfully opened file $umlautfile";
	fclose($handle);
}
print "\n";

print "Globbing M*\n";
$files = glob("M*");
foreach ($files as $file) {
	echo $file, "\n";
}

$ php readfile.php
Successfully opened file Mädchen.txt
Globbing M*
Mädchen.txt

Version and configuration information:

  • Host computer OS and Version: Tested on Windows 10, macOS, Ubuntu 18.04

Additional context

It seems that NFSv4 might fix this problem completely; on Linux (Ubuntu 18.04) the problem doesn't show up at all. A Stack Overflow article says that macOS can't do NFSv4, and the check rpcinfo -p 127.0.0.1 | grep "nfs" on macOS shows only versions 2 and 3, whereas on Ubuntu 18.04 it also shows v4.
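
The rpcinfo check above can be narrowed to just the version numbers. The following is a rough sketch that assumes the usual five-column rpcinfo -p layout (program, vers, proto, port, service); nfs_versions is a hypothetical helper name.

```shell
# Hypothetical helper: read `rpcinfo -p` output on stdin and print the
# distinct NFS versions on one line. Matching the service column exactly
# skips related entries such as nfs_acl, which a plain grep "nfs" would keep.
nfs_versions() {
  awk '$5 == "nfs" { print $2 }' | sort -un | paste -s -d ' ' -
}

# Intended usage on the host (requires rpcinfo and a running NFS server):
#   rpcinfo -p 127.0.0.1 | nfs_versions
```

Output of "2 3" would correspond to the macOS result described above; a "4" in the list would indicate that the server also registers NFSv4.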

@wizonesolutions
Contributor

wizonesolutions commented Feb 29, 2020

I had this problem on a Manjaro host, specifically when PHP needed to deal with files containing UTF-8 characters. I solved it with this .ddev/docker-compose.env.yaml file:

version: '3.6'
services:
  web:
    environment:
      - LANG=en_US.UTF-8

HOWEVER, note that you might need code changes too in order to actually call setlocale() properly in Drupal, at least if you are passing filenames to the command line. Look at FillPDF's ShellManager service for some ideas. It is itself based on ImageMagick's helpers.

https://git.drupalcode.org/project/fillpdf/blob/8.x-4.x/src/ShellManager.php

@rfay
Member Author

rfay commented Mar 6, 2020

I confirm @wizonesolutions's result on macOS with and without NFS and on Windows without NFS:

ddev ssh
export LANG=en_US.UTF-8
$ ls -l M*
-rw-r--r-- 1 rfay rfay 0 Mar  6 00:24 Mädchen.txt

Unfortunately, on Windows with nfs_mount_enabled: true using winnfsd, the bad result still shows. This is caused by winnfsd's limitations. I think a more advanced NFS server would work better.
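
To see whether a LANG setting actually took effect inside the container, it can help to ask which character map a locale name resolves to. This is a minimal sketch, assuming the locale utility is present in the image; charmap_for is a hypothetical helper name.

```shell
# Hypothetical helper: print the character map a locale name resolves to.
# Prints nothing if the `locale` utility is not installed in the image.
charmap_for() {
  LC_ALL="$1" locale charmap 2>/dev/null
}

# Compare the POSIX default with the locale used in this thread.
for name in C en_US.UTF-8; do
  printf '%s -> %s\n' "$name" "$(charmap_for "$name")"
done
```

If the en_US.UTF-8 line does not report UTF-8, the locale is probably not generated in the image, and exporting LANG alone is unlikely to change how ls displays the name.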

@Kephson

Kephson commented Mar 24, 2020

I'm trying to solve this problem with DDEV on Windows, using TYPO3 as the content management system.
It is not working for me at the moment with DDEV v1.13.1, winnfsd, and apache-fpm with PHP 7.
I will report back if I find the cause.

@rfay
Member Author

rfay commented Mar 24, 2020

Thanks @Kephson - I don't think you'll get proper utf8 filenames on Windows with winnfsd, but I do expect that one of the more sophisticated commercial NFS servers will work for you. IIRC it also probably works (mostly) without NFS, especially if you set LANG as in #2048 (comment)

@wizonesolutions
Contributor

Note that on Linux, it's actually called en_US.utf8. But I think my env file above actually does work. The issue I was running into was deeper: I had a custom Dockerfile, and it was defaulting to the POSIX locale during the build process. The build process depended on the locale to work properly, so the wrong locale was "baked into" the result. I fixed that with an ENV LANG en_US.utf8.
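
Because the same locale shows up under both spellings (en_US.UTF-8 and en_US.utf8), a check for whether it is installed has to ignore case and the hyphen. The sketch below does that; both function names are hypothetical, and it assumes locale -a is available in the image.

```shell
# Hypothetical helper: reduce a locale name to a spelling-insensitive form.
normalize_locale() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '-'
}

# Succeeds if `locale -a` lists the requested locale under either spelling.
locale_installed() {
  wanted=$(normalize_locale "$1")
  locale -a 2>/dev/null | tr '[:upper:]' '[:lower:]' | tr -d '-' | grep -qxF "$wanted"
}

normalize_locale en_US.UTF-8   # prints: en_us.utf8
normalize_locale en_US.utf8    # prints: en_us.utf8
```

A Dockerfile build step could then run something like locale_installed en_US.UTF-8 || echo "locale not generated" before relying on the ENV LANG setting.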

@andypost

andypost commented Apr 7, 2020

I bet it also depends on the libc inside the container; there are great notes about the one Alpine Linux uses (musl): https://wiki.musl-libc.org/functional-differences-from-glibc.html#Character-sets-and-locale

@rfay
Member Author

rfay commented Nov 8, 2020

I don't think there's any way for ddev to have any impact on this.

@rfay rfay closed this as completed Nov 8, 2020