Directory listing chokes on non-ASCII chars #223

Closed
nkeim opened this issue Jul 22, 2015 · 8 comments

Comments

@nkeim

nkeim commented Jul 22, 2015

In a directory listing, a non-ASCII character in a filename (including notebook names) abruptly terminates the listing, making all files below it invisible.

[Screenshot: terminated_listing]
Steps to reproduce are in the linked gist, though if you are a Mac user you can just type Option-m in the Terminal to put a µ in a filename, then check the Jupyter directory listing. The character in the linked example is "µ" (micro sign), but it appears that any byte outside range(128) will do it.

https://gist.github.com/nkeim/5798b1211d52ed47993b

This has to be run with Python 2; Python 3 insists on ASCII for the filename.

Note that the IPython version is just "3.2.0-dev". @ellisonbg may be able to tell you the precise commit, if that matters.

Background: I ended up with Unicode characters in my notebook names courtesy of IPython 2.x, which seemed to handle them perfectly.

@minrk
Member

minrk commented Jul 23, 2015

Hm. This may be a configuration or environment issue. I can list, create, and open unicode filenames without issue on IPython 3.x and 4.x on both Python 2 and 3. I am on OS X 10.10.4 with Pythons from conda.

[Screenshot taken 2015-07-22 at 18:22:26]

@nkeim
Author

nkeim commented Jul 23, 2015

Thanks! The environment for the notebook server spawned by JupyterHub could easily be the problem. It appears that LANG is not set by JupyterHub.

Within the world of the notebook server, the Python 3 interpreter (Ubuntu 14.04.1 LTS /usr/bin/python3) just doesn't like Unicode in filenames. open('µm.txt', 'w').write('hi') raises a UnicodeEncodeError. But when I run the same interpreter in my login environment, there's no problem... until I take away its LANG=en_US.UTF-8.

Strangely I cannot reproduce this with Python 3 from Anaconda on my Mac, even if I deliberately give it LANG=en_US.ASCII.

@takluyver
Member

I think that makes sense - it's using the locale encoding to convert filenames to bytes. I think it only comes up on Linux, because OS X and Windows define how filenames are encoded independent of the locale.

JupyterHub should probably ensure that the single-user server is started with appropriate locale settings so that it treats the filesystem as UTF-8.

The notebook server should fail more gracefully when there's a unicode problem with a filename. I'll have a look at that.
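To illustrate the point about the locale encoding: a minimal sketch, assuming Python 3, of why the same filename converts to bytes under UTF-8 but not under ASCII. The helper `encode_filename` is a hypothetical name used only for illustration; it is not part of the notebook server. Newer Python versions may pick a different filesystem encoding than the one seen in this thread, so the last line prints whatever the current interpreter uses rather than assuming a value.

```python
import sys


def encode_filename(name, encoding):
    """Hypothetical helper: strictly encode a filename, or return None if it cannot be encoded."""
    try:
        return name.encode(encoding)
    except UnicodeEncodeError:
        return None


name = "\u00b5m.txt"  # 'µm.txt', the filename from the report

# With an ASCII locale encoding, the micro sign cannot be converted to bytes.
print(encode_filename(name, "ascii"))

# With UTF-8, the same name encodes without error.
print(encode_filename(name, "utf-8"))

# The encoding this interpreter actually uses for filenames.
print(sys.getfilesystemencoding())
```

Running this once in the login shell and once inside the environment of the spawned server is one way to compare the two setups.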

@minrk
Member

minrk commented Jul 23, 2015

@nkeim what version of JupyterHub? The Hub passes its LANG, LC env variables to the notebook children by default. Do you have any custom config?

@nkeim
Author

nkeim commented Jul 23, 2015

Deferring to @ellisonbg for the JupyterHub version — he maintains it here.

@minrk
Member

minrk commented Jul 23, 2015

@nkeim in that case, I know exactly what the problem is. The launch config (in supervisord) is not setting the LANG environment variable. This is in /etc/supervisor/conf.d/jupyterhub.conf. The environment line should add:

environment=LANG=en_US.UTF-8,LC_ALL=en_US.UTF-8,...

to ensure that server processes get the right env.
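For launchers other than supervisord, the same idea can be sketched in Python: fill in the locale variables before starting a child process. This is an illustrative sketch only, not JupyterHub's actual code; `with_utf8_locale` is a hypothetical helper, and the `en_US.UTF-8` default assumes that locale is installed on the machine.

```python
import os


def with_utf8_locale(env, default="en_US.UTF-8"):
    """Hypothetical helper: copy env, adding LANG and LC_ALL only where they are missing."""
    child_env = dict(env)
    for var in ("LANG", "LC_ALL"):
        # setdefault keeps any value the parent environment already provides.
        child_env.setdefault(var, default)
    return child_env


child_env = with_utf8_locale(os.environ)
print(child_env["LANG"], child_env["LC_ALL"])

# The result could then be passed to a child process, for example:
#   subprocess.Popen(cmd, env=child_env)
```

Using `setdefault` rather than overwriting means an explicitly configured locale is left alone.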

@nkeim
Author

nkeim commented Jul 23, 2015

@minrk Great, thanks!

takluyver added a commit to takluyver/notebook that referenced this issue Jul 23, 2015
Part of the fix for jupytergh-223.

If a filename can't be decoded in the current encoding, Python escapes
the undecodable bytes as unpaired surrogates, which JS doesn't like
building a URL from.

This doesn't make the undecodable filename openable, but it stops it
from breaking the listing of other files.

The real fix is to set up the locale encoding correctly so that the
filenames can be decoded.
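The escaping described in the commit message can be reproduced in a few lines. This is a minimal sketch, assuming Python 3 and simulating an ASCII locale by decoding explicitly with the `surrogateescape` error handler; the helper `has_surrogates` is a hypothetical name for illustration and is not taken from the actual commit.

```python
def has_surrogates(text):
    """Hypothetical helper: True if text contains any surrogate code point."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


raw = b"\xc2\xb5m.txt"  # UTF-8 bytes of 'µm.txt' as stored on disk

# Under an ASCII locale, each undecodable byte becomes an unpaired surrogate.
decoded = raw.decode("ascii", "surrogateescape")
print(has_surrogates(decoded))

# The original bytes are still recoverable, so Python itself can open the file.
print(decoded.encode("ascii", "surrogateescape") == raw)

# A strict UTF-8 encode, as needed for JSON or a URL, fails on the surrogates.
try:
    decoded.encode("utf-8")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)
```

A check like `has_surrogates` is one way a listing could detect such names and skip or flag them instead of passing them on to the browser.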
@takluyver
Member

#229 should stop it from breaking the rest of the list in such cases.

@minrk minrk closed this as completed in 788c16d Jul 23, 2015
@minrk minrk added this to the 4.0 milestone Jul 23, 2015
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 4, 2021