Files with invalid UTF-8 names should be synced #32251

Closed
jnweiger opened this issue Aug 6, 2018 · 7 comments

@jnweiger
Contributor

jnweiger commented Aug 6, 2018

Filenames should be opaque byte streams for us. Whatever bytes a user puts into owncloud, we should store that and bring it back unchanged.
Interpreting the encoding of a filename is a problem of the representation layer, not of the transport layer. I envision ownCloud becoming a reliable transport layer between filesystems. For us, representation should only be a best-effort business.

Currently the owncloud-client evaluates filenames according to the current locale and rejects files with a name that is invalid in the local encoding.

Better: owncloud-client sends the encoding along with the filename, so that representation layers (e.g. web-interface, other clients on different os) can correctly decide how to represent these names.
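To make the transport/representation split concrete, here is a minimal sketch, not actual ownCloud code. The `TransportName` class and its fields are hypothetical names chosen for illustration: the raw bytes travel unchanged together with the encoding the sender declares, and decoding happens only when something has to be shown to a user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransportName:
    """Hypothetical record: a filename as opaque bytes plus a declared encoding."""
    raw: bytes      # exact bytes as read from the source filesystem, never modified
    encoding: str   # encoding claimed by the sender, e.g. "utf-8" or "latin-1"

    def display(self) -> str:
        # Representation is best effort: bytes that cannot be decoded
        # are shown as U+FFFD instead of causing the file to be rejected.
        return self.raw.decode(self.encoding, errors="replace")


# The same bytes, once declared as Latin-1 and once (wrongly) as UTF-8.
latin = TransportName(b"caf\xe9.txt", "latin-1")
broken = TransportName(b"caf\xe9.txt", "utf-8")

print(latin.display())
print(broken.display())
print(broken.raw == latin.raw)  # the stored bytes are identical either way
```

In this sketch an invalid name only degrades how it is displayed; the bytes that would be stored and synced stay intact.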

Comparison amongst file names is possible by comparing byte streams + encoding. "Understanding" the filename is not needed for this.
That may even be "easier" than comparing interpreted Unicode glyphs using a UTF-8 representation.
Names with the exact same sequence of glyphs may have distinct representations in UTF-8. Illustrative examples are in https://en.wikipedia.org/wiki/Precomposed_character.
(In UTF-8, we could define that we compare names either normalized, or as is. But afaik, we haven't specified that yet -- I'd go for 'as is', in the spirit of this enhancement suggestion and because there are multiple ways to normalize https://en.wikipedia.org/wiki/Unicode_equivalence#Normalization -- a can of worms.)
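The difference between comparing 'as is' and comparing normalized can be shown with a small sketch using Python's standard `unicodedata` module. The helper names `same_as_is` and `same_normalized` are made up for this illustration, and NFC is just one of the possible normalization forms.

```python
import unicodedata

precomposed = "\u00e9"   # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

a = precomposed.encode("utf-8")  # b'\xc3\xa9'
b = decomposed.encode("utf-8")   # b'e\xcc\x81'


def same_as_is(x: bytes, y: bytes) -> bool:
    # 'As is': plain byte comparison, no interpretation of the name.
    return x == y


def same_normalized(x: bytes, y: bytes, form: str = "NFC") -> bool:
    # Normalized: decode, normalize both sides to one form, then compare.
    # This requires the bytes to be valid UTF-8 in the first place.
    nx = unicodedata.normalize(form, x.decode("utf-8"))
    ny = unicodedata.normalize(form, y.decode("utf-8"))
    return nx == ny


print(same_as_is(a, b))       # False: the byte sequences differ
print(same_normalized(a, b))  # True: both normalize to the same string
```

Both names render as the same glyph, yet only the normalized comparison treats them as equal, and that comparison depends on the chosen form and on the bytes being decodable at all.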

@ownclouders
Contributor

GitMate.io thinks the contributor most likely able to help you is @PVince81.

Possibly related issues are #20880 (If a folder has a file/folder with non-UTF-8 in its name, its contents can't be listed in the webinterface), #8217 (File contains invalid characters that can not be synced cross plattform), #21360 (File that were locked cannot still be synced with 8.2.2), #1012 ("AddDefaultCharset utf-8" should be added to main .htaccess file), and #21365 (UTF-8 NFD file name on SMB storage cannot be accessed).

@PVince81
Contributor

PVince81 commented Aug 6, 2018

This would also imply that we must get rid of any possible file name normalization we have in place in core and reexamine the reasons we had to put them there in the first place.

More trouble could happen with external storages whenever said storage is mounted with non-UTF-8 encoding.

@jnweiger
Contributor Author

jnweiger commented Aug 7, 2018

Oh, did not know we do filename normalization in core. Do you have a pointer where that happens?

@PVince81
Contributor

PVince81 commented Aug 7, 2018

@guruz
Contributor

guruz commented Aug 10, 2018

The ownCloud macOS desktop client also does a normalization before uploading.
Hm, I think with our focus on UTF-8 on the server we're doing quite well.

@jnweiger
Contributor Author

@guruz normalizeUnicode() is only optionally called by normalizePath() when $keepUnicode is false.
Both server and client should not mess with glyphs. As long as one does, both may :-)
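The control flow described above can be sketched roughly as follows. This is a Python illustration only, not the core PHP code: the function names mirror the ones mentioned, the use of NFC is an assumption, and the real `normalizePath()` presumably does more than is shown here.

```python
import unicodedata


def normalize_unicode(value: str) -> str:
    # Assumption for this sketch: normalization means NFC.
    return unicodedata.normalize("NFC", value)


def normalize_path(path: str, keep_unicode: bool = False) -> str:
    # Only the flag handling is modelled; any other path cleanup is omitted.
    if not keep_unicode:
        path = normalize_unicode(path)
    return path


name = "e\u0301.txt"  # decomposed "é"
print(normalize_path(name) == "\u00e9.txt")            # True: name was rewritten
print(normalize_path(name, keep_unicode=True) == name)  # True: name left untouched
```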

@stale

stale bot commented Sep 20, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 10 days if no further activity occurs. Thank you for your contributions.
