[Bug]: NC24 + PHP8.1 break UTF-8 compatibility #31212

Closed
5 of 8 tasks
ambraspace opened this issue Feb 16, 2022 · 24 comments
Labels: 1. to develop (Accepted and waiting to be taken care of), bug

@ambraspace

ambraspace commented Feb 16, 2022

⚠️ This issue respects the following points: ⚠️

  • This is a bug, not a question or a configuration/webserver/proxy issue.
  • This issue is not already reported on Github (I've searched it).
  • Nextcloud Server is up to date. See Maintenance and Release Schedule for supported versions.
  • I agree to follow Nextcloud's Code of Conduct.

Bug description

Due to changes introduced in PHP 8.1, creating files or directories with non-ASCII characters results in problems such as:

  • not displaying file/directory name correctly
  • not being able to edit file
  • not being able to delete file

I'm guessing it's this change from the PHP changelog:

htmlspecialchars(), htmlentities(), htmlspecialchars_decode(), html_entity_decode(), and get_html_translation_table() now use ENT_QUOTES | ENT_SUBSTITUTE rather than ENT_COMPAT by default. This means that ' is escaped to &#039; while previously nothing was done. Additionally, malformed UTF-8 will be replaced by a Unicode substitution character, instead of resulting in an empty string.

Actually, not being able to edit or delete files was already a problem with the Arch-patched version of NC23 and PHP 8.1. With NC24 + PHP 8.1 I am able to edit or delete a text file, but the file name is not displayed properly.

I also noticed that a few contact names containing non-ASCII characters were not displayed correctly.

Steps to reproduce

  1. Install NC24 from master branch (I tested with last commit 9026455)
  2. Create admin user and install Plain text editor.
  3. Go to Files and create a directory named "šđčćž". The directory created will be displayed as "Å¡ÄÄÄž".
  4. Create new text file named "šđčćž". An error ("An internal server error occurred") will be displayed, but the file will be created, although it will be displayed as "Å¡ÄÄÄž.txt".
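
The garbled name in step 3 is consistent with the UTF-8 bytes of the name being reinterpreted as ISO-8859-1 somewhere along the way. The Python sketch below only illustrates that effect; it is an assumption about the cause, not code taken from Nextcloud or PHP. It encodes the name as UTF-8, misreads the bytes as ISO-8859-1, and checks that the result starts with the same "Å¡Ä" pattern.

```python
# Illustration only: reproduces the garbling pattern, not the server code path.
name = "šđčćž"

# The five letters take two bytes each in UTF-8.
utf8_bytes = name.encode("utf-8")

# Misreading those bytes as ISO-8859-1 turns every byte into one character.
garbled = utf8_bytes.decode("iso-8859-1")

print(utf8_bytes.hex(" "))  # hex dump of the ten bytes
print(repr(garbled))        # repr() makes the invisible control characters visible

# Reversing the misreading recovers the original name without loss.
restored = garbled.encode("iso-8859-1").decode("utf-8")
print(restored == name)
```

Several of the resulting characters (for example the ones produced from bytes 0x91, 0x8D and 0x87) are control characters in ISO-8859-1 and normally render as nothing, which may explain why the garbled name shows fewer visible characters than there are bytes.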

Expected behavior

The file which has non-ascii characters in file name should be displayed as entered by a user (i.e. to fully support UTF-8 character set).

Installation method

Manual installation

Operating system

Other

PHP engine version

PHP 8.1

Web server

Apache (supported)

Database engine version

MariaDB

Is this bug present after an update or on a fresh install?

Fresh Nextcloud Server install

Are you using the Nextcloud Server Encryption module?

No response

What user-backends are you using?

  • Default user-backend (database)
  • LDAP/ Active Directory
  • SSO - SAML
  • Other

Configuration report

{
    "system": {
        "instanceid": "***REMOVED SENSITIVE VALUE***",
        "passwordsalt": "***REMOVED SENSITIVE VALUE***",
        "secret": "***REMOVED SENSITIVE VALUE***",
        "trusted_domains": [
            "localhost:8080"
        ],
        "datadirectory": "***REMOVED SENSITIVE VALUE***",
        "dbtype": "mysql",
        "version": "24.0.0.4",
        "overwrite.cli.url": "http:\/\/localhost:8080",
        "dbname": "***REMOVED SENSITIVE VALUE***",
        "dbhost": "***REMOVED SENSITIVE VALUE***",
        "dbport": "",
        "dbtableprefix": "oc_",
        "mysql.utf8mb4": true,
        "dbuser": "***REMOVED SENSITIVE VALUE***",
        "dbpassword": "***REMOVED SENSITIVE VALUE***",
        "installed": true,
        "app_install_overwrite": [
            "files_texteditor"
        ]
    }
}

List of activated Apps

Enabled:
  - accessibility: 1.10.0
  - cloud_federation_api: 1.7.0
  - comments: 1.14.0
  - contactsinteraction: 1.5.0
  - dashboard: 7.4.0
  - dav: 1.22.0
  - federatedfilesharing: 1.14.0
  - federation: 1.14.0
  - files: 1.19.0
  - files_sharing: 1.16.0
  - files_texteditor: 2.14.0
  - files_trashbin: 1.14.0
  - files_versions: 1.17.0
  - lookup_server_connector: 1.12.0
  - oauth2: 1.12.0
  - provisioning_api: 1.14.0
  - settings: 1.6.0
  - sharebymail: 1.14.0
  - systemtags: 1.14.0
  - theming: 1.15.0
  - twofactor_backupcodes: 1.13.0
  - updatenotification: 1.14.0
  - user_status: 1.4.0
  - weather_status: 1.4.0
  - workflowengine: 2.6.0
Disabled:
  - admin_audit
  - bruteforcesettings
  - calendar
  - contacts
  - encryption
  - files_external
  - files_markdown
  - testing
  - user_ldap

Nextcloud Signing status

Integrity checker has been disabled. Integrity cannot be verified.

Nextcloud Logs

{"reqId":"wTxxhMBMjGnH9YtVcnrT","level":2,"time":"2022-02-15T20:59:58+00:00","remoteAddr":"10.0.2.2","user":"--","app":"no app in context","method":"GET","url":"/index.php","message":"Host localhost was not connected to because it violates local access rules","userAgent":"Mozilla/5.0 (X11; Linux x86_64; rv:97.0) Gecko/20100101 Firefox/97.0","version":""}
{"reqId":"wTxxhMBMjGnH9YtVcnrT","level":2,"time":"2022-02-15T20:59:58+00:00","remoteAddr":"10.0.2.2","user":"--","app":"no app in context","method":"GET","url":"/index.php","message":"Host localhost was not connected to because it violates local access rules","userAgent":"Mozilla/5.0 (X11; Linux x86_64; rv:97.0) Gecko/20100101 Firefox/97.0","version":""}
{"reqId":"YJdOQ0murWuS3PsCAWUL","level":2,"time":"2022-02-15T21:01:05+00:00","remoteAddr":"10.0.2.2","user":"--","app":"no app in context","method":"POST","url":"/index.php","message":"Host localhost was not connected to because it violates local access rules","userAgent":"Mozilla/5.0 (X11; Linux x86_64; rv:97.0) Gecko/20100101 Firefox/97.0","version":""}
{"reqId":"YJdOQ0murWuS3PsCAWUL","level":2,"time":"2022-02-15T21:01:05+00:00","remoteAddr":"10.0.2.2","user":"--","app":"no app in context","method":"POST","url":"/index.php","message":"Host localhost was not connected to because it violates local access rules","userAgent":"Mozilla/5.0 (X11; Linux x86_64; rv:97.0) Gecko/20100101 Firefox/97.0","version":""}
{"reqId":"k5RFMVQsebjj9ibhgVwg","level":2,"time":"2022-02-16T12:26:41+00:00","remoteAddr":"10.0.2.2","user":"--","app":"no app in context","method":"GET","url":"/","message":"Host localhost was not connected to because it violates local access rules","userAgent":"Mozilla/5.0 (X11; Linux x86_64; rv:97.0) Gecko/20100101 Firefox/97.0","version":""}
{"reqId":"k5RFMVQsebjj9ibhgVwg","level":2,"time":"2022-02-16T12:26:41+00:00","remoteAddr":"10.0.2.2","user":"--","app":"no app in context","method":"GET","url":"/","message":"Host localhost was not connected to because it violates local access rules","userAgent":"Mozilla/5.0 (X11; Linux x86_64; rv:97.0) Gecko/20100101 Firefox/97.0","version":""}
{"reqId":"10v48tQUEdbrkeM8owOE","level":2,"time":"2022-02-16T12:27:06+00:00","remoteAddr":"10.0.2.2","user":"--","app":"no app in context","method":"POST","url":"/index.php","message":"Host localhost was not connected to because it violates local access rules","userAgent":"Mozilla/5.0 (X11; Linux x86_64; rv:97.0) Gecko/20100101 Firefox/97.0","version":""}
{"reqId":"10v48tQUEdbrkeM8owOE","level":2,"time":"2022-02-16T12:27:06+00:00","remoteAddr":"10.0.2.2","user":"--","app":"no app in context","method":"POST","url":"/index.php","message":"Host localhost was not connected to because it violates local access rules","userAgent":"Mozilla/5.0 (X11; Linux x86_64; rv:97.0) Gecko/20100101 Firefox/97.0","version":""}
{"reqId":"j4SGoponDpUMKbocZOqh","level":2,"time":"2022-02-16T12:27:32+00:00","remoteAddr":"10.0.2.2","user":"--","app":"no app in context","method":"POST","url":"/index.php","message":"Host localhost was not connected to because it violates local access rules","userAgent":"Mozilla/5.0 (X11; Linux x86_64; rv:97.0) Gecko/20100101 Firefox/97.0","version":"24.0.0.4"}
{"reqId":"j4SGoponDpUMKbocZOqh","level":2,"time":"2022-02-16T12:27:32+00:00","remoteAddr":"10.0.2.2","user":"--","app":"no app in context","method":"POST","url":"/index.php","message":"Host localhost was not connected to because it violates local access rules","userAgent":"Mozilla/5.0 (X11; Linux x86_64; rv:97.0) Gecko/20100101 Firefox/97.0","version":"24.0.0.4"}

Additional info

No response

@ambraspace ambraspace added 0. Needs triage Pending check for reproducibility or if it fits our roadmap bug labels Feb 16, 2022
@come-nc come-nc self-assigned this Feb 17, 2022
@come-nc come-nc added 1. to develop Accepted and waiting to be taken care of and removed 0. Needs triage Pending check for reproducibility or if it fits our roadmap labels Mar 24, 2022
@come-nc come-nc added this to the Nextcloud 24 milestone Mar 24, 2022
@PVince81
Member

there was a similar issue earlier, not sure if related: #29296

@come-nc
Contributor

come-nc commented Mar 30, 2022

@come-nc
Contributor

come-nc commented Mar 31, 2022

@ambraspace After looking into it, this comes from a change in PHP itself, which tried to get better at autodetecting encodings.
In this case, because the accented letters you used do not form a meaningful word, PHP considers the string unlikely to be UTF-8.
Did you encounter this in a real-life use case with a file name, or only while testing with gibberish?

We are looking into skipping encoding autodetection when the string is valid UTF-8, but it is not yet clear whether this is the right path.
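
As a rough sketch of the "accept valid UTF-8 without guessing" idea, the Python snippet below percent-decodes a URL path segment and only falls back to another encoding when the bytes are not valid UTF-8. The function name `decode_path_segment` and the ISO-8859-1 fallback are assumptions made for illustration; this is not the code used by Nextcloud, sabre or PHP.

```python
from urllib.parse import quote, unquote_to_bytes


def decode_path_segment(segment: str) -> str:
    """Hypothetical helper: percent-decode a path segment, trusting UTF-8 first."""
    raw = unquote_to_bytes(segment)
    try:
        # Valid UTF-8 is accepted as-is, with no statistical guessing.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback for clients that send ISO-8859-1 bytes.
        return raw.decode("iso-8859-1")


# Percent-encoded UTF-8, roughly what a browser sends for this name.
encoded = quote("šđčćž")
print(encoded)
print(decode_path_segment(encoded))

# A lone 0xE9 byte is not valid UTF-8, so the fallback applies here.
print(decode_path_segment("caf%E9"))
```

With this approach the result depends only on whether the bytes are well-formed UTF-8, not on how plausible the text looks.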

@ambraspace
Author

@come-nc I encountered this in a real-life use case. The letters šđčćž are quite common in Slavic languages (Serbian, Croatian, Slovenian, Slovak, Czech and so on). Probably more than 90% of surnames in the Serbian language end with "ić".

I thought everything is UTF-8 by default these days...

@come-nc
Contributor

come-nc commented Mar 31, 2022

> @come-nc I encountered this in a real-life use case. The letters šđčćž are quite common in Slavic languages (Serbian, Croatian, Slovenian, Slovak, Czech and so on). Probably more than 90% of surnames in the Serbian language end with "ić".
>
> I thought everything is UTF-8 by default these days...

Then can you say so and give real-life examples in php/php-src#8279?
Maybe in sabre-io/http#181 as well.

@alerque

alerque commented May 25, 2022

This issue has practically borked my entire NC installation: it affects Turkish undotted-i (ı) as well. I initially reported it as a client issue here but it turns out just the server by itself is also problematic. The UI sometimes shows these directories as scrambled, sometimes crashes on trying to open them (click on folder, UI jumps back to default login page), etc.

@alerque

alerque commented May 25, 2022

Also note this isn't NC24, NC23 is broken too, the key is PHP 8.1. I just fired up a dedicated PHP 7.4 FPM socket for Nextcloud 23 and got my instance unborked with it.

@PaulosV

PaulosV commented Jun 3, 2022

Confirming that setting PHP back to 8.0 (and running occ files:scan --all) is a good workaround. Ubuntu 22.04.

@greksak

greksak commented Jun 6, 2022

I can also confirm that PHP 8.1 does not work for my installation. I am on Arch Linux with NC24, and my file names are mainly in Slovak. I downgraded PHP to 7.4 and rescanned the files like PaulosV, and now everything is OK.

@roberthr74

Confirming on Ubuntu 22.04 with default install of PHP 8.1. After installing PHP 8.0 everything seems to be ok.

@nursoda

nursoda commented Jun 7, 2022

The mail app bug Add PHP 8.1 support also addresses this issue.

The proposal in PHP bug #32481 is:

If you really, really just want to know "which is the first text encoding in this list which is valid for this string", it would be better to call mb_check_encoding in a loop.

Note that as of today, the PHP manual page for mb_detect_encoding does not yet reflect the behavior change from PHP 8.0 to 8.1.
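
The "check each candidate in a loop" idea from the quoted proposal can be sketched in Python as follows. This is an analogue, not PHP: a strict `bytes.decode` stands in for `mb_check_encoding`, and the helper name `first_valid_encoding` is made up for this example.

```python
from typing import Optional, Sequence


def first_valid_encoding(data: bytes, candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate encoding in which data decodes without error."""
    for encoding in candidates:
        try:
            data.decode(encoding)  # strict decoding: any invalid byte raises
        except UnicodeDecodeError:
            continue
        return encoding
    return None


candidates = ["utf-8", "iso-8859-1"]

# Valid UTF-8 matches the first candidate, however unusual the letters are.
print(first_valid_encoding("šđčćž".encode("utf-8"), candidates))

# A lone 0xE9 byte is invalid UTF-8, so the second candidate is returned.
print(first_valid_encoding(b"caf\xe9", candidates))

# No candidate fits: 0xFF is neither ASCII nor valid UTF-8.
print(first_valid_encoding(b"\xff", ["ascii", "utf-8"]))
```

Order matters in this sketch: in Python, ISO-8859-1 accepts every byte sequence, so it only makes sense as the last candidate.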

@come-nc
Contributor

come-nc commented Jun 9, 2022

@nursoda Yes we are going to switch to mb_check_encoding but that needs to be done in sabredav and not Nextcloud.
See #31758 and sabre-io/http#182

@gkuzyaka

Animation

This is a very annoying problem...

Operating System
Linux 5.15.0-41-generic x86_64

PHP
Version: 8.1.2
Extensions: Core, date, libxml, openssl, pcre, zlib, filter, hash, json, Reflection, SPL, session, standard, sodium, cgi-fcgi, PDO, xml, bcmath, bz2, calendar, ctype, curl, dom, mbstring, FFI, fileinfo, ftp, gd, gettext, gmp, iconv, igbinary, imap, intl, ldap, exif, pdo_pgsql, pgsql, Phar, posix, readline, redis, shmop, SimpleXML, smbclient, soap, sockets, sysvmsg, sysvsem, sysvshm, tokenizer, xmlreader, xmlwriter, xsl, zip, libsmbclient, Zend OPcache

@roberthr74

Newest NC 24.0.3 includes sabre/dav 4.4.0. Can someone confirm if it's working properly now?

@mculibrk

I just updated Nextcloud to 24.0.3.2 and can confirm the issue still persists.

It seems the "problematic" character is lowercase "š" (for Slavic languages).

All other combinations seem to work OK (ŠĐČĆŽ đčćž).

Immediately after renaming a file/folder so that it contains "š", the whole name (if it contains any other ŠĐČĆ... characters) gets screwed up.

Folder before adding "š":
image

Adding "š" at the end of the foldername:
image

but... at least, the folder/files are still accessible (even if awfully named)
image

I have an issue with MIGRATED content, which is correctly encoded in UTF-8. In the console and in the Files app listing everything looks OK, but trying to access a folder or file produces errors like "file not found", and the user gets redirected to the home folder.

image

@Japaanc

Japaanc commented Jul 23, 2022

Same problem with Baltic characters:
Āā,Čč,Ēē,Ģģ,Īī,Ķķ,Ļļ,Ņņ,Šš,Ūū,Žž

Migrated to NC 24.0.3 and PHP 8.1.2 from NC 22.2.0 and PHP 7.4.3.

@PVince81
Member

as far as I can see the fix is only on master / NC 25.

the backport for NC 24 with the library update is still open: nextcloud/3rdparty#1109

@Japaanc

Japaanc commented Aug 12, 2022

It seems that the problem is fixed in 24.0.4!
Thanks!

@PVince81
Member

glad to hear!

please upgrade to 24.0.4 as it contains the required library update

@alerque

alerque commented Aug 12, 2022

@Japaanc As in PHP 8.1 support now? Nothing about that in the changelog, seems like something that should be news if so ;-)

@Japaanc

Japaanc commented Aug 12, 2022

@alerque I'm currently using "PHP 8.1.2 (cli) (built: Jul 21 2022 12:10:37) (NTS)" and have had no problems so far.
I just tested it in real life; I do not know what's in the changelog.

@PVince81
Member

PHP 8.1 support was already in place for NC 24 and this ticket was the last issue that popped up, so in the changelog it might appear as bumping the sabre lib (the changelogs of patch versions are automated)
