Undocumented mb_substr behaviour in PHP 8.3? #14703
Comments
Yes, many behavior changes in … @alexdowad, could you please confirm?
Well, that code is fundamentally incorrect. It tries to …
Look at the intermediate steps (https://3v4l.org/dRaZH). I don't see anything in the changelog for 8.3.2, so I suspect some refactoring in mbstring unintentionally fixed this; perhaps the earlier implementation did no processing for a substr over [0, null), but the current one does. The current behavior is correct, though: fix the mb_substr step to use the right encoding and the output is as expected.
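A minimal sketch of what "use the right encoding" means, assuming the original code converted a UTF-8 string to ISO-8859-15 with mb_convert_encoding (the issue's actual code is not shown here, and the sample string is made up for illustration). The converted bytes are passed to mb_substr together with the encoding they are really in:

```php
<?php
// Assumed setup: a UTF-8 string converted to a legacy single-byte encoding.
$utf8   = "Grüße €";
$latin9 = mb_convert_encoding($utf8, 'ISO-8859-15', 'UTF-8');

// Tell mb_substr the encoding the bytes are actually in (ISO-8859-15, not UTF-8).
$part = mb_substr($latin9, 0, 4, 'ISO-8859-15');

// The slice is still ISO-8859-15 bytes; convert back to UTF-8 only for display.
echo mb_convert_encoding($part, 'UTF-8', 'ISO-8859-15'), "\n"; // Grüß
```

The slice stays in ISO-8859-15 throughout; only the final output step converts it back to UTF-8.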
Yes, the code is intentionally incorrect for this example; I was asking about documentation, not arguing against the change, which makes sense :) In our codebase we found a function that had accidentally worked from PHP 5.3 to 8.2 because it relied on this old behaviour while manipulating legacy encodings.
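If a legacy code path really needs raw byte slices rather than character slices, PHP's substr() is byte-based and never decodes its input. A sketch under the same assumption as before (a made-up string converted to ISO-8859-15); in a single-byte encoding one byte is one character, so the byte slice matches mb_substr called with the correct encoding:

```php
<?php
$latin9 = mb_convert_encoding("Grüße €", 'ISO-8859-15', 'UTF-8');

// substr() counts bytes and never interprets an encoding, so bytes pass through untouched.
$firstThreeBytes = substr($latin9, 0, 3); // "Gr\xFC"

// In a single-byte encoding, one byte is one character, so mb_substr with the
// correct encoding returns the same slice.
$firstThreeChars = mb_substr($latin9, 0, 3, 'ISO-8859-15');

var_dump($firstThreeBytes === $firstThreeChars); // bool(true)
```

For multibyte encodings such as UTF-8, byte offsets and character offsets differ, so substr() can cut a character in half; there, mb_substr with the correct encoding is the right tool.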
@youkidearitai is right; the updated …
@jahyvari, thanks for letting us know about your experience. Maintaining mbstring has given me a deep appreciation for Hyrum's Law: it is incredibly difficult to tell which minor bug fixes may accidentally break code that relied on the buggy behavior. In this case, I probably should have recognized that the bug fix was big enough to be called out in the release notes. I am sorry about that.
Hi team, following up on this topic:
The output was different. Output for 8.3.2 - 8.3.11:
This is a real issue for us and causes trouble with our mb_substr usages. Can you investigate on your side? Is it related to the 8.3.2 changelog?
What does it mean? Best
We use PHP 8.3.10 for now. For more examples:
Output for 8.3.2 - 8.3.11:
On the other hand:
Output for 8.1.0 - 8.1.29, 8.2.0 - 8.2.23, 8.3.0 - 8.3.11:
Without string conversion:
Output for 8.1.0 - 8.1.29, 8.2.0 - 8.2.23, 8.3.0 - 8.3.11:
The behaviour is really not consistent when the string has been manipulated beforehand. Even if the conversion did not do what I wanted here, mb_substr did not do its job either.
Hi @youkidearitai, this change between 8.3.1 and later versions is really problematic for us (https://3v4l.org/ZOvX2): our unit tests failed, and we are no longer confident in the results of the mb_* functions. Sorry if I did not fully understand what you meant. Best
@DAdq26, thanks for letting the PHP team know about the problem you are having. It is also good that you shared the PHP code that is not behaving as you expected. The code you shared is incorrect, and if you fix it, you will likely find that it works better. If not, always feel free to discuss further. The problem with your code is that you are taking a UTF-8 string, converting it to ISO-8859-15, and then telling …
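One way to catch this class of bug early is to validate the input against the encoding you are about to pass to mb_substr. The wrapper below, checkedSubstr, is a hypothetical helper, not part of PHP, and the example assumes the problematic call passed 'UTF-8' for bytes that are actually ISO-8859-15:

```php
<?php
// Hypothetical helper (not a PHP built-in): refuse to slice bytes that are not
// valid in the encoding mb_substr is told they are in.
function checkedSubstr(string $s, int $start, ?int $length, string $encoding): string
{
    if (!mb_check_encoding($s, $encoding)) {
        throw new ValueError("Input is not valid $encoding");
    }
    return mb_substr($s, $start, $length, $encoding);
}

$latin9 = mb_convert_encoding("Grüße €", 'ISO-8859-15', 'UTF-8');

var_dump(mb_check_encoding($latin9, 'UTF-8'));                        // bool(false)
echo bin2hex(checkedSubstr($latin9, 0, 4, 'ISO-8859-15')), "\n";      // 4772fcdf

try {
    // Mismatch: ISO-8859-15 bytes declared as UTF-8.
    checkedSubstr($latin9, 0, null, 'UTF-8');
} catch (ValueError $e) {
    echo $e->getMessage(), "\n"; // Input is not valid UTF-8
}
```

mb_check_encoding scans the whole string, so this adds an extra pass; in hot paths it may be preferable to validate once where data enters the application instead.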
Thanks @alexdowad, you are right, but the difference in the final output is really not stable for us. You said mb_substr does not correctly handle binary data in the wrong encoding; as developers, we cannot say that to our customers... Best
I'm sorry that my answer led to a misunderstanding. It is not true that "…".
Over the last couple of years, …
However, it occasionally happens that a user prefers the old behavior. Often this is a sign that their code has bugs, and once the bugs are fixed, they will not have any problem. In extremely unusual cases, though, there might be good reasons to prefer the old behavior. The maintainers are very willing to look at such rare cases and advise on the best workaround, but the code sample above is not such a case: it is simply buggy.
In conclusion:
Many thanks for your time, @alexdowad, and thanks for your work! Best
Description
The following code:
Resulted in this output in PHP 8.2:
But resulted in this output in PHP 8.3:
Based on this, did mb_substr before PHP 8.3 always return the string in its "raw" binary form, whereas in PHP 8.3 it returns the string in the passed encoding, which could therefore lead to an encoding conversion?
PHP Version
PHP 8.3.8
Operating System
No response