Added util::fix_broken_serialization() which fixes common serialization ... #48
…on problems.
There are three primary reasons PHP serialize() will break:
1. Character set mismatches (e.g., PHP running in ISO-8859-1 [default pre-5.4] and the database running in UTF-8, and higher-than-ASCII characters being used).
2. Improper escaping by userland code or the database.
3. Accidentally truncated data (think: DB column data type is too small for the data).
This function fixes all of those by introspecting each string in the serialized data, including key names, class property names, and values, and recalculating their now-current sizes.
I have found this method essential in running foreign-language sites, when faced with many misconfigured browsers forcing wrong character sets on the website.
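To make the "recalculating their now-current sizes" step concrete, here is a minimal sketch of the idea. It is not the implementation of util::fix_broken_serialization() from this pull request: the helper name `recalc_serialized_lengths()` is hypothetical, and the regex is a simplifying assumption that mishandles strings containing `";` and cannot recover truncated payloads.

```php
<?php
// Hypothetical helper for illustration only; not the code added in this PR.
// It rewrites every s:N:"..."; token so that N matches the actual byte
// length of the quoted string. Array keys and property names are stored
// as s:N:"..."; tokens too, so they are covered by the same pattern.
function recalc_serialized_lengths(string $broken): string
{
    return preg_replace_callback(
        '/s:\d+:"(.*?)";/s',                 // lazy match up to the next ";
        function (array $m): string {
            // strlen() counts bytes, which is what unserialize() expects.
            return 's:' . strlen($m[1]) . ':"' . $m[1] . '";';
        },
        $broken
    );
}

// A payload edited by hand: the second string still claims a length of 3.
$broken = 'a:2:{i:0;s:6:"Normal";i:1;s:3:"Edited by hand";}';
$fixed  = recalc_serialized_lengths($broken);

var_dump(@unserialize($broken));   // bool(false)
var_dump(unserialize($fixed));     // array with "Normal" and "Edited by hand"
```

Counting bytes with strlen() rather than characters is the important design choice here, because the length prefix in PHP's serialization format is a byte count.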
Hmm... This test fails on HHVM, because HHVM's error message does not contain enough data for the specific-failure test to work: [errstr] => Unable to unserialize: [a:2:{i:0;s:6:"Normal";i:1;s:23:"High-value Char: ▒a-va?";}]. Expected '"' but got 'a'. I also cannot tell whether HHVM handled the higher-than-ASCII character at all! Going to run the fixed serialized string in HHVM and will report back...
It's now official! This unit test uncovered a potentially serious HHVM [un]serialize() bug with high-value non-UTF-8 characters! Hurray! HHVM serialize() unceremoniously drops high-ASCII characters from strings without warning. facebook/hhvm#4700
Nice work! Does this fix a situation where somebody manually changes a serialized value without updating the string length?
Yes, it would also fix that.
Added util::fix_broken_serialization() which fixes common serialization ...
Thanks for this, it looks super useful :)
Woohoo!
...problems.
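There are four primary reasons PHP serialize() will break:
0. A developer edits a serialized string and forgets to change (or incorrectly changes) the string length value.
1. Character set mismatches (e.g., PHP running in ISO-8859-1 [default pre-5.4] and
the database running in UTF-8, and higher-than-ASCII characters being used).
2. Improper escaping by userland code or the database.
3. Accidentally truncated data (think: DB column data type is too small for the data).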
This function fixes all of those, by introspecting each string in the serialized data,
including key names, class property names, and values, and recalculating their now-current
sizes.
I have found this method essential in running foreign-language sites, when faced with many
misconfigured browsers forcing wrong character sets on the website.
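The character set case can be reproduced in a few lines. This is a sketch under assumptions: the `str_replace()` call merely simulates a storage layer that re-encodes ISO-8859-1 bytes to UTF-8, and the variable names are illustrative rather than taken from this pull request.

```php
<?php
// "café" with é as a single ISO-8859-1 byte (0xE9): 4 bytes in total.
$latin1     = "caf\xE9";
$serialized = serialize($latin1);          // s:4:"caf\xE9";

// Simulated re-encoding to UTF-8 (an assumption for this demo): é becomes
// the two bytes 0xC3 0xA9, but the s:4 length prefix is left untouched.
$reencoded = str_replace("\xE9", "\xC3\xA9", $serialized);

var_dump(strlen($latin1));                 // int(4)
var_dump(strlen("caf\xC3\xA9"));           // int(5)
var_dump(@unserialize($reencoded));        // bool(false)
```

The payload now declares 4 bytes but carries 5, so unserialize() finds a stray byte where it expects the closing quote and gives up.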