Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added util::fix_broken_serialization() which fixes common serialization ... #48

Merged
merged 2 commits into from
Jan 23, 2015

Conversation

hopeseekr
Copy link
Contributor

...problems.

There are four primary reasons PHP serialize() will break:
0. A developer edits a serialized string and forgets to change (or incorrectly changes) the string length value.

  1. CharacterSet mismatches (e.g., PHP running in ISO-8859-1 [default pre-5.4] and
    the database running in UTF-8, and higher-than-ASCII characters being used.
  2. Improper escaping by userland code or the database.
  3. Accidentally truncated data (think: DB column data type is too small for the data).

This function fixes all of those, by introspecting each string in the serialized data,
including key names, class property names, and values, and recalculating their now-current
sizes.

I have found this method essential in running foreign language sites, when faced with many
misconfigured browsers forcing wrong charactersets on the website.

…on problems.

There are three primary reasons PHP serialize() will break:
1. CharacterSet mismatches (e.g., PHP running in ISO-8859-1 [default pre-5.4] and
   the database running in UTF-8, and higher-than-ASCII characters being used.
2. Improper escaping by userland code or the database.
3. Accidentally truncated data (think: DB column data type is too small for the data).

This function fixes all of those, by introspecting each string in the serialized data,
including key names, class property names, and values, and recalculating their now-current
sizes.

I have found this method essential in running foreign language sites, when faced with many
misconfigured browsers forcing wrong charactersets on the website.
@coveralls
Copy link

Coverage Status

Coverage increased (+0.17%) to 68.15% when pulling 1e648e5 on hopeseekr:master into 95ffee3 on brandonwamboldt:master.

@hopeseekr
Copy link
Contributor Author

Hmm... This test does fail on HHVM, because HHVM's error message does not contain enough data for the specific-failure test to actually work:

[errstr] => Unable to unserialize: [a:2:{i:0;s:6:"Normal";i:1;s:23:"High-value Char: ▒a-va?";}]. Expected '"' but got 'a'.

I also cannot even tell if HHVM even handled the higher-than-ASCII character at ALL! Going to run the fixed serialized string in HHVM and will report back...

@coveralls
Copy link

Coverage Status

Coverage increased (+0.17%) to 68.15% when pulling 06a5a8b on hopeseekr:master into 95ffee3 on brandonwamboldt:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.17%) to 68.15% when pulling c262c52 on hopeseekr:master into 95ffee3 on brandonwamboldt:master.

@hopeseekr
Copy link
Contributor Author

It's now official! This unit test uncovered a potentially serious HHVM [un]serialize() bug with high-value non-UTF-8 characters! Hurray!

HHVM serialize() unceremoniously drops high-ASCII characters from strings without warning. facebook/hhvm#4700

@brandonwamboldt
Copy link
Owner

Nice work!

Does this fix a situation where somebody manually changes a serialized value without updating the string length?

@hopeseekr
Copy link
Contributor Author

Yes it would also fix that.

brandonwamboldt added a commit that referenced this pull request Jan 23, 2015
Added util::fix_broken_serialization() which fixes common serialization ...
@brandonwamboldt brandonwamboldt merged commit d5401aa into brandonwamboldt:master Jan 23, 2015
@brandonwamboldt
Copy link
Owner

Thanks for this, it looks super useful :)

@hopeseekr
Copy link
Contributor Author

Woohoo!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants