Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrectly packed strings #2

Open
mindplay-dk opened this issue Jun 30, 2017 · 2 comments
Open

Incorrectly packed strings #2

mindplay-dk opened this issue Jun 30, 2017 · 2 comments

Comments

@mindplay-dk
Copy link

I recently tried to pack a binary-form UUID with this library, and for certain specific UUIDs, it would fail.

I just realized why: you're packing PHP strings as msgpack strings - but msgpack strings are UTF-8 Unicode strings, and the PHP string-type is just binary data.

Some binary sequences will be invalid code-points, so packing/unpacking will fail.

I believe PHP strings should be packed as the binary type in msgpack.

At least that's what the PECL extension does.

The problem of course is if you know you have unicode strings, and if the msgpack recipient on the other end is not a PHP script, and expects strings to be encoded as strings.

Since there's no unicode string type in PHP, the rybakit/msgpack package actually goes so far as to use a UTF-8 detection/validation pattern, which must have detremental performance implications.

I guess there's no way to do this "right" in PHP, but if you're going to encode PHP strings as strings, at least the readme should probably note that binary strings aren't supported?

Alternatively, you could try the slightly faster UTF-8 string detection "hack" I used here.

@rybakit
Copy link

rybakit commented Sep 4, 2017

Hey @mindplay-dk, author of rybakit/msgpack.php here.

Alternatively, you could try the slightly faster UTF-8 string detection "hack" I used here.

Just for the record, last time I benchmarked the UTF-8 string detection, the regex used in my library was faster than //u. Please see this commit and a discussion on Reddit: https://www.reddit.com/r/PHP/comments/54h70q/new_msgpackphp_v020_release_with_40_performance/d81u1i6/.

I believe PHP strings should be packed as the binary type in msgpack.

Also, it is worthwhile to mention that you can pack UUIDs by calling $packer->packBin() or set the string detection mode to Packer::FORCE_BIN. In both cases, the packer will skip detecting the string type.

@rybakit
Copy link

rybakit commented Sep 4, 2017

At least that's what the PECL extension does.

msgpack/msgpack-php#72

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants