Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read methods should not mutate direct buffer #259

Closed
brianfromoregon opened this issue Oct 30, 2014 · 9 comments
Closed

Read methods should not mutate direct buffer #259

brianfromoregon opened this issue Oct 30, 2014 · 9 comments

Comments

@brianfromoregon
Copy link

Somewhat related to #128. UnsafeMemoryInput's readAscii is calling nioBuffer.put to temporarily mutate the data. This is no good, my buffer is a region of mmap file and a process bounce inside readAscii could corrupt the data. Read methods cannot write to the buffer.

@romix
Copy link
Collaborator

romix commented Nov 26, 2014

@brianfromoregon Yes, this issue is mentioned on the front-page. It is a known fact that our default String reader mutates the buffer. It is also pretty easy to write your own String serializer which would avoid doing so at the expense of being a bit slower.

That being said, we would accept patches that try to fix problems in a reasonable way.

@NathanSweet @magro
May be we should actually include a "safe" String serializer into a standard Kryo distribution, so that people who are hit by this buffer mutation problem may easily switch to it from our standard String serializer? One problem that I see here is that it would probably use a different format (e.g. length + payload)....

@romix
Copy link
Collaborator

romix commented Nov 30, 2014

@NathanSweet @magro Nate, it seems that a half or at least one third of all bug reports that we receive are related to problems with deserialization in multi-threaded environments, because we mutate the input buffer.

I'm slowly getting more and more convinced that we should better be safe by default, rather than fast by default. I think even in a safe mode, Kryo will still be very fast. In the worst case, only String serialization/deseralization is affected. Yes, it is an important use-case. But at the same time, virtually all other serialization libs do not have this problem related to buffer mutation, because they use a "slow-path" and let JDK create Strings by (eventually) creating a copy. And yet, those libs are not that slow and are widely used.

What do you think? Should we try to make String deserialization safe by default?

@magro
Copy link
Collaborator

magro commented Dec 2, 2014

Being safer/more robust sounds good to me. AFAICS it should be possible
that one can still choose the fast path so that we won't really use
anything.

Cheers,
Martin
Am 30.11.2014 21:45 schrieb "romix" notifications@github.com:

@NathanSweet https://github.com/NathanSweet @magro
https://github.com/magro Nate, it seems that a half or at least one
third of all bug reports that we receive are related to problems with
deserialization in multi-threaded environments, because we mutate the input
buffer.

I'm slowly getting more and more convinced that we should better be safe
by default, rather than fast by default. I think even in a safe mode, Kryo
will still be very fast. In the worst case, only String
serialization/deseralization is affected. Yes, it is an important use-case.
But at the same time, virtually all other serialization libs do not have
this problem related to buffer mutation, because they use a "slow-path" and
let JDK create Strings by (eventually) creating a copy. And yet, those libs
are not that slow and are widely used.

What do you think? Should we try to make String deserialization safe by
default?


Reply to this email directly or view it on GitHub
#259 (comment)
.

@NathanSweet
Copy link
Member

Yep, we should fix it. Using an additional 0x80 byte for each string would
fix the problem and still allow us write and read strings efficiently. This
means ASCII strings with length > 1 and < 64 will use an additional byte
which is unfortunately but I guess better than losing performance.

We should also look at using reflection/ReflectASM for creating Strings
without allocating, by using a private constructor when available in the
JRE.

On Sun, Nov 30, 2014 at 9:45 PM, romix notifications@github.com wrote:

@NathanSweet https://github.com/NathanSweet @magro
https://github.com/magro Nate, it seems that a half or at least one
third of all bug reports that we receive are related to problems with
deserialization in multi-threaded environments, because we mutate the input
buffer.

I'm slowly getting more and more convinced that we should better be safe
by default, rather than fast by default. I think even in a safe mode, Kryo
will still be very fast. In the worst case, only String
serialization/deseralization is affected. Yes, it is an important use-case.
But at the same time, virtually all other serialization libs do not have
this problem related to buffer mutation, because they use a "slow-path" and
let JDK create Strings by (eventually) creating a copy. And yet, those libs
are not that slow and are widely used.

What do you think? Should we try to make String deserialization safe by
default?


Reply to this email directly or view it on GitHub
#259 (comment)
.

@serverperformance
Copy link
Contributor

Fully agree. Regarding your second paragraph, if UNSAFE is available, you can also perform this way instead of using that package private constructor (which don’t exist in all implementations):

  1. Alloc by using Unsafe.allocateInstance
  2. Set its value field also by using Unsafe.
  3. IF you want to keep backwards compatibility to JDK <= 1.6 (included), you must also set the count field if it exists. I would strongly recommend it

This works like a charm (except maybe in IBM J9 and JRockit in which Unsafe.allocateInstance seems less efficient as invoking sun.reflect.ReflectionFactory.newConstructorForSerialization).

Anyway, performance of any solution decided should be carefully tested, as many times System.arraycopy has the same or better performance than these tricks (maybe some modern VMs on modern systems perform COW optimizations?), except of course for GC pressure.

Cheers!

De: Nathan Sweet [mailto:notifications@github.com]
Enviado el: martes, 02 de diciembre de 2014 15:35
Para: EsotericSoftware/kryo
Asunto: Re: [kryo] Read methods should not mutate direct buffer (#259)

Yep, we should fix it. Using an additional 0x80 byte for each string would
fix the problem and still allow us write and read strings efficiently. This
means ASCII strings with length > 1 and < 64 will use an additional byte
which is unfortunately but I guess better than losing performance.

We should also look at using reflection/ReflectASM for creating Strings
without allocating, by using a private constructor when available in the
JRE.

On Sun, Nov 30, 2014 at 9:45 PM, romix notifications@github.com wrote:

@NathanSweet https://github.com/NathanSweet @magro
https://github.com/magro Nate, it seems that a half or at least one
third of all bug reports that we receive are related to problems with
deserialization in multi-threaded environments, because we mutate the input
buffer.

I'm slowly getting more and more convinced that we should better be safe
by default, rather than fast by default. I think even in a safe mode, Kryo
will still be very fast. In the worst case, only String
serialization/deseralization is affected. Yes, it is an important use-case.
But at the same time, virtually all other serialization libs do not have
this problem related to buffer mutation, because they use a "slow-path" and
let JDK create Strings by (eventually) creating a copy. And yet, those libs
are not that slow and are widely used.

What do you think? Should we try to make String deserialization safe by
default?


Reply to this email directly or view it on GitHub
#259 (comment)
.


Reply to this email directly or view it on GitHub #259 (comment) . https://github.com/notifications/beacon/AF7EeBW3mJwNceFV3uR3bEj0vabTsK3yks5nTcWNgaJpZM4C0_9R.gif

@brianfromoregon
Copy link
Author

Just a thought, if Kryo provided an ascii-only CharSequence Serializer which avoided String entirely, I'd prefer that and avoid the String construction issue. Many LL code bases have some AsciiString equivalent class so it should be a familiar concept for a number of your users.

@romix
Copy link
Collaborator

romix commented Dec 3, 2014

Good! If we all agree that we should fix it, who is going to try and submit a PR?
I'd like to do it, but I'm very busy at work these weeks. Anyone volunteers?

@magro
Copy link
Collaborator

magro commented Jun 9, 2018

@NathanSweet I think this is fixed in 5.0, right?

@NathanSweet
Copy link
Member

Yep.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

5 participants