-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The current Jackson configuration allows serializing cluster states that cannot be read back #96976
Comments
Pinging @elastic/es-distributed (Team:Distributed) |
Pinging @elastic/es-core-infra (Team:Core/Infra) |
I'm asking @elastic/es-core-infra for help here because this seems like a fundamental problem in the |
I agree this is a problem with xcontent. However, I do not think we should raise this limit to max int. That would be a 2GB string. Why did we have a 5MB string in the first place? I'm more inclined to block writing the xcontent with a string length over the limit in the first place. |
I agree, the problem was with the write side of things here initially and we should definitely not allow a string that long. I wonder if maybe we should have the ability to customize an individual XContentParser instance better here? I can see how you'd want that kind of limit on a string when ingesting documents or deserializing REST requests in general, it's less clear to me that you want it when working with cluster state or other internal on-disk metadata where any failure to read can mean big trouble. |
At least if we do this then the state on disk does not become unreadable, which would be much better than how things are today. But just to be clear, if we fail at write time then the master will not be able to commit the state update, which causes it to stand down and start a new election cycle. There's a good chance that the troublesome request came from a |
I expect we can find a way to make this situation un-retryable, once we put in place some mechanism to make the whole write fail. |
The current Jackson configuration uses a default maximum string length of ~5M characters. Any string longer than that, will throw an exception when deserializing.
This was uncovered in #96918 where a 5M+ string resulted in a completed broken cluster after restart. Since such a string cannot be deserialized by the current code, we can neither read the state from disk on node restart, nor can we restore a snapshot containing this string.
It seems the easiest fix here would be to increase the limit for the smile factory that we use to read the cluster state to Integer.MAX_VALUE I think.
The text was updated successfully, but these errors were encountered: