Poor performance for String serialization #2645
Comments
This suggestion sounds reasonable, but is this rather a theoretical issue, or something you identified as performance issue when using Gson? I assume this change would mainly help when writing large strings at the beginning of the JSON data. In the other cases it would not change much:
But I am not completely sure. It seems the main advantages with your proposed changes would be:
Would probably also make sense to check not only for
Sorry for the delayed response. This is not a theoretical issue; it is a case we hit in an actual production app using Gson. I worked around it by supplying an explicit StringWriter that was sized appropriately.
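A minimal sketch of that kind of workaround, assuming the value needs no escaping so the output is just the value plus two quotes. PresizedWriterSketch and presizedFor are hypothetical names used for illustration, and the three writes in main stand in for what Gson would emit; with Gson on the classpath the pre-sized writer would be passed to gson.toJson(value, writer) instead. Uses String.repeat, so Java 11 or later.

```java
import java.io.StringWriter;

class PresizedWriterSketch {
    /** Hypothetical helper: a StringWriter sized for the value plus two quotes (assumes no escaping). */
    static StringWriter presizedFor(String value) {
        return new StringWriter(value.length() + 2);
    }

    public static void main(String[] args) {
        String value = "A".repeat(33);
        StringWriter writer = presizedFor(value);
        int before = writer.getBuffer().capacity();

        // Stand-in for gson.toJson(value, writer) on an unescaped String:
        // opening quote, the value itself, closing quote.
        writer.write('"');
        writer.write(value);
        writer.write('"');

        int after = writer.getBuffer().capacity();
        System.out.println("capacity before=" + before + ", after=" + after);
    }
}
```

If the value does need escaping, the estimate is too small and the buffer will still grow, so this is only a lower bound on the required size.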
That sounds about right, if I recall correctly we were serializing lots of top-level Strings.
That sounds like a good idea.
Thanks for the additional information! To me it seems such a change for Gson would only help for this specific use case, and for other use cases it would either have no effect, or might even negatively impact performance due to more resizing being necessary? Maybe for these use cases it would be better if the user (in this case you) pre-sizes the What do you think?
Actually, this issue isn't confined to top-level String values. It can be reproduced with the following input as well: gson.toJson(Map.of(Strings.repeat("A", 33), "value")); It just seems like a problem that can be triggered by any large String being written near the start of the serialized output.
How would it negatively impact performance? The
I think if we were going to do anything here it would be something a bit more general. When we are about to write a string of length However I feel that giant strings are probably something of a niche use case, and I'm somewhat unhappy with the idea of not only reaching into the What you could do instead would be to subclass
Sure, I probably shouldn't have included that example workaround, as I don't care much how it's solved. I think it's worthwhile to have the discussion though about the default behavior here. A lot of this just boils down to issues with the aging
Yep, and that is what I ended up doing in the application where this was an issue; but it still means the more frequently used
We could imagine using an alternative to One thing we could very easily do is provide a non-default initial size for the
For example, with the current resizing it might be Also, I just noticed in your originally proposed diff above you wrote
Gson version
Gson 2.10.1
Java / Android version
Happening on JDK21, but presumably is an issue for any JDK, as the relevant code hasn't changed in 10+ years.
Description
Serializing a String to JSON using toJson can result in excessive copying of the internal StringBuffer array for certain inputs, in particular for Strings with a length >= 33. This all boils down to how the internal StringBuffer used by StringWriter is initialized, and its resizing behavior: it starts with a capacity of 16, then tries to use double the previous capacity + 2 (or the size of the String being written if it is larger than that). If the capacity is increased only to accommodate the size of the String being written, then the final call to write the closing double quote will trigger another resize.
Expected behavior
Avoid excessive allocations / copying as much as possible. JsonWriter's string method could check if the Writer is a StringWriter, and call ensureCapacity on its StringBuffer, to avoid the excessive resizing.
Obviously the above really only works for Strings that don't require escaping, but that may even be sufficient for most scenarios.
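As a rough sketch of the idea, assuming the value needs no escaping: EnsureCapacitySketch and writeQuoted are hypothetical names standing in for the quoted-string write, and this is not Gson's actual JsonWriter code. The only library calls used are StringWriter.getBuffer() and StringBuffer.ensureCapacity(int).

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class EnsureCapacitySketch {
    /** Hypothetical stand-in for writing a quoted JSON string; assumes no characters need escaping. */
    static void writeQuoted(Writer out, String value) {
        if (out instanceof StringWriter) {
            StringBuffer buffer = ((StringWriter) out).getBuffer();
            // Reserve room for the value and both quotes in a single step,
            // so the closing quote cannot trigger a further resize.
            buffer.ensureCapacity(buffer.length() + value.length() + 2);
        }
        try {
            out.write('"');
            out.write(value);
            out.write('"');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter writer = new StringWriter(); // default capacity of 16
        writeQuoted(writer, "A".repeat(33));
        System.out.println("final capacity: " + writer.getBuffer().capacity());
    }
}
```

Note that ensureCapacity still applies the usual growth rule when the requested size is small, so for short values this behaves the same as the existing code.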
Actual behavior
Reproduction steps
It's easier to look at a trivial case where the input being serialized does not require any escaping:
The above resulted in three resizes of the internal array, the last of which was the most excessive (35 -> 75 for a single character).
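One way to observe this kind of resizing without Gson is to record StringBuffer.capacity() after each write. The sketch below assumes that serializing an unescaped String amounts to three writes (opening quote, value, closing quote); CapacityTrace and trace are hypothetical names, and the exact capacities printed may differ from the figures quoted above depending on the input and on what else is written to the buffer. Uses String.repeat, so Java 11 or later.

```java
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

class CapacityTrace {
    /** Records the buffer capacity initially and after each of the three writes. */
    static List<Integer> trace(String value) {
        StringWriter writer = new StringWriter(); // backed by a StringBuffer with default capacity
        StringBuffer buffer = writer.getBuffer();
        List<Integer> capacities = new ArrayList<>();

        capacities.add(buffer.capacity()); // before anything is written
        writer.write('"');
        capacities.add(buffer.capacity()); // after the opening quote
        writer.write(value);
        capacities.add(buffer.capacity()); // after the value
        writer.write('"');
        capacities.add(buffer.capacity()); // after the closing quote

        return capacities;
    }

    public static void main(String[] args) {
        System.out.println(trace("A".repeat(33)));
    }
}
```

Each change between consecutive entries in the printed list corresponds to one reallocation and copy of the internal array.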