Optionally intern strings during unmarshaling #191
Comments
@CAFxX I'm willing to pick this up. Let me know if I can proceed.
Yup, I'm not working on this.
Hello, @darkdefender27 any news on this?
I actually went ahead and did this.
I think that, since most unmarshaled data is short-lived, it would be better to add a "noclone" attribute that makes all the strings refer to the original buffer.
With string interning there is no second memory allocation: only the first occurrence of a value allocates, and that copy is long-lived in memory. I think the "noclone" optimisation should be done as a separate flag, using the unsafeString already built into the lexer. I see it as a different case from string interning: one is for low-cardinality fields, the other for short-lived, long bytes/strings.
To reduce allocations when unmarshalling it would be useful to allow users to specify that when decoding certain string fields the field values are likely to be repeated and therefore should be interned to avoid having duplicate copies of the same string in memory.
A string/[]byte interning package is available at https://github.com/josharian/intern.
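To make the idea concrete, here is a minimal sketch of what interning means, using only the standard library. It is not the implementation of the package linked above; `Interner`, `NewInterner`, `Bytes`, and `Len` are hypothetical names chosen for illustration. The point it shows is that only the first occurrence of a value allocates a string, and later occurrences reuse that copy.

```go
package main

import (
	"fmt"
	"sync"
)

// Interner is a hypothetical, minimal string pool (an assumption for
// illustration, not the API of any existing package).
type Interner struct {
	mu   sync.Mutex
	pool map[string]string
}

// NewInterner returns an empty pool.
func NewInterner() *Interner {
	return &Interner{pool: make(map[string]string)}
}

// Bytes returns a string equal to b, reusing a previously stored copy
// when one exists.
func (in *Interner) Bytes(b []byte) string {
	in.mu.Lock()
	defer in.mu.Unlock()
	// The gc compiler avoids allocating for a map lookup keyed by string(b).
	if s, ok := in.pool[string(b)]; ok {
		return s
	}
	s := string(b) // first occurrence: the only copy made for this value
	in.pool[s] = s
	return s
}

// Len reports how many distinct strings the pool holds.
func (in *Interner) Len() int {
	in.mu.Lock()
	defer in.mu.Unlock()
	return len(in.pool)
}

func main() {
	in := NewInterner()
	inputs := [][]byte{[]byte("CA"), []byte("NY"), []byte("CA")}
	for _, b := range inputs {
		fmt.Println(in.Bytes(b))
	}
	fmt.Println("distinct strings stored:", in.Len()) // distinct strings stored: 2
}
```

A pool like this never shrinks, which is acceptable only for low-cardinality values; a real interning package may bound or recycle its pool differently.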
This mechanism should be optional (not all fields contain duplicated values), ideally via a field tag such as `intern`. Tagging a field this way, for example a `State` field, would instruct the unmarshaling code to perform interning on the value of that field when it is decoded.
See golang/go#5160 (comment) for an experience report about a json-unmarshaling-heavy application where interning certain values significantly reduced memory usage.