-
Notifications
You must be signed in to change notification settings - Fork 429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement optional string interning #279
Conversation
Hello, thanks for PR. But why this needed? I understand that we can save 1 alloc per string but we need more time for unmarshalling. |
@GoWebProd an experience report is included in the linked issue #191:
(furthemore, see also the ongoing discussion in golang/go#32779 about adding interning to Basically it's the same reason why sometimes you want to use Keep in mind that allocating memory and garbage collecting it also consume CPU. So while it's true you spend some more CPU time to do string interning, it's also true you save CPU time by not allocating and by being able to run GC less often, and on a smaller heap. Keep in mind that this feature is opt-in on a field-by-field basis; so for easyjson users that don't opt-in nothing changes, and they get the usual behavior with no overhead. This is for the same reason why it makes no sense to provide a benchmark showing the CPU difference: such a benchmark would be useless as it completely depends on how frequent the values are:
As such, the only reasonable option is to let the user enable it, if needed - ideally doing their own benchmarks. |
Ok, looks needed. But I do not like the current implementation of interning, I will consider alternatives and come back. |
@CAFxX please see my variant of interning |
@GoWebProd what is it specifically you don't like about the old version? The old version is significantly simpler in josharian/intern, and it has already seen significant production usage. Also AFAICT your version lacks any synchronization, so if two goroutines perform unmarshaling of an object of the same type at the same time, you will get a data race. Unless I'm mistaken, the current code should probably not be merged. Furthermore, you have hardcoded the cache size, and there's no way for the user to change it. Note that my version did not need it thanks to the use of sync.Pool by josharian/intern. I honestly think the previous version was simpler (both regarding the code in the easyjson repo, as well as including the dependency), more correct and - if the data races in the new version were fixed - probably even faster. |
Yes, you need use mutex if want to use in many goroutines. But in josharian/intern you have N maps (if you have N concurrent access) and releasing at any time by GC. |
This would be a breaking change no? Current users don't need to do this. Also adding a mutex would significantly hinder scalability...
Correct. But I fail to see how that's a problem. In fact, note that the same applies to your version: every type has a different interning table (so strings from different objects are not deduplicated), and strings can be dropped of the table by hitting the cache size limit (and consider what happens if a json object contains more than 128 different strings!). |
I’ll think about a concurrent list, if not solution, then we’ll take the josharian |
Uses josharian/intern to optionally perform string interning during unmarshaling.
Saves one allocation per string field.
Fixes #191