High object churn in DB Reader #13
We are using the GeoIP2-java library to read from the GeoIP2 database file. Our throughput is rather high, with thousands of IPs resolved per second, and we noticed that there is high object churn in the MaxMind APIs, with huge numbers of strings created only to be thrown away again soon after.

Are you aware of this issue? Any plans to improve the API in this respect, or any idea how we could improve this behaviour ourselves?

Comments
Lookups create a significant number of strings when they read the structure out of the database. Potentially an object pool could be created in the Decoder and used to cache objects rather than reading them out of the database on each lookup. This would reduce churn, although it might not actually speed up the lookups significantly unless you allowed it to use a very large amount of memory. We currently do not have plans to change this behavior.
Probably the easiest way to reduce churn is to cache key names, although we would want to limit the cache size, as there is no limit on the number of keys in an arbitrary database.
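A minimal sketch of what such a bounded key-name cache could look like (the class and names here are illustrative, not part of the actual reader):

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: intern decoded key names so repeated map keys
// share a single String, with a hard size bound so a database with an
// unbounded number of distinct keys cannot grow the cache without limit.
final class KeyInterner {
    private final ConcurrentHashMap<String, String> keys = new ConcurrentHashMap<String, String>();
    private final int maxEntries;

    KeyInterner(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    // Returns a canonical instance for the decoded key; once the bound is
    // reached, new keys simply pass through without being cached.
    String intern(String decoded) {
        String existing = keys.get(decoded);
        if (existing != null) {
            return existing;
        }
        if (keys.size() >= maxEntries) {
            return decoded;
        }
        existing = keys.putIfAbsent(decoded, decoded);
        return existing != null ? existing : decoded;
    }
}
```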
Thanks for the input ... we haven't dealt with that issue yet. It's still in our backlog and gets occasionally remarked on by our monitoring guys, but so far it is not enough pain to get it bumped to the top.
+1 for this. The design of this API is not good enough for high-performance scenarios. It's very wasteful that it dynamically creates an object tree of everything for a node (which the higher-level API then maps to actual objects), only to get the values for the few keys you really need.
Caching canonical string instances for map keys would indeed be a useful optimization that would not require a major redesign, so it is worth implementing. However, an API redesign is definitely needed to improve things significantly. Two major gripes:
And another implementation detail that should be changed in the internal design: every decode operation allocates a new Decoder.Result instance (instead, the offset could be advanced during decoding). This is the second-largest source of garbage creation after String decoding.
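To illustrate (a hypothetical sketch, not the library's actual Decoder): the decoder could keep a cursor field and advance it as it reads, so no wrapper object is returned per decode step:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of an allocation-free alternative to returning a new
// Decoder.Result per call: the decoder keeps its position in a field and
// advances it as it reads, so callers never receive a (node, offset) wrapper.
final class CursorDecoder {
    private final ByteBuffer buffer; // the memory-mapped database, assumed
    private int offset;              // advanced in place during decoding

    CursorDecoder(ByteBuffer buffer, int startOffset) {
        this.buffer = buffer;
        this.offset = startOffset;
    }

    // Reads the next control byte and leaves the cursor positioned after it;
    // the real decoding of types and sizes is elided in this sketch.
    int readControlByte() {
        return buffer.get(offset++) & 0xFF;
    }

    int currentOffset() {
        return offset;
    }
}
```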
Thanks for your feedback. I largely agree with your two gripes for very high-performance scenarios. Jackson was used for ease of deserializing the data to our GeoIP2 model classes, which are shared with the web service and already used Jackson. Although this design has the problems outlined here in terms of object churn, it does seem to meet the requirements of most of our customers.
I would urge you to consider a redesign of this API. Yes, most people in the "enterprise" Java space have such bloated libraries and code that this library is not a bottleneck. That said, it could be an order of magnitude faster with a proper design. In our case, the one geoip lookup per request is about a third of all garbage created (and we do have a service solving a rather more complex problem :)
For starters, would returning plain Java objects be a good direction? This eliminates the need for a dependency, and Jackson would still be able to bind to plain objects just as easily. It would also reduce the need for the wrapper objects (i.e. a TextNode around every String, an ObjectNode around each Map, etc.). (This is of course a breaking change for API clients, so this is more about discussing longer-term design directions.)
Could you expand on what you mean by plain Java objects here? The structure of the data record in the database isn't known ahead of time in the general case. Do you just mean a Map<String, Object>?

I suspect for your particular case, based on you mentioning needing a "few keys", some sort of lookup-path method would be the best solution, similar to the data lookup functions in the C API. The Go API takes a different approach and only decodes what is necessary to fill out the struct passed to it. I suppose this could also be done with the Java API, but it would likely require a reasonable amount of work.
Yes, I meant Map<String, Object> vs ObjectNode, String vs TextNode, etc. But indeed, this in itself is not going to yield great perf gains, so it's probably not worth bothering with (just for the sake of a cleaner API :)
Based on a quick experiment, it seems that introducing a reasonably sized shared cache for pointer objects would yield "good enough" results. With a few megabytes, you can get a 5x performance increase (due to the reduced memory churn and less pointer-chasing). You can take a look here and run the benchmark: https://github.com/phraktle/MaxMind-DB-Reader-java/tree/caching For a production-quality implementation, one would need a bounded cache, of course. After a million random lookups, the cache has around 32k objects.
Update: added some abstractions for pluggable caching and a very basic bounded cache (no eviction, it just holds the first N items – this works perfectly well here, since the first few thousand items include all the hot items anyway). Let me know your thoughts on this; if it seems like a reasonable approach to you, I can pretty it up and submit a pull request. (Bottom line: ~5x perf improvement with a ~2 MB cache footprint)
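Roughly, the "hold the first N entries, no eviction" idea amounts to something like this (a sketch with assumed names, not the actual code on that branch):

```java
import com.fasterxml.jackson.databind.JsonNode;

import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a bounded, no-eviction cache keyed by pointer offset. Once the
// capacity is reached, further misses fall through to the decoder; the hot
// entries tend to be among the first few thousand seen anyway.
final class BoundedNodeCache {
    interface Loader {
        JsonNode load(int offset) throws IOException;
    }

    private final ConcurrentHashMap<Integer, JsonNode> cache =
            new ConcurrentHashMap<Integer, JsonNode>();
    private final int capacity;

    BoundedNodeCache(int capacity) {
        this.capacity = capacity;
    }

    // Returns the cached node for a pointer offset, decoding and storing it
    // only while the cache is below capacity.
    JsonNode get(int offset, Loader loader) throws IOException {
        JsonNode node = cache.get(offset);
        if (node == null) {
            node = loader.load(offset);
            if (cache.size() < capacity) {
                cache.putIfAbsent(offset, node);
            }
        }
        return node;
    }
}
```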
Overall, this seems good. I am somewhat uncomfortable caching mutable objects, and I am not sure about the lack of an eviction policy. Have you tested this in comparison to just caching the keys? I assume that in addition to the keys, some hot countries are being cached, but so are many things that might be very uncommon.
Caching just the String map keys didn't yield a very significant speedup (a marginal 10-20%, compared to the ~5x). Caching the results of pointer resolutions works out much better, since it keeps some higher-level structures together without further pointer-chasing. Or maybe I was doing it wrong ;)

On the lack of eviction: doing random IP lookups seems to be a pessimistic simulated scenario (in practice you would have better clustering of related areas). Even with this distribution, keeping the first 4k items encountered works out to a nice hit ratio. One could provide a more sophisticated cache to get better hit ratios, but it would add a lot of complexity with diminishing returns (certainly not worth bringing in a proper cache library as a dependency).

On mutability: ValueNode subclasses (e.g. TextNode) are immutable. ObjectNodes are mutable by default, but we could pass an immutable Map into the constructor (or perhaps subclass and override the set method).
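For the ObjectNode case, the idea amounts to something like this (a sketch; it assumes the public ObjectNode(JsonNodeFactory, Map) constructor, which may depend on the Jackson version in use):

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.node.JsonNodeFactory;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.util.Collections;
import java.util.Map;

// Sketch of the "pass an immutable Map into the constructor" idea: mutation
// attempts such as set(...) then fail with an UnsupportedOperationException
// instead of silently modifying a cached node.
final class ImmutableNodes {
    static ObjectNode frozenObjectNode(Map<String, JsonNode> decodedFields) {
        return new ObjectNode(JsonNodeFactory.instance,
                Collections.unmodifiableMap(decodedFields));
    }
}
```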
I think if we can reasonably solve the mutability issue and expose a way to disable the cache in the Reader class, this could be merged. The speed-up is indeed quite significant, and you are likely right that much of it comes from caching some of the larger structures.
I've exposed the Cache as a parameter on overloaded Reader constructors, defaulting to no caching. The immutability aspect is a bit problematic, due to flaws in the Jackson API design. I have opened a ticket with them (FasterXML/jackson-databind#1048), but we'll obviously need a workaround. For ObjectNodes, we're okay (pass an immutable Map into the constructor). The problem is with ArrayNodes, which have no analogous constructor. (Of course, the sane approach would be not to build on a JSON API, but to use plain structures as suggested above, which would pose no such problems.)
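For illustration, wiring the cache in might look roughly like this; the BoundedNodeCache class is the sketch from earlier, and the constructor overload is an assumption about the proposed API rather than the merged signature:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.maxmind.db.Reader;

import java.io.File;
import java.io.IOException;
import java.net.InetAddress;

// Hypothetical usage of the overloaded Reader constructor described above;
// the cache class and constructor signature are assumptions based on this
// discussion, not necessarily the final API.
public class CachedLookupExample {
    public static void main(String[] args) throws IOException {
        File database = new File("GeoIP2-City.mmdb"); // assumed database path
        Reader reader = new Reader(database, new BoundedNodeCache(4096));
        try {
            JsonNode record = reader.get(InetAddress.getByName("128.101.101.101"));
            System.out.println(record);
        } finally {
            reader.close();
        }
    }
}
```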
Let's see if they respond in a timely fashion. In regards to not using Jackson in this API, we might entertain the idea, assuming it doesn't cause any issues for the higher-level GeoIP2 API. Using
Looks like a small change will be released in the upcoming Jackson 2.7 to support immutability. I have also asked for a backport to 2.6.x, since 2.7 is Java 7 only and it seems the current GeoIP library targets Java 6. Given that JDK 6 was end-of-lifed three years ago, I think it's reasonable to raise the GeoIP target to Java 7 as well (which could also improve the codebase, e.g. type inference).
While we work out the issues related to caching, could you provide a release of the geoip APIs with the small tweaks I have submitted? These yield a solid ~20% overall performance improvement and a significant reduction in GC, so they should be useful for all users of the library.
Do you need a release of the GeoIP2 API as well as this API? We are in the midst of several changes there and are not quite ready to do the next release. We could, however, do a snapshot release of both if that is sufficient for your needs. (Or you could just force the version of this API to use it with the current GeoIP2 API.)
The last option (forcing a higher revision of this dependency) sounds fine for now, thanks! This way you can just release a new db reader artifact and update geoip2 at your desired pace.
@phraktle, I did a release of
Thanks, I can confirm it works. Mean latency for a geoip lookup in production went from 50μs down to 40μs.
Guava has a caching system that handles eviction and expiration. Perhaps it's worth using that? https://github.com/google/guava/wiki/CachesExplained
Hi @andrewachen, I've designed this with a pluggable caching interface, so one could plug in other types of caches. But I don't think the core library should have large external dependencies like this (even though I'm a happy user of Guava in general). Also, in practice a small and simple cache with no eviction performs great: just caching the first thousand items achieves a 95% hit rate on random geoip lookups. So there's no real benefit to a more complicated cache.
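The pluggable interface has roughly this shape (names and signatures here are assumptions for illustration); anyone who does want eviction could implement it on top of Guava's cache in their own code, without the core library taking the dependency:

```java
import com.fasterxml.jackson.databind.JsonNode;

import java.io.IOException;

// Assumed shape of the pluggable cache abstraction: the reader asks the cache
// for a node by pointer offset and supplies a loader that decodes on a miss.
public interface NodeCache {
    interface Loader {
        JsonNode load(int offset) throws IOException;
    }

    JsonNode get(int offset, Loader loader) throws IOException;
}
```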
Hi @oschwald,

Looks like making the JSON nodes immutable would break the higher-level MaxMind GeoIP API, since it actually tries to mutate the node, adding traits and the IP address. Any chance you could revise that so it does not expect mutable nodes? (This will block releasing the lower-level API too.)

Also, the whole JSON serialization is going to be a bottleneck. It already is: the lower-level API can do 78k qps while the higher-level API does ~46k qps. With the caching in place, the lower level is at ~450k qps, while the higher level stays below 100k.

Thanks,
@phraktle, I briefly looked into this, but it turns out to be a bit complicated. The obvious way to do it would be to use Jackson's value injection. This works fine for the database reader, but it does not work for the web service, where the IP address is returned by the web service itself. Unfortunately, I don't see an obvious way for normal deserialization and injection to co-exist for the same parameter. There are possible workarounds, but none of them seem particularly pretty. For the .NET reader, I am toying with having the
Submitted a PR for caching that strikes a middle ground by cloning mutable container nodes. This is not ideal, but still a major improvement over no caching (~250k qps vs ~80k qps).
@oschwald, yes, binding directly to model objects would make sense. The tricky part is that the database is completely generic and has a deep model (i.e. nested objects), so the binding would have to involve reflection / dynamic class instantiation with awareness of collection types (which would need annotations, given type erasure in Java). I suppose it's manageable, but it sounds a bit elaborate for my taste.

Perhaps a simpler direction would be to have a well-defined schema in the database; then it's easy to have the corresponding model objects (i.e. the database structure specifies the object type explicitly, instead of everything being a generic Map type). But this clearly needs a format change (though maybe there's a way to add the type information in a backwards-compatible way).

I guess in the meantime, it would be very helpful if you could ensure the higher-level geoip2 API does not perform mutation on the nodes returned from the db API (though it could still mutate the model objects after mapping). This way the DB API could avoid deep copies when caching.
Yes, type erasure would make this somewhat more complicated than with C# or Go, but it might be a worthwhile direction longer term. I think we could support a limited number of collection types without too much issue. I generally agree that including a schema in the database metadata would be a big improvement. Although I think it could be done without breaking compatibility, it is not likely to happen in the near future due to the number of changes it would require in the generation and writing code. Cloning the nodes seems like a good compromise for now. I'll take a look at that PR.
Thanks for merging the caching change. If you cut a release of the db API, I can also go ahead and update the GeoIP API to allow caching. (Do you think it should expose passing in an arbitrary cache implementation, or just make caching a boolean option?)
Btw, another potential optimization opportunity in the decoder: short strings are stored inline in the database rather than behind a pointer (the cache took care of the pointer-referenced ones). These could be cached in a prefix tree for zero-allocation lookups, instead of doing a String decode each time.
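A rough sketch of the prefix-tree idea (illustrative only, ignoring thread-safety): walk the raw bytes of a short inline string through a trie and hand back a shared String, so repeated values cost no allocation after the first decode:

```java
import java.io.UnsupportedEncodingException;

// Illustrative sketch: interning short inline strings via a byte trie. The
// 256-wide child arrays are only reasonable because the strings are short and
// few; a production version would want a tighter node representation.
final class ByteTrie {
    private static final class Node {
        final Node[] children = new Node[256];
        String value; // shared String for the byte sequence ending here
    }

    private final Node root = new Node();

    // Returns a shared String for buffer[offset, offset + length); the String
    // is decoded and stored only the first time that byte sequence is seen.
    String intern(byte[] buffer, int offset, int length) throws UnsupportedEncodingException {
        Node node = root;
        for (int i = 0; i < length; i++) {
            int b = buffer[offset + i] & 0xFF;
            if (node.children[b] == null) {
                node.children[b] = new Node();
            }
            node = node.children[b];
        }
        if (node.value == null) {
            node.value = new String(buffer, offset, length, "UTF-8");
        }
        return node.value;
    }
}
```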
I'll try to get a release out on Monday. In regards to the caching, I lean slightly towards allowing the passing of an arbitrary implementation. Are you seeing evidence of the inlined keys causing slowdowns in the profiler? I wonder how much we could gain by eliminating those allocations.
I did a snapshot release for now (
Based on a simple mock implementation, doing lookups for short (2-character) inline strings would reduce allocations by ~10% (that's roughly half of all String allocations) and would improve runtime by around 5%. In any case, the current bottleneck is the excessive object cloning in the node cache – eliminating that would almost double the throughput. After that, eliminating the Result return values would be next on the bottleneck list. (But even the current cloning cache approach is a huge reduction in garbage compared to no caching, since the ValueNodes and Strings don't get copied.)
Submitted maxmind/GeoIP2-java#55
Thanks! I released a new version of
Thanks! With the new release (and caching enabled), mean latency for a lookup via the GeoIP2 API in production went down from 40μs to 30μs.
Hi @oschwald,

What do you think about making the ipAddress fields in the GeoIP2 API mutable, so they can be set after the JSON transformation instead of mutating the JSON tree returned from the DB API?

Thanks,
Although I don't love the idea and would prefer to deserialize to the objects directly, I think this could be ok as long as the setters were package-private. I don't think setters should be added to the public API.
Ok. Another approach might be to customize the Jackson deserialization process, so that it injects this property without having to change the tree model. I have not looked into this yet; I just have a hunch that Jackson might be flexible enough for such things.
Yeah, customizing the Jackson deserialization process would be ideal. I too suspect that it is doable with Jackson.
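For reference, Jackson's value injection looks roughly like this; the model class and property names below are invented for illustration and are not GeoIP2-java's actual classes:

```java
import com.fasterxml.jackson.annotation.JacksonInject;
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.InjectableValues;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;

// Hypothetical model: the "ip_address" value is injected by the caller rather
// than read from the JSON tree, so the tree itself never has to be mutated.
class ExampleCityRecord {
    final String cityName;
    final String ipAddress;

    @JsonCreator
    ExampleCityRecord(@JsonProperty("city") String cityName,
                      @JacksonInject("ip_address") String ipAddress) {
        this.cityName = cityName;
        this.ipAddress = ipAddress;
    }
}

class InjectionExample {
    // Binds the JsonNode returned by the db reader to the model, supplying the
    // looked-up IP address through InjectableValues instead of tree mutation.
    static ExampleCityRecord bind(ObjectMapper mapper, JsonNode record, String ip)
            throws IOException {
        InjectableValues injected = new InjectableValues.Std()
                .addValue("ip_address", ip);
        return mapper.reader(ExampleCityRecord.class)
                .with(injected)
                .readValue(record);
    }
}
```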
With these changes, object churn is much improved. Further optimizations of the DB API provide diminishing returns, since the bottleneck is now the JSON-tree-to-object deserialization (i.e. the higher-level GeoIP2 API). Based on profiling, it's mostly due to having to construct HashMaps for the localized strings in the object model. In any case, my take is that for meaningful further improvements, a larger redesign would be needed (i.e. an improved object model, direct deserialization instead of an interim JSON tree, etc.).
Thanks for all your work on this! I agree with your conclusions. I am going to close this particular issue for those reasons. New, specific issues can be opened as appropriate.