
High object churn in DB Reader #13

Closed
DigitalGoldfish opened this issue Dec 17, 2014 · 44 comments

@DigitalGoldfish

We are using the GeoIP2-java library to read from the GeoIP2 database file. Our throughput is rather high, with thousands of IPs resolved per second, and we noticed high object churn in the MaxMind APIs: huge numbers of strings are created only to be thrown away again shortly after.

Are you aware of this issue? Any plans to improve the API in this aspect or any idea how we could improve this behaviour ourselves?

@oschwald
Member

Lookups create a significant number of strings when they read the structure out of the database. Potentially an object pool could be created in the Decoder and used to cache objects rather than reading them out of the database on each lookup. This would reduce churn, although it might not actually speed up the lookups significantly unless you allowed it to use a very large amount of memory.

We currently do not have plans to change this behavior.

@oschwald
Member

Probably the easiest way to reduce churn is to cache key names, although we would want to limit the cache size, as there is no limit on the number of keys in an arbitrary database.
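A bounded key-name cache like the one described might be sketched as follows. This is hypothetical code, not part of the library; the class and field names are made up. Once the cap is hit, new keys simply bypass the cache, so memory stays bounded even for a database with arbitrarily many keys:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical bounded cache of canonical key-name Strings.
final class KeyCache {
    private static final int MAX_ENTRIES = 4096;
    private final ConcurrentHashMap<String, String> keys = new ConcurrentHashMap<>();

    // Returns a canonical instance for key, caching at most MAX_ENTRIES names.
    String intern(String key) {
        String cached = keys.get(key);
        if (cached != null) {
            return cached;
        }
        if (keys.size() >= MAX_ENTRIES) {
            return key; // cache full: hand back the fresh String uncached
        }
        String prev = keys.putIfAbsent(key, key);
        return prev != null ? prev : key;
    }
}
```

Repeated lookups of the same key would then return one shared String instance instead of a freshly decoded one per record.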

@DigitalGoldfish
Author

Thanks for the input ... we haven't dealt with this issue yet. It's still in our backlog and occasionally gets remarked on by our monitoring team, but so far it hasn't been painful enough to get bumped to the top.

@phraktle
Contributor

+1 for this. The design of this API is not good enough for high-performance scenarios.

It's very wasteful that it dynamically creates an object tree of everything for a node (which then in the higher level API is mapped to actual objects), only to get the values for the few keys you really needed.

@phraktle
Contributor

Caching canonical string instances for map keys would indeed be a useful optimization which would not require a major redesign, so worth implementing.

However, an API redesign is definitely needed to improve things significantly. Two major gripes:

  1. avoid reading all fields all the time (most requests really only need a few fields).
  2. use proper objects / primitives, instead of bloating things unnecessarily with a JSON API (this should not even be a dependency, since there's no need for JSON here).

And another implementation detail that should be changed in the internal design: every decode operation allocates a new Decoder.Result instance (instead, the offset could be advanced during decoding). This is the second-largest source of garbage creation after String decoding.
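The offset-advancing alternative can be illustrated with a toy decoder. This is a sketch over a made-up length-prefixed string format, not the real MaxMind DB layout: the read cursor lives in a field that each decode advances, so no per-decode Result object is allocated:

```java
import java.nio.charset.StandardCharsets;

// Toy decoder: the read position is a mutable field rather than being
// returned inside a fresh Result object after every decode.
final class CursorDecoder {
    private final byte[] data;
    private int offset;

    CursorDecoder(byte[] data) {
        this.data = data;
    }

    // Decodes one length-prefixed UTF-8 string and advances the cursor.
    String decodeString() {
        int len = data[offset++] & 0xff;
        String s = new String(data, offset, len, StandardCharsets.UTF_8);
        offset += len;
        return s;
    }
}
```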

@oschwald
Member

Thanks for your feedback. I largely agree with your two gripes for very high performance scenarios. Jackson was used for ease of deserializing the data to our GeoIP2 model classes, which are shared with the web service and already used Jackson. Although this design has the problems outlined here in terms of object churn, it does seem to meet the requirements of most of our customers.

@phraktle
Contributor

I would urge you to consider a redesign of this API. Yes, most people in the "enterprise" java space have such bloated libraries and code that this library is not a bottleneck. That said, it could be an order of magnitude faster with proper design. In our case, the one geoip lookup per request is about a third of all garbage created (and we do have a service solving a rather more complex problem :)

@phraktle
Contributor

For starters, would returning plain Java objects be a good direction? This eliminates the need for a dependency, but Jackson should still be able to bind to objects just as easily. But this would also reduce the need for the wrapper objects (i.e. a TextNode around every String, an ObjectNode around each Map, etc). (This is of course a breaking change for API clients, so this is more about discussing longer-term design directions)

@oschwald
Member

Could you expand on what you mean by plain Java objects here? The structure of the data record in the database isn't known ahead of time in the general case. Do you just mean a Map<String, Object> or an Object[]? I am not sure I like that change enough to do a major release around it. (In fact, if I recall correctly, a very early version did just that.)

I suspect for your particular case, based on you mentioning needing a "few keys", some sort of lookup-path method would be the best solution similar to the data lookup functions in the C API.

The Go API takes a different approach and only decodes what is necessary to fill out the struct passed to it. I suppose this could also be done with the Java API, but it would likely require a reasonable amount of work.
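The lookup-path idea could take roughly this shape over plain decoded structures (illustrative only; `PathLookup` is not an existing class in the reader — it just walks nested Maps and Lists so only the requested fields are touched):

```java
import java.util.List;
import java.util.Map;

// Hypothetical path-based accessor over a generically decoded record,
// e.g. PathLookup.get(record, "country", "iso_code").
final class PathLookup {
    @SuppressWarnings("unchecked")
    static Object get(Object node, Object... path) {
        for (Object step : path) {
            if (node instanceof Map && step instanceof String) {
                node = ((Map<String, Object>) node).get(step);
            } else if (node instanceof List && step instanceof Integer) {
                node = ((List<Object>) node).get((Integer) step);
            } else {
                return null; // path does not match the structure
            }
            if (node == null) {
                return null;
            }
        }
        return node;
    }
}
```

A real implementation in the reader would go further and skip decoding of siblings entirely, the way the C API's data lookup functions do.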

@phraktle
Contributor

Yes, I meant Map<String, Object> vs ObjectNode, String vs TextNode, etc. But indeed, this in itself is not going to yield great perf gains, so probably not worth bothering (just for the sake of a cleaner API :)

@phraktle
Contributor

Based on a quick experiment, it seems that introducing a reasonably sized shared cache for pointer objects would yield "good enough" results. With a few megabytes, you can get a 5x performance increase (due to the reduced memory churn and less pointer-chasing). You can take a look here and run the benchmark: https://github.com/phraktle/MaxMind-DB-Reader-java/tree/caching

For a production quality implementation, one would need a bounded cache of course. After a million random lookups, the cache has around 32k objects.

@phraktle
Contributor

Update: added some abstractions for pluggable caching and a very basic bounded cache (no eviction, it just holds the first N items – this works perfectly well here, since the first few thousand items include all the hot items anyway). Let me know your thoughts; if this seems like a reasonable approach to you, I can pretty it up and submit a pull request. (Bottom line: ~5x perf improvement with a ~2 MB cache footprint)
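The pluggable-cache shape being described might look something like this (a sketch; the actual interface and names in the linked branch may differ). The bounded cache admits only the first N distinct pointer offsets and never evicts:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

// Sketch of a pluggable cache keyed by pointer offset in the database.
interface PointerCache {
    Object get(int offset, IntFunction<Object> decoder);
}

// Bounded cache with no eviction: the first `capacity` offsets seen win.
final class FirstNCache implements PointerCache {
    private final int capacity;
    private final ConcurrentHashMap<Integer, Object> map = new ConcurrentHashMap<>();

    FirstNCache(int capacity) {
        this.capacity = capacity;
    }

    @Override
    public Object get(int offset, IntFunction<Object> decoder) {
        Object value = map.get(offset);
        if (value == null) {
            value = decoder.apply(offset); // miss: decode from the database buffer
            if (map.size() < capacity) {
                map.putIfAbsent(offset, value);
            }
        }
        return value;
    }
}
```

Because hot structures (countries, key names) tend to appear among the first few thousand distinct pointers, even this eviction-free policy can achieve a high hit rate.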

@oschwald
Member

Overall, this seems good. I am somewhat uncomfortable caching mutable objects, and I am not sure about the lack of an eviction policy. Have you tested this in comparison to just caching the keys? I assume that in addition to the keys, some hot countries are being cached, but so are many things that might be very uncommon.

@phraktle
Contributor

Caching just the String map keys didn't yield a very significant speedup (a marginal 10-20%, compared to the ~5x). Caching the results of pointer resolutions works out much better, since it keeps some higher level structures together without further pointer-chasing. Or maybe I was doing it wrong ;)

On lack of eviction: doing random IP lookups seems to be a pessimistic simulated scenario (in practice you would have better clustering of related areas). Even with this distribution, keeping the first 4k items encountered works out to a nice hit-ratio. One could provide a more sophisticated cache to get better hit-ratios, but it would add a lot of complexity with diminishing returns (certainly not worth bringing in a proper cache library as a dependency).

On mutability: ValueNode subclasses (e.g. TextNode) are immutable. ObjectNodes are mutable by default, but we could pass in an immutable Map in its constructor (or perhaps subclass and override the set method).

@oschwald
Member

I think if we can reasonably solve the mutability issue and expose a way to disable the cache in the Reader class, this could be merged. The speed-up is indeed quite significant, and you are likely right that much of it comes from caching some of the larger structures.

@phraktle
Contributor

I've exposed the Cache as a parameter on overloaded Reader constructors, defaulting to no caching.

The immutability aspect is a bit problematic, due to flaws in the Jackson API design. I have opened a ticket with them (FasterXML/jackson-databind#1048), but we'll obviously need a workaround. For ObjectNodes, we're okay (pass an immutable Map in the constructor). The problem is with ArrayNode, which has no analogous constructor.

(Of course the sane approach would be not to build on a JSON API, but use plain structures as suggested above, which would pose no such problems).

@oschwald
Member

Let's see if they respond in a timely fashion.

In regards to not using Jackson in this API, we might entertain the idea, assuming it doesn't cause any issues for the higher-level GeoIP2 API. Using Object everywhere is somewhat ugly, but perhaps the best solution given the lack of sum types in Java. The bits of Jackson that we use are not very JSON-specific.

@phraktle
Contributor

Looks like a small change will be released in the upcoming Jackson 2.7 to support immutability. I have also asked for a backport to 2.6.x, since 2.7 is Java 7 only and the current GeoIP library target seems to be Java 6. Given that JDK 6 was end-of-lifed 3 years ago, I think it's reasonable to raise the GeoIP target to Java 7 as well (which could also improve the codebase, e.g. via type inference).

@phraktle
Contributor

While we work out the issues related to caching, could you provide a release of the geoip APIs with the small tweaks I have submitted? These yield a solid ~20% overall performance improvement and a significant reduction in GC, so it should be useful for all users of the library.

@oschwald
Member

Do you need a release of the GeoIP2 API as well as this API? We are in the midst of several changes there and are not quite ready to do the next release. We could, however, do a snapshot release of both if that is sufficient for your needs. (Or you could just force the version of this API to use it with the current GeoIP2 API.)

@phraktle
Contributor

The last option (forcing a higher revision of this dependency) sounds fine for now, thanks! This way you can just release a new db reader artifact and update geoip2 at your desired pace.

@oschwald
Member

@phraktle, I did a release of maxmind-db yesterday. It should be on most mirrors by now.

@phraktle
Contributor

Thanks, I can confirm it works. Mean latency for a geoip lookup in production went from 50μs down to 40μs.

@andrewachen

Guava has a caching system that handles eviction and expirations. Perhaps it's worth using that?

https://github.com/google/guava/wiki/CachesExplained

@phraktle
Contributor

Hi @andrewachen, I've designed this with a pluggable caching interface, so one could plug in other types of caches. But I don't think the core library should take on large external dependencies like this (even though I'm a happy user of Guava in general). Also, in practice a small and simple cache with no eviction performs great: just caching the first thousand items achieves a 95% hit rate in random geoip lookups. So there's no real benefit to a more complicated cache.

@phraktle
Contributor

Hi @oschwald,

Looks like making the JSON nodes immutable would break the higher-level MaxMind GeoIP API, since it actually tries to mutate the node, adding the traits and IP address. Any chance you could revise that so it does not expect mutable nodes? (This will block releasing the lower-level API too.)

Also, the whole JSON serialization is going to be a bottleneck. It already is: the lower-level API can do 78k qps while the higher-level API does ~46k qps. With the caching in place, the lower level does ~450k qps, while the higher level does less than 100k.

Thanks,
Viktor

@oschwald
Member

@phraktle, I briefly looked into this, but it turns out to be a bit complicated. The obvious way to do it would be to use Jackson's value injection. This works fine for the database reader, but it does not work for the web service, where the IP address is returned from the web service. Unfortunately, I don't see an obvious way for normal deserialization and injection to co-exist for the same parameter. There are possible workarounds, but none of them seem particularly pretty.

For the .NET reader, I am toying with having the MaxMind.Db API unmarshal to the model classes directly, similar to how the Go reader works. This also has the advantage that you can easily cache the final model objects directly.

@phraktle
Contributor

phraktle commented Jan 1, 2016

Submitted a PR for caching that strikes a middle ground by cloning mutable container nodes. This is not ideal, but still a major improvement over no caching (~250k qps vs ~80k qps).

@phraktle
Contributor

phraktle commented Jan 1, 2016

@oschwald, yes, binding directly to model objects would make sense. The tricky part is that the database seems completely generic and has a deep model (i.e. nested objects), so the binding would have to involve reflection / dynamic class instantiation with awareness of collection types (which would need annotations, given the type erasure in Java). I suppose it's manageable, but it sounds a bit elaborate for my taste.

Perhaps a simpler direction would be to have a well-defined schema in the database; then it's easy to have the corresponding model objects (i.e. the database structure specifies the object type explicitly, instead of everything being a generic Map type). But this clearly needs a format change (though maybe there's a way to add the type information in a backwards-compatible way).

I guess in the meantime, it would be very helpful if you could ensure the higher-level geoip2 API does not perform mutation on the nodes returned from the db API (though it could still mutate the model objects after mapping). This way the DB API could avoid deep copies in caching.

@oschwald
Member

oschwald commented Jan 1, 2016

Yes, type erasure would make this somewhat more complicated than with C# or Go, but it might be a worthwhile direction longer term. I think we could support a limited number of collection types without too much issue.

I generally agree that including a schema in the database metadata would be a big improvement. Although I think it could be done without breaking compatibility, it is not likely to happen in the near future due to the number of changes it would require in the generation and writing code.

Cloning the nodes seems like a good compromise for now. I'll take a look at that PR.

@phraktle
Contributor

phraktle commented Jan 2, 2016

Thanks for merging the caching change. If you cut a release of the db API, I can also go ahead and update the GeoIP API to allow caching. (Do you think it should expose passing in an arbitrary cache implementation, or just make caching a boolean option?)

@phraktle
Contributor

phraktle commented Jan 2, 2016

Btw, another potential optimization opportunity in the decoder is that short strings are stored in-line in the database (rather than as a pointer – the cache took care of those). These could be cached in a prefix tree for lookups with zero allocation, rather than doing a String decode.
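A zero-allocation lookup for short inline strings could be sketched with a byte-keyed prefix tree. This is illustrative and ASCII-only; a real implementation would have to handle the MaxMind string encoding. On a hit, the canonical String is returned without decoding or allocating anything:

```java
// Prefix tree mapping short ASCII byte sequences to canonical Strings,
// so a cache hit resolves a key with no allocation at all.
final class ByteTrie {
    private final ByteTrie[] children = new ByteTrie[128];
    private String value;

    // Registers a canonical String for later zero-allocation lookups.
    void put(String s) {
        ByteTrie node = this;
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i); // assumes ASCII input
            if (node.children[c] == null) {
                node.children[c] = new ByteTrie();
            }
            node = node.children[c];
        }
        node.value = s;
    }

    // Returns the canonical String for bytes[off, off+len), or null on a miss.
    String get(byte[] bytes, int off, int len) {
        ByteTrie node = this;
        for (int i = 0; i < len; i++) {
            int c = bytes[off + i];
            if (c < 0 || c >= 128 || (node = node.children[c]) == null) {
                return null;
            }
        }
        return node.value;
    }
}
```

On a miss (or non-ASCII input), the caller would fall back to a normal String decode.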

@oschwald
Member

oschwald commented Jan 2, 2016

I'll try to get a release out on Monday. In regards to the caching, I lean slightly towards allowing the passing of an arbitrary implementation.

Are you seeing evidence of the inlined keys causing slowdowns in the profiler? I wonder how much we could gain by eliminating those allocations.

@oschwald
Member

oschwald commented Jan 2, 2016

I did a snapshot release for now (1.0.2-SNAPSHOT). If you wanted to base your GeoIP2 on that, I could probably do both releases together.

@phraktle
Contributor

phraktle commented Jan 3, 2016

Based on a simple mock implementation, doing lookups for short (2 character) inline strings would reduce allocations by ~10% (that's roughly half of all String allocations), and would improve runtime by around 5%. In any case, the current bottleneck is the excessive object cloning in the node cache – eliminating that would almost double the throughput. After that, eliminating the Result return values would be next on the bottleneck list. (But even the current cloning cache approach is a huge reduction in garbage compared to no caching, since the ValueNodes and Strings don't get copied).

@phraktle
Contributor

phraktle commented Jan 3, 2016

Submitted maxmind/GeoIP2-java#55

@oschwald
Member

oschwald commented Jan 4, 2016

Thanks! I released a new version of maxmind-db and geoip2 with the caching.

@phraktle
Contributor

phraktle commented Jan 6, 2016

Thanks! With the new release (and caching enabled), mean latency for a lookup via the GeoIP2 API in production went down from 40μs to 30μs.

@phraktle
Contributor

Hi @oschwald,

What do you think about making the ipAddress fields in the GeoIP2 API mutable, so it can be set after the JSON transformation instead of mutating the JSON tree returned from the DB API?

Thanks,
Viktor

@oschwald
Member

Although I don't love the idea and would prefer to deserialize to the objects directly, I think this could be ok as long as the setters were package-private. I don't think setters should be added to the public API.

@phraktle
Contributor

Ok. Another approach might be to customize the Jackson deserialization process, so that it injects this property without having to change the tree model. I have not looked into this yet, I just have a hunch that Jackson might be flexible enough for such things.

@oschwald
Member

Yeah, customizing the Jackson deserialization process would be ideal. I too suspect that it is doable with Jackson.

@phraktle
Contributor

With these changes, object churn is much improved. Further optimizations of the DB API provide diminishing returns, since the bottleneck is now the JSON tree to Object deserialization (i.e. the higher-level GeoIP2 API). Based on profiling, it's mostly due to having to construct HashMaps for the localized strings in the object model. In any case, my take is that for meaningful further improvements, a larger redesign would need to happen (i.e. an improved object model, direct deserialization instead of interim JSON tree, etc).

@oschwald
Member

Thanks for all your work on this! I agree with your conclusions. I am going to close this particular issue for those reasons. New, specific issues can be opened as appropriate.
