-
Notifications
You must be signed in to change notification settings - Fork 714
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Excessive allocations ExtraFactory.decorate #1421
Comments
hmm, so sounds like there are a lot of modifications to "extra" after the initial request, huh.. what's causing that? do you feel up to optimizing this? maybe improve or write an alternate to BaggagePropagationBenchmarks and show the before and after? note: we'll need to stay at bytecode level 6. Also, because use of extra is often one item, let's try not to initialize a default of a larger array. Basically, try your best to keep those using static or rarely modified baggage fast pathed. |
@codefromthecrypt when I look into it, we aren't adding new extra instances here, it's typically 3 items from the root span. The issue is when creating a child span, the data has to be copied over. It's possible we could just disallow mutations in child spans, I would need to explore that, as I'm not sure it's even required. However, it still seems to be a problem for the case where mutation is needed. |
ok, let's step back from the internal type. The only public type is Can you add a benchmark that represents what you are doing to While possible to do In other words it isn't typical to change the amount of "extras" in a context. Maybe you can more specific about why this is done (what framework or exact use case, and why this ends up using In summary, to progress this, please show exactly what is going on, the ultimate use case, what code is doing this (is it in spring?). We have to optimize for the majority of people and sometimes that means changing how something is used instead of making everyone have to allocate or adding risk to one of the most easy to break parts of tracing (propagation) |
Before this change, notes about state, particularly the "one instance per plugin" rule were not consolidated on the method that can cause problems `addExtra`. This migrates notes together with additional warnings that when we talk about this being used for propagation plugins, we also mean not for ad-hoc usage, as that can lead to excessive overhead. See #1421 Signed-off-by: Adrian Cole <adrian@tetrate.io>
meanwhile I improved docs as I noticed that implications of "extra" were not easy to find in one place, so I consolidated and clarified existing notes #1424 |
So far, what I am guessing is happening is something is using our internal NEXT STEP: Identify what code is in the stack that ends up calling decorate many times. If you are using spring, please fork an example to reproduce what's going on. |
Before this change, notes about state, particularly the "one instance per plugin" rule were not consolidated on the method that can cause problems `addExtra`. This migrates notes together with additional warnings that when we talk about this being used for propagation plugins, we also mean not for ad-hoc usage, as that can lead to excessive overhead. See #1421 Signed-off-by: Adrian Cole <adrian@tetrate.io>
We aren't doing that directly, that is happening in ExtraFactory decorate, because of how the code is written to copy the extra list.
This is internal code using Spring Boot + Micrometer, with a custom
That BaggageFields.Extra instance gets added because we are using remote propagation for the baggage field, e.g. Since that secondary sampling example you linked to most closely resembles what our internal code is doing, here is an example that reproduces the issue: Propagation.Factory baggagePropagation = BaggagePropagation.newFactoryBuilder(B3Propagation.newFactoryBuilder().build())
.add(BaggagePropagationConfig.SingleBaggageField.remote(BaggageField.create("foo")))
.build();
SecondarySampling sampler = SecondarySampling.newBuilder()
.propagationFactory(baggagePropagation)
.secondarySampler((o, req) -> true)
.build();
Tracing tracing = Tracing.newBuilder()
.propagationFactory(sampler)
.build();
HttpServerRequest request = new HttpServerRequest() {
@Override
public String method() {
return "POST";
}
@Override
public String path() {
return "/foo";
}
@Override
public String url() {
return "http://example.com/foo";
}
@Override
public String header(String name) {
return null;
}
@Override
public Object unwrap() {
return this;
}
};
TraceContextOrSamplingFlags flags = tracing.propagation().extractor(HttpServerRequest::header)
.extract(request);
Span span = tracing.tracer().nextSpan(flags);
tracing.tracer().newChild(span.context()); // extras copied ..
tracing.tracer().newChild(span.context()); // extras copied .. |
thanks for the context. I think the best next step is to choose the webflux example or similar with the exact versions of micrometer you are using because I know there is some inefficient code involved. We need to put things into perspective, and that's why I suggest eventually getting to JMH as you may find these allocations not in any way dominating, as other instrumentation is causing excessive overhead. For example, our b3 doesn't allocate as much as most, and when we are talking about allocation about propagation and fixing a small thing, but not a big thing.. well makes less sense. NEXT STEP: can you make a fork or copy of brave-example and reproduce what you are doing and the versions of boot and micrometer involved. We need to be precise if we care about this, and I personally will only help if this is done. |
FWIW, what I am concerned with is that it has something to do with this https://github.com/spring-projects/spring-boot/blob/1a907bf47e1dbcc3b5c8e41b47b9b40bbc42740b/spring-boot-project/spring-boot-actuator-autoconfigure/src/main/java/org/springframework/boot/actuate/autoconfigure/tracing/CompositePropagationFactory.java#L78 Spring have paid employees, rather than rely on volunteer time we should get together the actual scenario. that's the whole reason the brave-example project was made (to conserve volunteer time) |
fyi for your own experiments, here's how we have measured this in the past
All of this combines with other things like propagation, and also the cardinality of "extra" is very important. For example, if we measure creation of an array without being in context of other things, it is hard to prioritize change. It is also hard to make the case to add overhead to everyone. Finally, it may be that what's being done could be scoped differently such that no extras are copied at all! To get a gist of how we conserved time with benchmarks, you can do something like this in EndToEndBenchmarks
then run it. I usually pick p99 and gc.alloc.rate.norm. Then, for example you can see that using baggage how we did indeed has overhead (lots of warnings about this) and in the controlled example, the allocation goes up by a bit over 5pct.
This will be extremely different in boot for reasons mentioned including how it operates a lot differently. Once things are put together realistically, first thing I would ask is to supply your own bean of propagation factory. Then, I would ask if your example is scoped exactly how it is used (data added to your custom thing is indeed mutated vs being a constant the whole request). Finally, similar to the secondary-sampling example, if there is anything that produces a cardinality change in extra, especially if that is dominating. At most it should be a couple element array, and again if there is no reason to copy we can talk about alternatives vs doing any copy at all. This is why we need to get exactly what you are doing down and in context. I have lost too much time shooting in the dark in the past, and that's why I'm asking you to not model abstractly, but implement exactly what you are doing and we can work from there. If your propagation thing is a trade secret, replicate its interactions without exposing your business logic, in such a way that it creates the same amount of data in the same way per request. hope this helps! |
also added this so you can see how changes might affect others. There may be a way to improve this without affecting others, and as always a change is welcome to that regard. Just, sacrificing others requires a lot of rationale, particularly in places we have added a lot of benchmark work. #1425 |
I frequently see ExtraFactory.decorate at the top of the list of allocation hotspots when I profile our services. Here is an example from a 1 minute profile:
It looks to me like this is caused by excessive resizing of the extras ArrayList. If we look at ExtraFactory.decorate, I believe this happens because of this line:
And then the subsequent loop that adds Extra instances one at a time; it's an interaction between the above code and Lists.ensureMutable; we always pass
Collections.emptyList()
to Lists.ensureMutable, which causes the ArrayList to get initialized with an initialCapacity of 0. Due to how Java's ArrayList calculates the new capacity when growing a list, for sizes this small we end up reallocating A LOT. Here is a simple example that demonstrates that:Basically, initializing an ArrayList with a small initial capacity (such as zero), and then adding items one-by-one will trigger excessive copying, since the first 5 element insertions trigger a resize. At that point we'e allocated in total 5 arrays, with a combined capacity of 15; we would have been better off initializing the original ArrayList with more headroom. One approach is to do just that in Lists.ensureMutable:
Another option is to try to size the list more accurately in
ExtraFactory.decorate
, by usingextraSize
or something similar.The text was updated successfully, but these errors were encountered: