Throughput benchmarks #424

snowp · 2021-04-20T15:52:30Z

Add benchmarks providing some baseline for performance. An example of this would be a producer that produces new snapshots at a rapid pace and seeing how long it takes for these changes to get sent to the client.

Should at the very least cover the new delta code (as it's the more performant protocol), but could also target sotw.

This would help us better understand the impact of larger changes (like #413) and the cost of per resource computation in delta.

@alecholmez

github-actions · 2021-05-20T16:48:01Z

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

alecholmez · 2021-05-26T02:22:10Z

@snowp We definitely want this, once the delta code lands I think it'd be worth while spending some time on really figuring out the performance gains of delta.

Just an FYI, I introduced this a while ago: #362 but it never got merged. Might be worth revisiting at some point.

alecholmez · 2021-06-25T15:06:25Z

I think I'm going to take this next, we need to test the performance of what we have now. Utilizing the integration test with something like pprof to scrape for statistics seems appealing.

snowp · 2021-06-25T15:09:34Z

Running on top the integration tests seems like a good start!

One thing to think about beyond that is how performance would differ under different load patterns: lots of streams against the same type, lots of streams against an opaque type, lots of streams with different types, etc. and how the types of updates that are happening affect the stream (what % of streams are getting updates, the ADS case of multiple resources being sent over the same stream etc.)

alecholmez · 2021-06-25T15:15:15Z

Excellent, myself and @dougfort might iterate on a design doc to see what we can come up with. We'll see what we can capture, theres a gold mine of data here

alecholmez · 2021-06-29T15:35:00Z

@snowp if we intend to utilize the integration tests for what we want, is it safe to assume we don't intend on doing any long term storage of this throughput data? I was iterating design with a few co-workers and we were curious as to what you think is a good fit for this situation:

A one-time executable benchmark that simply outputs formatted data for users of this repo to consume
Or a complex system that holds data over time

We initially sketched out a mix of 1 and 2 but we don't want to creep the scope here to something that is unnecessary for the data we want to collect.

Here's a quick proposal we drafted up. It just sketches out the concepts of each component of the benchmark. We can go back and add technical artifacts to that later.

With a code example of the producer we talk about.

I think we want a system that will allow us to at least run pprof so we can see what's going on at runtime. The go benchmarking framework is limited as we've come to learn. If we do end up going with a system that complex it might even be worth separating from the integration test as a whole and having some sort of standalone test under pkg/test/benchmark.

alecholmez · 2021-06-29T15:40:18Z

Want to note that eventually we should use this benchmark to test code change too: #451

alecholmez · 2021-08-02T22:13:15Z

Just FYI this PR enables profiling for cpu, lock contention (block), mutex switching, go routine spawning and memory usage. It currently runs off the integration test but to test throughput I want to build a test client that puts the server through its paces. Once that's done I can profile that code and collect runtime data so we can have some throughput numbers

hiromis · 2021-08-24T21:20:08Z

Hi @snowp, @alecholmez,

I am new to the go-control-plane and trying to gauge the scope of this issue. I see that Alec has a PR up for benchmarking integration tests. What can I do to help complete this issue?

I was thinking maybe I can add benchmarking to SetSnapshot function to measure "the cost of per resource computation" that's mentioned in the original description. Would that be enough to be the first iteration?

As for #424 (comment), what did you guys have in mind exactly? I am looking at internal/example as well as the integration tests and wondering whether having just one proxy running to consume updates suffice or were you guys thinking of standing up an environment with multiple proxies with realistic snapshots?

Do let me know how I can contribute!

alecholmez · 2021-09-01T15:38:53Z

@snowp any input here? Hiromi has found that the go benchmarks aren't particularly accurate for detecting throughput here, did you have in mind a separate framework that we might need to build out to measure this? I guess we're just looking for some clarification here

snowp · 2021-09-01T16:08:36Z

My original thought was to have a system where we simulate continuous updates to the cache and understand how long it takes for these updates to make it to clients under increasing rate of change. Benchmarking small pieces of the code might be beneficial as microbenchmarks that can be optimized independently, but in order to understand the actual throughput of the system we probably need something end to end.

snowp added area/delta_xds perf labels Apr 20, 2021

github-actions bot added the stale label May 20, 2021

snowp added help wanted and removed stale labels May 20, 2021

alecholmez closed this as completed Jun 29, 2021

alecholmez reopened this Jun 29, 2021

alecholmez added the no stalebot label Aug 2, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Throughput benchmarks #424

Throughput benchmarks #424

snowp commented Apr 20, 2021

github-actions bot commented May 20, 2021

alecholmez commented May 26, 2021

alecholmez commented Jun 25, 2021

snowp commented Jun 25, 2021

alecholmez commented Jun 25, 2021

alecholmez commented Jun 29, 2021 •

edited

Loading

alecholmez commented Jun 29, 2021

alecholmez commented Aug 2, 2021

hiromis commented Aug 24, 2021

alecholmez commented Sep 1, 2021

snowp commented Sep 1, 2021

Throughput benchmarks #424

Throughput benchmarks #424

Comments

snowp commented Apr 20, 2021

github-actions bot commented May 20, 2021

alecholmez commented May 26, 2021

alecholmez commented Jun 25, 2021

snowp commented Jun 25, 2021

alecholmez commented Jun 25, 2021

alecholmez commented Jun 29, 2021 • edited Loading

alecholmez commented Jun 29, 2021

alecholmez commented Aug 2, 2021

hiromis commented Aug 24, 2021

alecholmez commented Sep 1, 2021

snowp commented Sep 1, 2021

alecholmez commented Jun 29, 2021 •

edited

Loading