streaming: split event buffer by key #12080
Conversation
This is a prerequisite for splitting the internal topic buffer by key. While we technically could support subscribing to all of a topic's events by passing an empty key, it would require publishing events on multiple buffers (a key-specific and a global buffer), and as there isn't a compelling real-world use-case for this, it doesn't seem worth the extra complexity.
Amazing work @boxofrad 👏
I left a few inline comments, mostly notes to myself or random thoughts that don't need changes. There are one or two small things I think it would be nice to change though, including fixing the racy map accesses in tests that presumably fail the race detector?
The only other thing that seems important to consider is whether we can simplify the code (and the reasoning about its correctness, although I think it is correct) and have the bookkeeping function happen atomically with the unsubscribe state being updated.
Great work, and excellent detail in the description!
I've only done a quick review so far. Mostly responding to the questions and commenting on stylistic things like TopicKey. I'll hopefully have a chance to read over more of the functional changes in the last two commits next week.
I think all my comments were addressed here so LGTM! I think Daniel said he'd not reviewed in depth yet so I'd wait for an approval from him too!
Looks great! A couple minor questions and suggestions, but nothing blocking a merge.
Summary
This is an improvement to Consul's streaming backend to address performance problems observed in cases of high parallelism - i.e. many subscribers, high service churn, and servers with a large number of CPU cores.
It reduces lock-contention in the kernel's futex subsystem (more on that below) by only waking subscribers when there is an update that is relevant to them.
Background
Events are published to subscribers by appending them to a data structure called the eventBuffer. It is a single-writer, multiple-reader linked list in which each element contains a channel that is used to notify subscribers when the subsequent element is available (similar to a monitor/condition variable). Subscribers block on a receive from this channel until it is closed by the publisher — this pattern will be familiar if you've used the context package (particularly the Done method).

Currently, there is a single eventBuffer for the entire topic. In other words, subscribers will be woken whenever any health event occurs, regardless of whether or not it pertains to a service they're watching, and it's up to them to ignore events that they're not interested in.

On a busy cluster with many services and open streams, when an event is published, the majority of scheduled goroutines have no real work to do, and will immediately attempt to receive/block on the next element's channel.
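A minimal sketch of the buffer's close-to-notify pattern, using hypothetical names (item, buffer, publish, subscribe) rather than Consul's actual types, and assuming a single publisher goroutine:

```go
package main

import (
	"fmt"
	"sync"
)

// item is one element of the single-writer, multiple-reader linked list.
// ready is closed by the publisher once next has been set, waking every
// subscriber blocked on it (the same close-to-broadcast idiom as context.Done).
type item struct {
	event string
	next  *item
	ready chan struct{}
}

// buffer tracks the tail element that the single publisher appends to.
type buffer struct {
	tail *item
}

func newBuffer() *buffer {
	// The buffer starts with an empty sentinel element for subscribers to wait on.
	return &buffer{tail: &item{ready: make(chan struct{})}}
}

// publish links a new element after the current tail, then closes the old
// tail's channel so that every blocked subscriber wakes at once.
func (b *buffer) publish(event string) {
	next := &item{event: event, ready: make(chan struct{})}
	old := b.tail
	old.next = next
	b.tail = next
	close(old.ready) // broadcast: wakes all subscribers parked on old.ready
}

// subscribe blocks on each element's channel in turn, walking the list as
// the publisher extends it.
func subscribe(head *item, n int, wg *sync.WaitGroup) {
	defer wg.Done()
	cur := head
	for i := 0; i < n; i++ {
		<-cur.ready // park until the publisher appends the next element
		cur = cur.next
		fmt.Println("got event:", cur.event)
	}
}

func main() {
	b := newBuffer()
	head := b.tail

	var wg sync.WaitGroup
	wg.Add(2)
	go subscribe(head, 2, &wg)
	go subscribe(head, 2, &wg)

	b.publish("svc-a updated")
	b.publish("svc-b updated")
	wg.Wait()
}
```

The important property is that a single close wakes every subscriber parked on the channel, whether or not the new event is relevant to it, which is exactly the behaviour that becomes expensive at scale.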
As well as the general overhead of scheduling these goroutines and spending a few CPU cycles on checking for relevant events, there's a much larger cost incurred by them all attempting to become receivers on the new channel at the same time.
Internally, channels maintain a lock that is held at various points, including when adding a blocked receiver; therefore, the many subscriber goroutines will compete to acquire the lock, resulting in contention.
There is some additional nuance here too, though. It turns out that reducing the contention of this particular lock - e.g. by sharding the same large number of subscribers across many channels so they're not all competing for the same lock - doesn't actually solve the problem. 🤔
On Linux, the Go runtime's internal locks are backed by futexes. The kernel stores futexes in a hashmap in which each bucket is guarded by a spinlock. The contention we're observing doesn't go away when we shard the channel because it isn't contention on the individual channel's lock, but rather the kernel's own internal (semi-global) lock! This explains why we have observed such a large amount of CPU time spent in native_queued_spin_lock_slowpath, and why the graph below shows such a high proportion of CPU time spent in kernel-space.

The downstream impact of this contention is that event publishing gets slower, and with it so does applying changes to the Raft state machine, which has a massive effect on the general performance of a server. As an example, sustained FSM Apply latency of just 20ms, when Raft's internal queues/buffers are full, can result in several minutes of RPC latency.
By splitting the topicBuffer by key (e.g. service name) we can limit the number of goroutines that are woken by an event to just those with "real" work to do — which likely also adds an element of jitter as the goroutine will need to wait on sending gRPC stream frames etc.

Naturally, this doesn't improve the pathological case where all subscribers are watching a single service. But in testing, I've found that this case doesn't exhibit the same issue, because the subscribers always have "real" work to do, and so won't immediately contend to re-acquire the channel lock.
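As a rough illustration of the shape of the change (a hedged sketch with invented subject/notifier names, not Consul's actual implementation), keying the close-to-broadcast channel by topic and key means a publish only touches the watchers of that key:

```go
package main

import (
	"fmt"
	"sync"
)

// subject identifies what a subscriber is watching. In Consul this would be
// a topic plus a key such as a service name; the types and names here are
// invented for illustration only.
type subject struct {
	Topic string
	Key   string
}

// notifier keeps one close-to-broadcast channel per subject instead of one
// per topic.
type notifier struct {
	mu    sync.Mutex
	chans map[subject]chan struct{}
}

func newNotifier() *notifier {
	return &notifier{chans: make(map[subject]chan struct{})}
}

// watch returns the channel a subscriber should block on for its subject.
func (n *notifier) watch(s subject) <-chan struct{} {
	n.mu.Lock()
	defer n.mu.Unlock()
	ch, ok := n.chans[s]
	if !ok {
		ch = make(chan struct{})
		n.chans[s] = ch
	}
	return ch
}

// publish closes only the channel for the event's subject, so just the
// goroutines watching that key are woken; everyone else stays parked.
func (n *notifier) publish(s subject) {
	n.mu.Lock()
	defer n.mu.Unlock()
	if ch, ok := n.chans[s]; ok {
		close(ch)
		n.chans[s] = make(chan struct{}) // fresh channel for the next event
	}
}

func main() {
	n := newNotifier()

	web := subject{Topic: "health", Key: "web"}
	api := subject{Topic: "health", Key: "api"}

	webCh := n.watch(web)
	apiCh := n.watch(api)

	n.publish(web)

	select {
	case <-webCh:
		fmt.Println("web watcher woken")
	default:
	}
	select {
	case <-apiCh:
		fmt.Println("api watcher woken")
	default:
		fmt.Println("api watcher still parked") // api key untouched by the publish
	}
}
```

Watchers of other keys never see their channel close, so they stay parked instead of waking, scanning an irrelevant event, and immediately re-blocking on the next channel.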
Huge kudos to @banks, @mkeeler, @dnephin et al. for all of the incredible research work on this issue 👏🏻 ⭐
Benchmark
To test the efficacy of this change, I reused the test harness from when we last compared the performance of the streaming backend to that of blocking queries.
I deployed a single server node on an m6i.32xlarge EC2 instance (128 CPU cores). I then ran the benchmark tool on 50 worker nodes, each creating 14 gRPC connections to the server and consuming 100 streams per connection (for a total of 70k concurrent subscriptions). The tool then began generating write load by updating random tags on the watched services at a rate of 50Hz.

Results
The results below show a significant reduction in "system" CPU time, lower FSM Apply time, and, crucially, less time spent waiting in Raft's internal queues/buffers (which is what leads to pathologically bad RPC latency).