Use MPMC bounded queue for group freelist #1472

eugeneia · 2022-03-07T13:41:23Z

This adresses

Group freelist contention #1468

by employing a smarter data structure for the group freelist (MPMC bounded queue).

It also moves the rebalancing house keeping into packet.free, and keeps limits on the upper bound of work performed in any rebalance/reclaim step.

In the density plots below we have master in red and this branch in blue (green was an alternative wip branch).

This certainly improves latency and thus performance of interlinks, however this does not make interlinks scale without restriction. The benchmarking I did seems to show that sharing memory between cores still turns into a bottleneck, and depending on your CPU architecture you are going to run into that sooner or later. I did compare results between Intel and EPYC machines and they are quite different. But for here and now I’m going to focus only on EPYC as an example.

In the plot above we compare latencies by number of receivers for a single transmitter, and we see a significant blowup of latencies after 1 transmitter + >5 receivers. Now why is that?

If we look at the topology of our CPU as reported by AMDuprof we can get a hint:

CPU Topology:
Socket, CCD, Core(s)
0,0, 0 1 2 3 4 5
0,1, 6 7 8 9 10 11
0,2, 12 13 14 15 16 17
0,3, 18 19 20 21 22 23

Each CCD spans six CPU cores. So while our workload fits a single CCD we get OK perf (~60Mpps) but as soon as we add a receiver running on a distinct CCD latency and perf tanks (~10Mpps).

The above diagram of the CPUs architecture/topology gives some hints. So each CCD houses two CCXs, and a cores in a CCX share a L3 cache. I am assuming the CCXs in a CCD also have faster interconnects to each other than to a CCX in a remote CCD?

Anyways if I look at some PMU counters using AMDuprof we can maybe see why a workload distributed across CCDs fares worse (take this with some salt, this is me reading the tea leaves):

workloads that fit a single CCD can fetch data from shared L3 or other L2 caches in the same CCD or CCX? DCFillsFromL3orDiffL2 is higher
they incur less L2DtlbMisses, and have to perform less DCFillsFromLocalMemory
hence they can retire more instructions per cycle

cribbed from 1024core.net rebalance in free

This is is needed so that RaptorJIT doesn't hoist the second load of enqueue_pos/dequeue_pos out of the loop.

…ackets

eugeneia added 11 commits February 15, 2022 15:49

apps.interlink: add instrumented latency benchmark

f799af2

core.packet: factor packet reclaimation from group_fl into reclaim_step

fbc13ab

apps.interlink: better instrumentation

a9c8436

core.packet: lock-free group freelist

3b32a2f

core.group_freelist: cache head_add/tail_remove, document

e1a70b0

core.packet: better group freelist design

c0ab1b1

cribbed from 1024core.net rebalance in free

apps.interlink: fix bugs in wait_test.snabb

cdec562

core.group_freelist: add missing load barrier

d01df58

This is is needed so that RaptorJIT doesn't hoist the second load of enqueue_pos/dequeue_pos out of the loop.

core.group_freelist: shrink queue

99e2a6c

apps.interlink: fremove obsolete line wait_test.snabb

2ec3a89

apps.interlink: sort cores in wait_test.snabb

cab614d

eugeneia added bug enhancement and removed bug labels Mar 7, 2022

eugeneia added 2 commits March 23, 2022 12:53

core.packet: ensure freelist has enough space for group fl rebalancing

84cb61c

core.group_freelist: enlarge group_freelist to hold up to 2 million p…

616e6e2

…ackets

eugeneia added the merged label Apr 7, 2022

eugeneia mentioned this pull request Jun 20, 2022

Collected changes for July 2022 release: Parmigiano #1478

Merged

eugeneia added the release-parmigiano label Jun 20, 2022

eugeneia linked an issue Jun 22, 2022 that may be closed by this pull request

Group freelist contention #1468

Closed

eugeneia merged commit 2ff0924 into snabbco:master Jul 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use MPMC bounded queue for group freelist #1472

Use MPMC bounded queue for group freelist #1472

eugeneia commented Mar 7, 2022

Use MPMC bounded queue for group freelist #1472

Use MPMC bounded queue for group freelist #1472

Conversation

eugeneia commented Mar 7, 2022