Add softirq-latency example #300

Merged: 1 commit, Oct 18, 2023

Conversation

@bobrik (Contributor) commented Oct 16, 2023

Example for NET_RX:

ivan@vm:~$ curl -s http://ip6-localhost:9435/metrics | fgrep softirq | fgrep NET_RX_SOFTIRQ
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="1e-06"} 165
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="2e-06"} 167
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="4e-06"} 176
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="8e-06"} 271
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="1.6e-05"} 406
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="3.2e-05"} 452
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="6.4e-05"} 468
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.000128"} 477
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.000256"} 480
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.000512"} 480
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.001024"} 480
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.002048"} 480
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.004096"} 480
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.008192"} 480
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.016384"} 480
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.032768"} 480
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.065536"} 480
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.131072"} 480
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.262144"} 480
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.524288"} 480
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="1.048576"} 480
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="2.097152"} 480
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="+Inf"} 480
ebpf_exporter_softirq_entry_latency_seconds_sum{kind="NET_RX_SOFTIRQ"} 0.010152
ebpf_exporter_softirq_entry_latency_seconds_count{kind="NET_RX_SOFTIRQ"} 480
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="1e-06"} 42
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="2e-06"} 77
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="4e-06"} 140
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="8e-06"} 191
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="1.6e-05"} 276
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="3.2e-05"} 375
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="6.4e-05"} 422
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.000128"} 471
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.000256"} 477
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.000512"} 480
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.001024"} 480
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.002048"} 480
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.004096"} 480
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.008192"} 480
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.016384"} 480
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.032768"} 480
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.065536"} 480
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.131072"} 480
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.262144"} 480
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.524288"} 480
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="1.048576"} 480
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="2.097152"} 480
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="+Inf"} 480
ebpf_exporter_softirq_service_latency_seconds_sum{kind="NET_RX_SOFTIRQ"} 0.024145
ebpf_exporter_softirq_service_latency_seconds_count{kind="NET_RX_SOFTIRQ"} 480
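The cumulative buckets above can be turned into approximate quantile upper bounds. A minimal Python sketch, with the boundary/count pairs copied from the entry-latency output (this is just for reading the exported data, not part of the exporter itself):

```python
# Approximate quantiles from cumulative exp2 histogram buckets.
# The (le, cumulative_count) pairs are copied from the entry-latency
# output above, truncated once the counts stop changing.
ENTRY_BUCKETS = [
    (1e-06, 165), (2e-06, 167), (4e-06, 176), (8e-06, 271),
    (1.6e-05, 406), (3.2e-05, 452), (6.4e-05, 468), (0.000128, 477),
    (0.000256, 480), (float("inf"), 480),
]

def quantile_upper_bound(buckets, q):
    """Smallest bucket boundary whose cumulative count covers quantile q."""
    total = buckets[-1][1]
    target = q * total
    for le, cumulative in buckets:
        if cumulative >= target:
            return le
    return float("inf")

print(quantile_upper_bound(ENTRY_BUCKETS, 0.5))   # p50 is at most 8us
print(quantile_upper_bound(ENTRY_BUCKETS, 0.99))  # p99 is at most 128us
```

The `_sum`/`_count` pair also gives the mean directly: 0.010152 / 480 ≈ 21µs.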

cc @netoptimizer

@bobrik (Contributor, Author) commented Oct 16, 2023

This one might be a bit expensive to run in production:

[screenshot]

Review comments on examples/softirq-latency.bpf.c (outdated, resolved)
@bobrik commented Oct 17, 2023

Here's how the overhead looks on a flamegraph (-e cycles):

  • Raise (1005 samples): [screenshot]
  • Entry (1498 samples) + Exit (646 samples): [screenshot]

A bit weird that percpu stuff would involve so much locking.

read_array_ptr(&softirq_raise_timestamp, &vec_nr, raise_ts_ptr);

// Interrupt was re-raised after ts was obtained, resulting in negative duration
if (*raise_ts_ptr > ts) {
@bobrik commented:
This happens mostly for TASKLET_SOFTIRQ and sometimes for SCHED_SOFTIRQ:

<...>-27627   [043] d.s3. 46693.775555: bpf_trace_printk: re-raise after 1500ns [6]
          <idle>-0       [035] d.s4. 46693.884661: bpf_trace_printk: re-raise after 2040ns [6]
          <idle>-0       [061] d.s4. 46694.595380: bpf_trace_printk: re-raise after 1650ns [6]
          <idle>-0       [047] d.s4. 46694.922268: bpf_trace_printk: re-raise after 1650ns [6]
          <idle>-0       [001] d.s4. 46694.942947: bpf_trace_printk: re-raise after 1120ns [6]
           <...>-1445425 [041] d.s4. 46694.951062: bpf_trace_printk: re-raise after 1450ns [6]
          <idle>-0       [071] d.s4. 46695.262426: bpf_trace_printk: re-raise after 840ns [7]
 conduit-watcher-2620846 [039] d.s3. 46695.998234: bpf_trace_printk: re-raise after 1430ns [6]
          <idle>-0       [003] d.s4. 46696.024913: bpf_trace_printk: re-raise after 2090ns [6]
          <idle>-0       [029] d.s4. 46696.244701: bpf_trace_printk: re-raise after 1150ns [6]

The working theory is that there can be a raise after entry started but before this condition is checked.

There aren't too many of these, so we mostly ignore them.

@bobrik commented Oct 17, 2023

Before (flamegraphs above):

[screenshot]

After (still a lot):

[screenshot]

Flamegraph for entry + exit:

[screenshot]

Flamegraph for raise:

[screenshot]

Example for NET_RX:

    $ curl -s http://metrics-interface:3141/metrics | fgrep NET_RX_SOFTIRQ
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="1e-06"} 7.285406e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="2e-06"} 7.795892e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="4e-06"} 8.007842e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="8e-06"} 8.151191e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="1.6e-05"} 8.230504e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="3.2e-05"} 8.251142e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="6.4e-05"} 8.259189e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.000128"} 8.261051e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.000256"} 8.261677e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.000512"} 8.262088e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.001024"} 8.262267e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.002048"} 8.262433e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.004096"} 8.262572e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.008192"} 8.262683e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.016384"} 8.262707e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.032768"} 8.262708e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.065536"} 8.262708e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.131072"} 8.262708e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.262144"} 8.262708e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.524288"} 8.262708e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="1.048576"} 8.262708e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="2.097152"} 8.262708e+06
    ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="+Inf"} 8.262708e+06
    ebpf_exporter_softirq_entry_latency_seconds_sum{kind="NET_RX_SOFTIRQ"} 16.282414
    ebpf_exporter_softirq_entry_latency_seconds_count{kind="NET_RX_SOFTIRQ"} 8.262708e+06
    ebpf_exporter_softirq_raised_total{kind="NET_RX_SOFTIRQ"} 8.275197e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="1e-06"} 226218
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="2e-06"} 1.643431e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="4e-06"} 4.141819e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="8e-06"} 5.646346e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="1.6e-05"} 6.952278e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="3.2e-05"} 8.055987e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="6.4e-05"} 8.239492e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.000128"} 8.262601e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.000256"} 8.266388e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.000512"} 8.267436e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.001024"} 8.267449e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.002048"} 8.267449e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.004096"} 8.267449e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.008192"} 8.267449e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.016384"} 8.267449e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.032768"} 8.267449e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.065536"} 8.267449e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.131072"} 8.267449e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.262144"} 8.267449e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.524288"} 8.267449e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="1.048576"} 8.267449e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="2.097152"} 8.267449e+06
    ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="+Inf"} 8.267449e+06
    ebpf_exporter_softirq_service_latency_seconds_sum{kind="NET_RX_SOFTIRQ"} 130.717583
    ebpf_exporter_softirq_service_latency_seconds_count{kind="NET_RX_SOFTIRQ"} 8.267449e+06
    ebpf_exporter_softirq_serviced_total{kind="NET_RX_SOFTIRQ"} 8.26212e+06
@bobrik merged commit b9de3a8 into cloudflare:master on Oct 18, 2023 (17 checks passed)
@bobrik deleted the ivan/softirq-latency branch on October 18, 2023 at 03:32
@netoptimizer (Contributor) commented:

IMHO we need to limit which softirqs get traced, or else the overhead will be too high. E.g. we don't want to trace every timer.

In my bpftrace script I'm limiting this to vector 3, which is NET_RX_SOFTIRQ.

P.S. I'm reading this from my phone while on vacation in Malaga, so I might have missed whether this is already being filtered.

@bobrik commented Oct 20, 2023

This is an example; people are free to narrow it down to whatever softirq they are interested in.

On a machine that I'm using for testing, there are ~400k NET_RX raises per second.

The relative numbers of the other softirq kinds are as follows (Mellanox uses tasklets a lot):

$ curl -s http://metrics-interface:3141/metrics | fgrep softirq | grep '^ebpf_exporter_softirq_serviced_total' | sed -e 's/.*kind="//' -e 's/"}//' | awk '{ printf("%15d %s\n", $2, $1) }' | sort -rn
       56799842 TASKLET_SOFTIRQ
       56423626 NET_RX_SOFTIRQ
        8988991 NET_TX_SOFTIRQ
        6173510 RCU_SOFTIRQ
        5578855 SCHED_SOFTIRQ
        3385815 TIMER_SOFTIRQ
         182570 HRTIMER_SOFTIRQ
             34 BLOCK_SOFTIRQ
              0 IRQ_POLL_SOFTIRQ
              0 HI_SOFTIRQ
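The same ranking can be produced without the shell pipeline. A small Python sketch, with three sample lines abbreviated from the output above:

```python
import re

# Sample lines copied (abbreviated) from the exporter output above.
SAMPLE = """\
ebpf_exporter_softirq_serviced_total{kind="TASKLET_SOFTIRQ"} 56799842
ebpf_exporter_softirq_serviced_total{kind="NET_RX_SOFTIRQ"} 56423626
ebpf_exporter_softirq_serviced_total{kind="BLOCK_SOFTIRQ"} 34
"""

def serviced_by_kind(text):
    """Parse the serviced-total counters and sort by count, descending."""
    pattern = re.compile(r'softirq_serviced_total\{kind="([^"]+)"\} (\d+)')
    return sorted(((int(v), k) for k, v in pattern.findall(text)), reverse=True)

for count, kind in serviced_by_kind(SAMPLE):
    print(f"{count:15d} {kind}")
```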

I tested 3 variants:

  • Hash based all softirq — what is in this PR
  • Hash based only NET_RX — with added check for only NET_RX
  • Array based only NET_RX — array based instead of hash based version, also only checking for NET_RX

The results are below:

Variant                    Raise CPU%   Entry CPU%   Exit CPU%
Hash based, all softirqs   14.2%        34.6%        57.0%
Hash based, only NET_RX    10.0%        16.6%        40.3%
Array based, only NET_RX    9.8%        10.3%        33.7%

The array based version wins, but it's the least flexible. The hash based version is slower, but more flexible: you can easily follow multiple softirq kinds.

Array based, NET_RX-only version:

metrics:
  histograms:
    - name: softirq_net_rx_entry_latency_seconds
      help: Latency histogram for how long it takes from NET_RX softirq raise to entry
      bucket_type: exp2
      bucket_min: 0
      bucket_max: 21
      bucket_multiplier: 0.000001 # microseconds to seconds
      labels:
        - name: bucket
          size: 4
          decoders:
            - name: uint
    - name: softirq_net_rx_service_latency_seconds
      help: Latency histogram for how long it takes from NET_RX softirq entry to exit
      bucket_type: exp2
      bucket_min: 0
      bucket_max: 21
      bucket_multiplier: 0.000001 # microseconds to seconds
      labels:
        - name: bucket
          size: 4
          decoders:
            - name: uint
  counters:
    - name: softirq_net_rx_raised_total
      help: Total number of times NET_RX softirq were raised
    - name: softirq_net_rx_serviced_total
      help: Total number of times NET_RX softirq were serviced
#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include "bits.bpf.h"
#include "maps.bpf.h"

// Loosely based on https://github.com/xdp-project/xdp-project/blob/master/areas/latency/softirq_net_latency.bt

// 21 buckets for latency; the largest regular bucket covers 1.05s .. 2.10s
#define MAX_LATENCY_SLOT 21

#define check_net_rx(vec_nr)                                                                                           \
    if (vec_nr != NET_RX_SOFTIRQ) {                                                                                    \
        return 0;                                                                                                      \
    }

// We only use one index in the array for this
u32 net_rx_idx = 0;

struct softirq_latency_key_t {
    u32 bucket;
};

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, u32);
    __type(value, u64);
} softirq_net_rx_raised_total SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, u32);
    __type(value, u64);
} softirq_net_rx_serviced_total SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, u32);
    __type(value, u64);
} softirq_net_rx_raise_timestamp SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, u32);
    __type(value, u64);
} softirq_net_rx_entry_timestamp SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, MAX_LATENCY_SLOT + 2);
    __type(key, u32);
    __type(value, u64);
} softirq_net_rx_entry_latency_seconds SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, MAX_LATENCY_SLOT + 2);
    __type(key, u32);
    __type(value, u64);
} softirq_net_rx_service_latency_seconds SEC(".maps");

SEC("tp_btf/softirq_raise")
int BPF_PROG(softirq_raise, unsigned int vec_nr)
{
    u64 *existing_ts_ptr, *raised_total_ptr, ts;

    check_net_rx(vec_nr);

    ts = bpf_ktime_get_ns();

    read_array_ptr(&softirq_net_rx_raised_total, &net_rx_idx, raised_total_ptr);
    *raised_total_ptr += 1;

    read_array_ptr(&softirq_net_rx_raise_timestamp, &net_rx_idx, existing_ts_ptr);

    // Set the timestamp only if it is not set, so that we always measure the oldest non-entered raise
    if (!*existing_ts_ptr) {
        *existing_ts_ptr = ts;
    }

    return 0;
}

SEC("tp_btf/softirq_entry")
int BPF_PROG(softirq_entry, unsigned int vec_nr)
{
    u64 delta_us, *raise_ts_ptr, *existing_entry_ts_ptr, *serviced_total_ptr, ts;
    struct softirq_latency_key_t key = {};

    check_net_rx(vec_nr);

    ts = bpf_ktime_get_ns();

    read_array_ptr(&softirq_net_rx_serviced_total, &net_rx_idx, serviced_total_ptr);
    *serviced_total_ptr += 1;

    read_array_ptr(&softirq_net_rx_raise_timestamp, &net_rx_idx, raise_ts_ptr);

    // Interrupt was re-raised after ts was obtained, resulting in negative duration
    if (*raise_ts_ptr > ts) {
        return 0;
    }

    // Interrupt entry started with no corresponding raise, resulting in large duration
    if (!*raise_ts_ptr) {
        return 0;
    }

    delta_us = (ts - *raise_ts_ptr) / 1000;

    increment_exp2_histogram(&softirq_net_rx_entry_latency_seconds, key, delta_us, MAX_LATENCY_SLOT);

    // Allow raise timestamp to be set again
    *raise_ts_ptr = 0;

    read_array_ptr(&softirq_net_rx_entry_timestamp, &net_rx_idx, existing_entry_ts_ptr);

    // There is some time from function start to here, so overall service time includes it
    *existing_entry_ts_ptr = ts;

    return 0;
}

SEC("tp_btf/softirq_exit")
int BPF_PROG(softirq_exit, unsigned int vec_nr)
{
    u64 delta_us, *entry_ts_ptr, ts;
    struct softirq_latency_key_t key = {};

    check_net_rx(vec_nr);

    ts = bpf_ktime_get_ns();

    read_array_ptr(&softirq_net_rx_entry_timestamp, &net_rx_idx, entry_ts_ptr);

    // Interrupt exited with no corresponding entry, resulting in large duration
    if (!*entry_ts_ptr) {
        return 0;
    }

    delta_us = (ts - *entry_ts_ptr) / 1000;

    increment_exp2_histogram(&softirq_net_rx_service_latency_seconds, key, delta_us, MAX_LATENCY_SLOT);

    // Reset entry ts to prevent skipped entries from being counted at exit
    *entry_ts_ptr = 0;

    return 0;
}

char LICENSE[] SEC("license") = "GPL";
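As a cross-check between the YAML above and the `le` labels visible in the exported metrics: with `bucket_type: exp2`, the bucket boundaries are `bucket_multiplier * 2^k` for `k` in `[bucket_min, bucket_max]`. A quick sketch that just reproduces the arithmetic (independent of the exporter code):

```python
def exp2_boundaries(bucket_min, bucket_max, multiplier):
    """Bucket upper bounds for an exp2 histogram, in seconds."""
    return [multiplier * 2**k for k in range(bucket_min, bucket_max + 1)]

bounds = exp2_boundaries(0, 21, 0.000001)
print(bounds[0])   # 1e-06, the smallest `le` in the metrics above
print(bounds[-1])  # 2.097152, the largest finite `le`
```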

Maybe I should add the array version as an example for completeness as well.

I'm a bit confused why exit is so much more expensive than entry: both do roughly the same amount of work, and entry seems to do more, yet it uses less CPU.

Here's a flamegraph for the array based version (-e cpu-clock):

[screenshot]

With -e cycles:

[screenshot]

Zoomed into check_preemption_disabled:

[screenshot]

With -e cycles and softirq stack coalescing disabled:

[screenshot]

It looks like exit is getting interrupted (even in ktime_get_mono_fast_ns!), while entry is not.

bobrik added a commit to bobrik/ebpf_exporter that referenced this pull request Oct 20, 2023
netoptimizer added a commit to xdp-project/xdp-project that referenced this pull request Dec 11, 2023
The script softirq_net_latency_safe.bt attempts to be safe
to run on production systems with a lot of CPUs.

The scripting language bpftrace is a little too eager to
create BPF hash-maps, which come with overhead.

Also remove measuring the softirq runtime and instead
focus on the latency from raise to run, as it was found
that the softirq_exit tracepoint comes with strange
overhead[1].

[1] cloudflare/ebpf_exporter#300

Signed-off-by: Jesper Dangaard Brouer <jesper@cloudflare.com>