Add softirq-latency example #300
read_array_ptr(&softirq_raise_timestamp, &vec_nr, raise_ts_ptr);

// Interrupt was re-raised after ts was obtained, resulting in negative duration
if (*raise_ts_ptr > ts) {
This happens mostly for TASKLET_SOFTIRQ (vector 6) and sometimes for SCHED_SOFTIRQ (vector 7):
<...>-27627 [043] d.s3. 46693.775555: bpf_trace_printk: re-raise after 1500ns [6]
<idle>-0 [035] d.s4. 46693.884661: bpf_trace_printk: re-raise after 2040ns [6]
<idle>-0 [061] d.s4. 46694.595380: bpf_trace_printk: re-raise after 1650ns [6]
<idle>-0 [047] d.s4. 46694.922268: bpf_trace_printk: re-raise after 1650ns [6]
<idle>-0 [001] d.s4. 46694.942947: bpf_trace_printk: re-raise after 1120ns [6]
<...>-1445425 [041] d.s4. 46694.951062: bpf_trace_printk: re-raise after 1450ns [6]
<idle>-0 [071] d.s4. 46695.262426: bpf_trace_printk: re-raise after 840ns [7]
conduit-watcher-2620846 [039] d.s3. 46695.998234: bpf_trace_printk: re-raise after 1430ns [6]
<idle>-0 [003] d.s4. 46696.024913: bpf_trace_printk: re-raise after 2090ns [6]
<idle>-0 [029] d.s4. 46696.244701: bpf_trace_printk: re-raise after 1150ns [6]
The working theory is that there can be a raise after entry started but before this condition is checked.
There aren't too many of these, so we mostly ignore them.
Example for NET_RX:
$ curl -s http://metrics-interface:3141/metrics | fgrep NET_RX_SOFTIRQ
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="1e-06"} 7.285406e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="2e-06"} 7.795892e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="4e-06"} 8.007842e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="8e-06"} 8.151191e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="1.6e-05"} 8.230504e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="3.2e-05"} 8.251142e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="6.4e-05"} 8.259189e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.000128"} 8.261051e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.000256"} 8.261677e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.000512"} 8.262088e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.001024"} 8.262267e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.002048"} 8.262433e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.004096"} 8.262572e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.008192"} 8.262683e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.016384"} 8.262707e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.032768"} 8.262708e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.065536"} 8.262708e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.131072"} 8.262708e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.262144"} 8.262708e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.524288"} 8.262708e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="1.048576"} 8.262708e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="2.097152"} 8.262708e+06
ebpf_exporter_softirq_entry_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="+Inf"} 8.262708e+06
ebpf_exporter_softirq_entry_latency_seconds_sum{kind="NET_RX_SOFTIRQ"} 16.282414
ebpf_exporter_softirq_entry_latency_seconds_count{kind="NET_RX_SOFTIRQ"} 8.262708e+06
ebpf_exporter_softirq_raised_total{kind="NET_RX_SOFTIRQ"} 8.275197e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="1e-06"} 226218
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="2e-06"} 1.643431e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="4e-06"} 4.141819e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="8e-06"} 5.646346e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="1.6e-05"} 6.952278e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="3.2e-05"} 8.055987e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="6.4e-05"} 8.239492e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.000128"} 8.262601e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.000256"} 8.266388e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.000512"} 8.267436e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.001024"} 8.267449e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.002048"} 8.267449e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.004096"} 8.267449e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.008192"} 8.267449e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.016384"} 8.267449e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.032768"} 8.267449e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.065536"} 8.267449e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.131072"} 8.267449e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.262144"} 8.267449e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="0.524288"} 8.267449e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="1.048576"} 8.267449e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="2.097152"} 8.267449e+06
ebpf_exporter_softirq_service_latency_seconds_bucket{kind="NET_RX_SOFTIRQ",le="+Inf"} 8.267449e+06
ebpf_exporter_softirq_service_latency_seconds_sum{kind="NET_RX_SOFTIRQ"} 130.717583
ebpf_exporter_softirq_service_latency_seconds_count{kind="NET_RX_SOFTIRQ"} 8.267449e+06
ebpf_exporter_softirq_serviced_total{kind="NET_RX_SOFTIRQ"} 8.26212e+06
IMHO we need to limit which softirqs are traced. In my bpftrace script I'm limiting this to vector 3, which is NET_RX_SOFTIRQ.
The script softirq_net_latency_safe.bt attempts to be safe to run on production systems with a lot of CPUs. The scripting language bpftrace is a little too eager to create BPF hash-maps that come with overhead.

Also remove measuring the softirq runtime and instead focus on the latency from raise to run, as it was found that the softirq_exit tracepoint comes with strange overhead [1].

[1] cloudflare/ebpf_exporter#300

Signed-off-by: Jesper Dangaard Brouer <jesper@cloudflare.com>
cc @netoptimizer