pdrop is a Python script that uses LTTng to analyze packet drops occurring in the Linux kernel network stack.
Some Linux kernels provide a tracepoint named kfree_skb, located in the function of the same name. This tracepoint fires when a socket buffer (skb) is deallocated. By tracking socket buffer deallocations, we can infer when packet drops occur in the Linux kernel network stack. This method of tracking packet drops has limitations; see the limitations section for more information.
pdrop tracks the kfree_skb events and provides useful information such as when and where a packet drop occurred.
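The tracking step can be pictured as a filter over the events of a trace. The following is a minimal sketch, not pdrop's actual code: it assumes the event is recorded under the name skb_kfree (the name enabled by trace.sh) and that the address of the drop site is stored in a field named location. The helper names `Drop`, `find_drops` and `trace_events` are hypothetical. The Babeltrace 1.x Python bindings are imported lazily, so the filter itself can be used without them.

```python
from collections import namedtuple

# Hypothetical record describing one reported drop.
Drop = namedtuple("Drop", ["timestamp", "location"])


def find_drops(events, event_name="skb_kfree", field="location"):
    """Yield a Drop for every event named event_name.

    Each event must expose .name, .timestamp and field access via event[field],
    which is how Babeltrace event objects behave. The field name "location"
    is an assumption about the trace layout.
    """
    for event in events:
        if event.name != event_name:
            continue
        yield Drop(event.timestamp, event[field])


def trace_events(path):
    """Iterate over the events of a CTF trace with the Babeltrace 1.x bindings."""
    import babeltrace  # lazy import: only needed when reading a real trace

    collection = babeltrace.TraceCollection()
    if collection.add_trace(path, "ctf") is None:
        raise RuntimeError("could not open trace at " + path)
    # Events are consumed one at a time, while the collection is still alive.
    yield from collection.events
```

With a real trace, `for drop in find_drops(trace_events("/path/to/trace")): print(drop)` would list the timestamp and address of every candidate drop.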
- Python 3
- LTTng toolchain >= 2.2
- Babeltrace git master built with Python bindings support (refer to Babeltrace README to see how bindings can be enabled)
- iproute2 (Provides the tc command)
- Network emulator module (sch_netem) (Required to emulate packet loss with the tc command)
$ pdrop.py /path/to/trace
$ pdrop.py trace/
[2014-01-27 17:52:34.544754084] 0xffffffff81414cd0 tcp_v4_do_rcv+112
[2014-01-27 17:52:34.544781010] 0xffffffff814171b8 tcp_v4_rcv+424
[2014-01-27 17:52:54.544989518] 0xffffffff81414cd0 tcp_v4_do_rcv+112
[2014-01-27 17:52:54.545000818] 0xffffffff814171b8 tcp_v4_rcv+424
[2014-01-27 17:53:09.867714636] 0xffffffff81414cd0 tcp_v4_do_rcv+112
[2014-01-27 17:53:09.867831349] 0xffffffff814171b8 tcp_v4_rcv+424
[2014-01-27 17:53:10.542946516] 0xffffffff81414cd0 tcp_v4_do_rcv+112
[2014-01-27 17:53:10.542955301] 0xffffffff814171b8 tcp_v4_rcv+424
[2014-01-27 17:53:14.552876436] 0xffffffff81414cd0 tcp_v4_do_rcv+112
[2014-01-27 17:53:14.552884595] 0xffffffff814171b8 tcp_v4_rcv+424
- The first column indicates the timestamp when the packet drop event occurred.
- The second column indicates the address where the packet was dropped.
- The third column, if present, indicates the function+offset where the packet was dropped.
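Because every output line follows this three-column layout, the report is easy to post-process. The sketch below is one possible way to count drops per location; it assumes the exact layout shown in the sample output above, and the names `parse_line` and `count_by_location` are hypothetical helpers, not part of pdrop.

```python
import re
from collections import Counter

# [timestamp] 0xADDRESS [function+offset]; the third column is optional.
LINE_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+"
    r"(?P<address>0x[0-9a-fA-F]+)"
    r"(?:\s+(?P<symbol>\S+))?\s*$"
)


def parse_line(line):
    """Return (timestamp, address, symbol) or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    address = int(match.group("address"), 16)
    return match.group("timestamp"), address, match.group("symbol")


def count_by_location(lines):
    """Count drops per location, keyed by symbol when present, else by address."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip anything that is not a drop report line
        _, address, symbol = parsed
        counts[symbol or hex(address)] += 1
    return counts
```

For example, piping the output of pdrop.py into a script that calls `count_by_location(sys.stdin)` gives a quick view of which code paths drop most often.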
Note that function symbols are resolved using /proc/kallsyms on the host where pdrop is run, so they are only accurate when that host runs the same kernel that recorded the trace.
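Resolving an address to function+offset amounts to finding the symbol with the greatest start address that does not exceed the drop address. The sketch below shows one way to do this with a sorted table and a binary search; it is an illustration under the assumption that each symbol line has the form "address type name", and `load_symbols` and `resolve` are hypothetical names rather than pdrop's own functions.

```python
import bisect


def load_symbols(lines):
    """Build a sorted symbol table from lines in /proc/kallsyms format.

    Returns two parallel lists: start addresses and symbol names.
    """
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # ignore malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    addresses = [address for address, _ in symbols]
    names = [name for _, name in symbols]
    return addresses, names


def resolve(table, address):
    """Return 'function+offset' for address, or None if no symbol precedes it."""
    addresses, names = table
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None
    return "{}+{}".format(names[index], address - addresses[index])
```

On a live system the table would be loaded with `with open("/proc/kallsyms") as f: table = load_symbols(f)`. Reading real addresses from that file generally requires sufficient privileges.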
The trace.sh script can be used to simulate packet drops. This script sets up a tracing session using the lttng command-line tool (provided with the lttng-tools package) and enables the skb_kfree kernel event. It then simulates packet drops with the help of the tc command and the sch_netem network emulator module. Note that you must modprobe sch_netem before running this script.
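The sequence described above can be sketched as two lists of commands, one to start tracing with emulated loss and one to clean up. This is not the content of trace.sh: the session name, output directory, network device and loss percentage are placeholder assumptions, and `setup_commands`, `teardown_commands` and `run_all` are hypothetical helpers. Only standard lttng and tc subcommands are used.

```python
import subprocess


def setup_commands(session="pdrop-demo", output_dir="trace",
                   device="eth0", loss_percent=10):
    """Commands that start a kernel trace and enable emulated packet loss."""
    loss = "{}%".format(loss_percent)
    return [
        ["lttng", "create", session, "--output", output_dir],
        ["lttng", "enable-event", "--kernel", "skb_kfree"],
        ["lttng", "start"],
        # netem drops the given share of packets leaving `device`.
        ["tc", "qdisc", "add", "dev", device, "root", "netem", "loss", loss],
    ]


def teardown_commands(session="pdrop-demo", device="eth0"):
    """Commands that remove the emulated loss and finish the trace."""
    return [
        ["tc", "qdisc", "del", "dev", device, "root"],
        ["lttng", "stop"],
        ["lttng", "destroy", session],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure (needs root)."""
    for argv in commands:
        subprocess.check_call(argv)
```

Network traffic has to cross the chosen device between `run_all(setup_commands())` and `run_all(teardown_commands())` for any drops to be recorded.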
When the script is done, you should have trace data available in the folder indicated. You can then proceed with the pdrop usage instructions above.
The methodology used by pdrop to detect packet drops in the Linux kernel network stack has some limitations. A reported packet drop is often a perfectly normal deallocation on a networking cleanup/teardown code path, so this tool might report false positives. The current way of detecting packet drops relies on the "side effect" of a socket buffer being deallocated in the kfree_skb() function. Commit ead2ceb0ec9f85cff19c43b5cdb2f8a054484431 in the Linux kernel changed the semantics of kfree_skb() and introduced a replacement function, consume_skb(), that should be used when a socket buffer is deallocated for a reason other than a packet drop. Normal socket buffer teardown/cleanup paths should use consume_skb(), but this is often not the case, which is why false positives are common. All the tools that use the kfree_skb tracepoint mechanism (SystemTap dropwatch.stp, perf net_dropmonitor and dropwatch) suffer from the same limitation.
Moreover, some code paths where network packets are dropped (e.g., failure to allocate an skb) might not be reported at all because they do not call kfree_skb().