Add packet alert mode and monitoring of hardware originated drops #11

Merged Aug 24, 2019 (12 commits)

@idosch idosch commented Aug 21, 2019

Hi,

The drop monitor kernel module was recently extended with two new features which are expected to be merged to the 5.4 kernel. This pull request adds support for these new features in DropWatch.

The first feature was merged in this kernel merge commit. It extends drop monitor with another alert mode in which each dropped packet is sent to user space, rather than only a summary of recent drops. This enables users to perform a more detailed analysis of the dropped packets. Please refer to the kernel merge commit for example usage, as well as to the individual commit messages in this pull request. The man page and help command were extended as well.

The second feature was merged in this kernel merge commit. It allows drop monitor to monitor hardware originated drops, in addition to software originated drops.

Patches #1-#2 perform simple clean-ups.

Patch #3 synchronizes the kernel header file.

Patches #4-#8 add support for packet alert mode.

Patches #9-#12 allow DropWatch to monitor hardware originated drops.

I have tested this patched DropWatch version on a net-next kernel and a net kernel. I also tested unmodified DropWatch on a net-next kernel.

idosch added 2 commits August 2, 2019 17:59
Fix a few spelling mistakes in the README file.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Remove whitespace errors throughout the codebase. These are evident with
vim plugins such as vim-better-whitespace. No functional changes
intended.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
idosch commented Aug 21, 2019

@dsahern David, I know you have some changes you wanted to make on top of this, so I'm copying you like I promised.

dsahern commented Aug 22, 2019

Thanks for letting me know. My changes are a bit of a hack to try out the new features. More thought is needed to better integrate packet analysis, e.g., using libpcap to dump packets to a file. I doubt I will have time for something merge-worthy in the next few months, so if someone else has the time and motivation, please give it a try.

idosch commented Aug 22, 2019

> Thanks for letting me know. My changes are a bit of hack to try out the new features. More thought is needed to better integrate packet analysis - eg., use libpcap to dump packets to a file. I doubt I will have time for something merge worthy in the next few months, so if someone else has the time and motivation please give it a try.

Hi David,

I thought about it and I'm not sure we need to add pcap support to DropWatch. Instead, I started writing a libpcap capture module that will allow tcpdump/wireshark to specifically capture drop monitor packets. It will basically open the netlink socket, configure drop monitor to switch to packet alert mode and start monitoring. The pcap filter will be run in user space on the NET_DM_ATTR_PAYLOAD data.
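
To make the idea of filtering on the payload attribute concrete, here is a minimal, self-contained sketch of the two steps involved: locating one attribute in a netlink-style buffer, then running a user-space predicate on its bytes. Everything named `demo_*` or `DEMO_*` is hypothetical, including the attribute number, which stands in for NET_DM_ATTR_PAYLOAD and is not its real value. The sketch also assumes the payload begins with an Ethernet header, and the EtherType check is only a stand-in for a real pcap filter.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute number standing in for NET_DM_ATTR_PAYLOAD;
 * the real value comes from the kernel header and is not reproduced here. */
#define DEMO_ATTR_PAYLOAD 7

/* Netlink-style attribute header: total length (header + value), then type. */
struct demo_attr_hdr {
    uint16_t len;
    uint16_t type;
};

/* Attributes are padded to a 4-byte boundary. */
#define DEMO_ALIGN(n) (((size_t)(n) + 3u) & ~(size_t)3u)

/* Return a pointer to the value of the first attribute of `type` and store
 * its length in *out_len. Returns NULL if absent or if the buffer is
 * malformed. */
static const unsigned char *demo_find_attr(const unsigned char *buf,
                                           size_t buflen, uint16_t type,
                                           size_t *out_len)
{
    size_t off = 0;

    while (off <= buflen && buflen - off >= sizeof(struct demo_attr_hdr)) {
        struct demo_attr_hdr hdr;

        /* memcpy avoids unaligned access on the raw buffer. */
        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.len < sizeof(hdr) || hdr.len > buflen - off)
            return NULL;
        if (hdr.type == type) {
            *out_len = hdr.len - sizeof(hdr);
            return buf + off + sizeof(hdr);
        }
        off += DEMO_ALIGN(hdr.len);
    }
    return NULL;
}

/* Stand-in for a user-space filter: match Ethernet frames by EtherType,
 * which sits at bytes 12-13 of the frame in network byte order. */
static int demo_match_ethertype(const unsigned char *payload, size_t len,
                                uint16_t ethertype)
{
    if (len < 14)
        return 0;
    return payload[12] == (ethertype >> 8) &&
           payload[13] == (ethertype & 0xff);
}
```

With these helpers, a capture loop would call `demo_find_attr()` on each received message and pass the result to the filter, keeping the packet only when the predicate matches.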

@nhorman nhorman left a comment

I agree that dropwatch shouldn't need to have libpcap support, since we can just use tcpdump/wireshark in conjunction with dropwatch to achieve the same goal, and that seems like the more efficient approach.

As for this patch, it looks good to me. I made a few requests for comments on the NLA_PUT_* macros to remind readers that they have implied gotos nested in them. If you can add those, I'll gladly merge this.
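
For readers unfamiliar with the pattern, the sketch below shows why such comments help. It does not use libnl; `demo_msg`, `demo_put()` and `DEMO_PUT_U32()` are hypothetical stand-ins modeled on the NLA_PUT_* style, where the macro jumps to a `nla_put_failure` label that the calling function has to define.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size message buffer; not libnl's struct nl_msg. */
struct demo_msg {
    unsigned char buf[16];
    size_t len;
};

/* Append raw bytes; returns 0 on success, -1 if the buffer is full. */
static int demo_put(struct demo_msg *msg, const void *data, size_t len)
{
    if (len > sizeof(msg->buf) - msg->len)
        return -1;
    memcpy(msg->buf + msg->len, data, len);
    msg->len += len;
    return 0;
}

/*
 * Hypothetical macro in the NLA_PUT_* style: on failure it jumps to a
 * label that the *caller* must provide. Nothing at the call site shows
 * that control flow can leave the function body here.
 */
#define DEMO_PUT_U32(msg, value)                                \
    do {                                                        \
        uint32_t demo_tmp_ = (value);                           \
        if (demo_put((msg), &demo_tmp_, sizeof(demo_tmp_)) < 0) \
            goto nla_put_failure;                               \
    } while (0)

/* Returns 0 if all `count` values fit into the message, -1 otherwise. */
static int demo_fill(struct demo_msg *msg, int count)
{
    int i;

    msg->len = 0;
    for (i = 0; i < count; i++)
        DEMO_PUT_U32(msg, (uint32_t)i); /* implied: goto nla_put_failure */
    return 0;

nla_put_failure:
    return -1;
}
```

The short comment next to the macro call is the kind of reminder requested in the review: without it, the `nla_put_failure` label looks unreachable.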

nhorman commented Aug 23, 2019

comment requests?

idosch commented Aug 23, 2019

> comment requests?

Thanks for the review, Neil. I commented above. Hope it is fine.

idosch added 10 commits August 24, 2019 10:57
The drop monitor kernel module was recently extended with new features
such as packet alert mode and truncation.

Synchronize the kernel headers in order to make use of these new
abilities in dropwatch.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Extend dropwatch to support the recently added packet alert mode. In
this mode, the dropped packet and various metadata are notified to user
space as netlink events.

Example:

dropwatch> set alertmode packet
Setting alert mode
Alert mode successfully set
dropwatch>
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
drop at: __netif_receive_skb_core+0x207b/0x2850 (0xffffffff8cc7302b)
input port ifindex: 10
timestamp: Sat Aug 10 19:28:57 2019 328865171 nsec
protocol: 0x800
length: 142

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Previous patch added support for packet alert mode in which the dropped
packet itself is sent to user space. Usually, only the packet's headers
are of interest.

Add support for packet truncation. User can instruct the kernel to
truncate dropped packets to a specified length before they are enqueued
to the netlink socket's receive buffer.

Example:

dropwatch> set alertmode packet
Setting alert mode
Alert mode successfully set
dropwatch>
dropwatch> set trunc 64
Setting truncation length to 64
Truncation length successfully set
dropwatch>
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
drop at: __netif_receive_skb_core+0x207b/0x2850 (0xffffffffaf46f92b)
input port ifindex: 11
timestamp: Fri Aug  2 18:35:52 2019 028620846 nsec
length: 64
original length: 142

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
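
As a rough illustration of how the "length" and "original length" fields in the example relate, the sketch below computes the reported length from the original length and the configured truncation length. The function name is hypothetical, and treating a truncation length of 0 as "no truncation" is an assumption, not something stated in the commit message.

```c
#include <stddef.h>

/*
 * Hypothetical helper: length of the payload that would be reported for
 * a packet of orig_len bytes when truncation is set to trunc_len.
 * Assumption: trunc_len == 0 means truncation is disabled.
 */
static size_t demo_reported_len(size_t orig_len, size_t trunc_len)
{
    if (trunc_len == 0 || trunc_len >= orig_len)
        return orig_len; /* nothing to cut */
    return trunc_len;    /* keep only the leading trunc_len bytes */
}
```

For the transcript above, `demo_reported_len(142, 64)` yields 64, matching "length: 64" next to "original length: 142".
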
When "alertmode" is set to "packet", the kernel queues dropped packets
in a per-CPU drop list before preparing a netlink message for each. This
setting controls the queue's length. By default, this is limited by the
kernel to 1,000 packets.

Example:

dropwatch> set queue 100
Setting queue length to 100
Queue length successfully set

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Allow user to query drop monitor's current configuration from the
kernel.

Example:

dropwatch> show
Getting existing configuration
Alert mode: Packet
Truncation length: 64
Queue length: 100
dropwatch>
dropwatch> set alertmode summary
Setting alert mode
Alert mode successfully set
dropwatch>
dropwatch> show
Getting existing configuration
Alert mode: Summary
Truncation length: 64
Queue length: 100
dropwatch>
dropwatch> set trunc 0
Setting truncation length to 0
Truncation length successfully set
dropwatch>
dropwatch> show
Getting existing configuration
Alert mode: Summary
Truncation length: 0
Queue length: 100

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Allow the user to query and show drop monitor statistics from the
kernel. Specifically, the "Tail dropped" counter shows how many packets
could not be enqueued to the per-CPU drop list(s).

Example:

dropwatch> stats
Getting statistics
Software statistics:
Tail dropped: 4810

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
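
The relationship between the queue length setting and the "Tail dropped" counter can be sketched as a bounded queue that counts what it has to refuse. This is a simplified user-space model, not the kernel's per-CPU implementation: the `demo_*` names are hypothetical, and only packet counts are tracked, not packet data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one drop list; tracks counts only, not packets. */
struct demo_drop_queue {
    size_t limit;          /* configured queue length */
    size_t len;            /* packets currently queued */
    uint64_t tail_dropped; /* packets refused because the queue was full */
};

static void demo_queue_init(struct demo_drop_queue *q, size_t limit)
{
    q->limit = limit;
    q->len = 0;
    q->tail_dropped = 0;
}

/* Try to queue one dropped packet. Returns 1 if queued, 0 if it was
 * tail dropped because the queue already holds `limit` packets. */
static int demo_queue_enqueue(struct demo_drop_queue *q)
{
    if (q->len >= q->limit) {
        q->tail_dropped++;
        return 0;
    }
    q->len++;
    return 1;
}

/* Hand all queued packets to the consumer (in the real design, the code
 * that builds netlink messages). Returns how many packets were drained. */
static size_t demo_queue_drain(struct demo_drop_queue *q)
{
    size_t drained = q->len;

    q->len = 0;
    return drained;
}
```

In this model, a burst of 150 drops against a limit of 100 leaves 100 packets queued and a tail drop count of 50, which is the kind of value the `stats` command reports.
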
Prepare drop monitor to display hardware drops in packet alert mode by
printing the hardware-specific attributes.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Prepare drop monitor to display hardware drops in summary alert mode by
printing the drop reason along with the provided count.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Similar to software statistics, display the tail drop counter for
hardware originated drops.

Example:

dropwatch> stats
Getting statistics
Software statistics:
Tail dropped: 4810
Hardware statistics:
Tail dropped: 0

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Now that drop monitor can also display hardware originated drops in
either mode (summary and packet), allow the user to enable their
monitoring.

This is accomplished by setting flags (i.e., "sw" and "hw") before
starting the monitoring.

In order to maintain backward compatibility, if no flag is specified,
only software originated drops are monitored. Both types of drops can be
monitored at the same time.

Example:

dropwatch> set sw true
setting software drops monitoring to 1
dropwatch>
dropwatch> set hw true
setting hardware drops monitoring to 1
dropwatch>
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
10 drops at fid_miss [hardware]
10 drops at ttl_value_is_too_small [hardware]
2 drops at __netif_receive_skb_core+207b (0xfffffffface6f92b) [software]
11 drops at fid_miss [hardware]
11 drops at ttl_value_is_too_small [hardware]
^CGot a stop message
dropwatch>
dropwatch> set alertmode packet
Setting alert mode
Alert mode successfully set
dropwatch>
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
drop at: ttl_value_is_too_small (l3_drops)
origin: hardware
input port ifindex: 14
input port name: eth0
timestamp: Sun Aug  4 11:14:55 2019 928228704 nsec
protocol: 0x800
length: 142
original length: 142

drop at: __netif_receive_skb_core+0x207b/0x2850 (0xfffffffface6f92b)
origin: software
input port ifindex: 12
timestamp: Sun Aug  4 11:14:55 2019 994202094 nsec
protocol: 0x800
length: 142
original length: 142

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
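
The backward-compatibility rule from the last commit message can be written down as a small function. This is a sketch of the rule as described, not the code from the patch; the `DEMO_MON_*` constants and the function name are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical bit flags for the drop origins to monitor. */
enum {
    DEMO_MON_SW = 1 << 0,
    DEMO_MON_HW = 1 << 1
};

/*
 * Resolve the user's "sw" and "hw" settings into the set of origins that
 * end up being monitored. If neither flag is set, fall back to software
 * drops only, which is the behavior that existed before the flags.
 */
static unsigned int demo_effective_origins(bool sw, bool hw)
{
    unsigned int mask = 0;

    if (sw)
        mask |= DEMO_MON_SW;
    if (hw)
        mask |= DEMO_MON_HW;
    if (mask == 0)
        mask = DEMO_MON_SW; /* backward-compatible default */
    return mask;
}
```

With both flags set, as in the transcript above, the result contains both origins, which is why software and hardware drops are interleaved in the output.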
@nhorman nhorman merged commit 5f0b2d6 into nhorman:master Aug 24, 2019