"eth0: hw csum failure" logging #2723
…nds"" This reverts commit 2a2fbe1.
Simple logging in the driver should the checksum offload value not match a software check. This will have a small performance impact, but is necessary to try and identify why we are getting "eth0: hw csum failure" messages. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
@MilhouseVH : @popcornmix has said that you have been seeing these "hw csum failure" messages and are in a position to create kernel images fairly easily. There's a slight reluctance to merge this to the main tree as it will have a small performance hit on every single received packet, but if we can't gain any data from other sources then we may have to. We still can't reproduce :-( |
@6by9 yep, I'll include them in my 4.18.y test builds from tonight, and also perform my own testing - I've found that with 4.19-rc8 (my LibreELEC test build #1016x, without any reverts) the following is sufficient to trigger a kernel backtrace LibreELEC:
which then produces the following backtraces (all of them - log is clear until the above command is executed): A 4.19-rc8 kernel with the two reverts 5ca4ac2 and f2c248f does not produce a backtrace. |
ix.io is also a trigger for last night's 4.18.14 without any reverts. |
This is an ix.io-triggered backtrace when using the latest rpi-4.18.y branch and this PR: I'm not seeing any additional logging (which may suggest the problem is elsewhere?) |
Thanks. Your callstacks do include a load of nf (netfilter) functions. Have you got any odd iptables entries?
|
That is great news - many thanks. |
Yes in LibreELEC we have a few additional rules:
|
Hmm, that may be the clue. |
Sorry, messing with branches having switched to testing 4.19. https://github.com/6by9/linux/tree/rpi-4.19.y-net top commit should log the content of the packet that caused the hw csum failure. |
No, I can't get 4.19-rc8 to throw the error either, even with all your iptables rules loaded. I hope you have more success. |
Current rpi-4.19.y (c2f4561) plus rpi-4.19.y...6by9:rpi-4.19.y-net, triggered with |
ix.io will also trigger a hw csum failure on RPi1 with LibreELEC 4.19-rc8. This is the kernel log from an RPi1 with rpi-4.19.y plus rpi-4.19.y-net, same "hello" ix.io trigger: My network configuration is nothing fancy, aside probably from the LibreELEC iptables rules. It's RPi1/RPi3+ using genuine RPF white PSUs, both wired with CAT5
|
A big thank you for that. |
Yes, these are my LAN IP addresses: DGND4000: 192.168.0.1 There's another RPi3+ (192.168.0.4, wired into the DGND4000) running Raspbian (Stretch, 4.14.49-v7+ #1120) and dnsmasq (DNS + DHCP) for the entire LAN, just in case that is relevant. And yes, 66.172.11.73 appears to be ix.io:
|
The traffic is a sequence of single byte TCP packets from the web server to the client. The bytes make up part of a URL:
I think it's odd (because it's so inefficient) to be pushing single bytes at this point unless the receive window is closed. |
I get a similar pattern of packet sizes running curl on my Mac. The first packet contains 157 bytes of payload - the full HTTP header, two blank lines and an 'h'. The second is just a 't', as is the third. The fourth is 'p', etc. The total length of the first packet is 223 bytes - nowhere near the MTU. |
4.14.62 #0821 doesn't have the hw csum failure when testing with ix.io. 4.18.4 #0822 does have the hw csum failure when testing with ix.io (http://ix.io/1pqn). So ix.io wasn't/isn't a problem with 4.14.y (I stopped building 4.14.y after 4.14.62). |
I'm convinced the problem will be at our end, but it may be triggered by a peculiarity in the source. Perhaps the small packets are because the content is being generated a byte at a time by a server script, and the first gets the header prepended. |
By the way, even though I'm getting these hw csum failures with ix.io, I AM able to upload stuff successfully and there's no obvious problem at the command line. The uploaded content also appears to be uncorrupted, so the kernel messages are "noise", at least in this case. |
Yup, from https://github.com/raspberrypi/linux/blob/rpi-4.14.y/net/core/datagram.c#L761 I don't see it aborting/discarding the packet after logging the error, although I suspect the caller will be looking at the result. |
Thanks Phil for doing the analysis - I had to go out for the evening.
It is only showing up after the change "pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends". Plan for the morning will be to throw a WARN into pskb_trim_rcsum_slow to see where it gets called from, as that would be the point that the CHECKSUM_COMPLETE was previously getting cleared. I'll also grab Milhouse's nightly build and see if that allows us to reproduce it on the corporate network (it may be networking infrastructure related). |
Testing here in the office with my normal setup (isolated network nat'ed to the corporate network via Ubuntu on VirtualBox) I get a 220 byte packet (166 of TCP data) with the full HTTP header and I suspect VirtualBox's routing may be being smarter than it should :-( |
72534f6 pushed to my rpi-4.19.y-net branch to dump the packet and parameters on any call to I'm seeing it called on some multicast packets, but not when trying @MilhouseVH Any chance you can try it? There's obviously some subtle thing I'm missing at the moment. I may head home early and try it on my home broadband to see if the change in networking setup helps. |
Looking in more detail at the trimmed multicasts, trimming by 6 bytes is happening on IGMPv3 membership reports/join messages from another Pi. They have been sent as 60 bytes on the wire, but contain only 40 bytes of IP + 14 bytes of ethernet header. The remaining 6 bytes are observed to be 0. The other multicast messages being trimmed by 1 are typically SSDP messages. Both Windows and the Pi agree that they are 214 bytes on the wire: 14 bytes of ethernet header, and 199 bytes of IP, with one byte left over (Wireshark incorrectly tags it as VSS-monitoring ethernet trailer). I'm still very confused over this :-/ |
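The trim amounts quoted above are the frame length minus the ethernet header and the IP total length. A minimal sketch of that arithmetic follows; `trailer_len` and `ETH_HEADER_LEN` are hypothetical names chosen for illustration, not identifiers from the driver or the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Length of an untagged Ethernet II header (dst MAC, src MAC, EtherType). */
#define ETH_HEADER_LEN 14u

/*
 * Hypothetical helper: number of bytes that follow the IP datagram in a
 * received frame (FCS excluded), i.e. how much would be trimmed.
 */
size_t trailer_len(size_t frame_len, size_t ip_total_len)
{
    /* The frame must at least hold the header plus the IP datagram. */
    assert(frame_len >= ETH_HEADER_LEN + ip_total_len);
    return frame_len - ETH_HEADER_LEN - ip_total_len;
}
```

With the figures from the comment, `trailer_len(60, 40)` gives 6 (the IGMPv3 case) and `trailer_len(214, 199)` gives 1 (the SSDP case). 60 bytes is the minimum ethernet frame size excluding the FCS, which would account for the zero padding on the short IGMP frames.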
ix.io test performed at |
Perfect. Thank you. I'm suspecting there's a bug in csum_sub. Dissecting the buffer first to confirm that the hw offload csum is valid for the full buffer length. |
Head and brick walls. Endianness screws things over too.
I make the checksum of the complete packet 0x4b70. Endian-swap and I'll match the 704b that was logged. Remove the last 5 bytes and I compute the checksum of that packet to be 0xf128. It works if you do longhand, or compute the checksum of 0xa8006e03 folds down into 0x1604, so I don't see that computation being correct when I compute it to be 0xf128 (or is it 0x28f1). I'll try to parcel this packet up into a form that I can replay at a Pi. This looks to be all about underflows/overflows/carries of the checksum maths, so the network conditions (IP addresses, port numbers, anything) will be affecting it. |
EUREKA!!! |
To save those needing to recreate it, packet capture of 1 packet attached. |
Not endian-swapped in the logging - 0x704b is incorrectly byte swapped from the driver for some reason. |
Why would any hardware designer in their right mind make the content of a register host order dependent? In fact, how could they? |
Scratch that - they are packet trailers, so in memory. |
is used on every other packet fine, just not this one. skb->csum is on the host system and needs to be handled there correctly as a 16 bit value, hence needing ntohs. I'm adding logging in the driver to print skb->csum for every packet now, but unfortunately touched checksum.h so it's a bigger rebuild than otherwise. Further note that I'm on a 3B+ in this case. |
Life is very confusing as to which way around is correct to deal with this data! The checksum computed by the driver as 704b is the correct value that the kernel is wanting. It is therefore wanting to subtract 58000247 / 5a47 to give 28f1. I think I have concluded that it is
and I appear to get no csum failures.
|
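The figures quoted in the last few comments can be checked with plain 16-bit one's-complement arithmetic. The sketch below reads the values in network order (0x4b70 for the full packet, 0x5a47 and 0x475a for the two candidate checksums of the trimmed bytes). The helper names `fold16`, `swap16`, `ocsum_sub` and `check_thread_values` are hypothetical and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Fold carries above bit 15 back into the low 16 bits (end-around carry). */
uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Swap the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* One's-complement subtraction: a - b is a + ~b with end-around carry. */
uint16_t ocsum_sub(uint16_t a, uint16_t b)
{
    return fold16((uint32_t)a + (uint16_t)~b);
}

/* Self-check using only numbers that appear in the thread. */
void check_thread_values(void)
{
    /* Subtracting the byte-swapped tail checksum gives the expected value. */
    assert(ocsum_sub(0x4b70, 0x5a47) == 0xf128);
    /* Subtracting the unswapped value gives 0x0416, i.e. 0x1604 swapped. */
    assert(ocsum_sub(0x4b70, 0x475a) == 0x0416);
    assert(swap16(0x0416) == 0x1604);
    /* The logged 704b is the byte-swapped form of 0x4b70. */
    assert(swap16(0x4b70) == 0x704b);
}
```

Read this way, 0xf128 (28f1 when byte-swapped) is what results from subtracting 0x5a47, while subtracting 0x475a gives the 0x1604 that was mentioned earlier as the wrong result.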
OK, daft thought. Seeing as the suspicion is that this is generic, do we have a > 4.14.71, or 4.18 / 4.19 x86/x86_64 kernel build that we can throw this packet at (after suitably modifying the MAC address)? I know I don't have one immediately to hand, but do the Milhouse x86_64 builds fit the bill? |
My current LibreELEC x86_64 build is 4.18.14: http://forum.kodi.tv/showthread.php?tid=298462&pid=2783648#pid2783648 There is a LibreELEC 4.19-rc8 x86_64 build here: https://forum.kodi.tv/showthread.php?tid=298462&pid=2783419#pid2783419 I'm not sure how to "replay" this data, but if you can write LibreELEC 8.90.006 to a USB memory stick (4GB+) then you boot into I've tested ix.io on a Skylake NUC (4.18.14/4.19-rc8, ip 192.168.0.20, Intel WiFi), but no panic. Same with a Revo 3700 (4.18.14/4.19-rc8, ip 192.168.0.12, wired r8169 ethernet) - no panic. |
Thanks Milhouse. I've now got to think what spare x86 kit I have available. Some old Acer Revo 3610s (Atom and NVidia ION) are probably about all the sacrificial stuff. There is also the fact that there are network driver options to force alignment of packets within the IP stack, so that may be masking stuff on other platforms. If you fancied trying it, then the easiest approach I've found is to add an ethernet interface with a back-to-back connection to the device being tested so switches don't filter out messages on you. There's no DHCP etc, so the source device doesn't start sending any extraneous junk. Add the relevant static IP address to the DUT. Please feel under no obligation to try this. Now we can reproduce it I should be able to make progress. Having said that, I'm out of the office tomorrow, so it may have to wait for next week. |
Yeah sorry, I'm not likely to be able to run those tests on my x86_64 kit but I can of course test any commits on the RPi1/RPi3+ as and when you think you may have something worth testing. Part of the problem with me cobbling something together on my x86_64 kit is that the lack of a panic is likely to be due to my own ineptitude as different network hardware etc. LibreELEC 8 and 9 should work with your Revo 3610 - it's an ION1 GPU which should be supported. Make sure Booting LibreELEC from the USB memory stick in Just don't run the default |
No problem, it was just one of those daft thoughts as I was about to leave the office. I'm very appreciative of the efforts you've put in to help us on this one. I've got a couple of ideas over things to check on the Pi kernel, and I'll take my Revo into the office on Monday to have a play. |
I think I have reproduced how the incorrect checksum in the example above was calculated by the kernel code:
So, what has happened? The root cause of our problems is that the IP checksum is computed over 16 bit data, and we ended up calculating the first part (4B70) starting from an even address, but the second part (475A) starting from an odd address. When looking at the kernel code, I saw a comment that warned that incremental checksum calculations are only allowed if all fragments, except the last, have even length. Now I understand why -- in the example above, the first fragment has odd length, causing the second fragment to start at an odd address, messing up high and low bytes in the checksum calculation. Edit: @6by9 already suspected something similar above ("skb_checksum should be doing this itself as it tries to compensate for the starting offset"). However, I am not sure whether it is actually skb_checksum's job to "compensate for the starting offset" in the sense we need it in this case here: skb_checksum seems to assume that the range of bytes it is checksumming starts at an even offset with respect to the network packet, but not necessarily with respect to memory addresses, and provides special treatment for the latter. The correction of skb_checksum's result for the fragment to be removed in @6by9 's update to pskb_trim_rcsum_slow() takes care of the former, and seems to be required to get both aspects covered correctly, in addition to the current code in the kernel. Why is this not hitting a larger number of people? I can only assume that in many cases the fragment to be removed consists of zeros only (as in the dumps provided by @MilhouseVH |
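The odd-offset effect described above can be reproduced in userspace. The following is a minimal sketch, not the kernel code: `ocsum`, `trim_csum` and the 7-byte `sample_pkt` are hypothetical, and sums are computed in network order with an odd trailing byte padded by a zero low byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample data, not a capture from the thread. */
const uint8_t sample_pkt[7] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07 };

/* Fold carries above bit 15 back into the low 16 bits. */
uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* One's-complement sum of buf[0..len), buf[0] being a high byte. */
uint16_t ocsum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum = fold16(sum + (((uint32_t)buf[i] << 8) | buf[i + 1]));
    if (i < len)                      /* odd trailing byte, zero padded */
        sum = fold16(sum + ((uint32_t)buf[i] << 8));
    return (uint16_t)sum;
}

/* One's-complement subtraction: a + ~b with end-around carry. */
uint16_t ocsum_sub(uint16_t a, uint16_t b)
{
    return fold16((uint32_t)a + (uint16_t)~b);
}

/* Checksum of buf[0..new_len) derived from the checksum of buf[0..old_len). */
uint16_t trim_csum(uint16_t full, const uint8_t *buf,
                   size_t old_len, size_t new_len)
{
    uint16_t tail;

    assert(new_len <= old_len);
    tail = ocsum(buf + new_len, old_len - new_len);
    if (new_len & 1)                  /* tail starts at an odd packet offset */
        tail = swap16(tail);
    return ocsum_sub(full, tail);
}
```

For `sample_pkt`, trimming from 7 to 3 bytes only matches a fresh checksum of the first 3 bytes when the tail checksum is byte-swapped before subtraction. Trimming to 4 bytes (an even offset) needs no swap.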
@catschulze I agree with your analysis. What skb_checksum gets up to seems to be the crux of it. The comments do say that all fragments except the last have to be even length, but the code appears at first glance to dispute that. The C reference version does check the start offset for odd/even, as does what I believe to be the ARM optimised version. The check does seem to be pretty noddy of checking the address of the buffer start for odd/even, when I'm not clear on the requirement for the buffer to be aligned. There are patches for 9514 which do fix the alignment. I was going to mock up an SKB with buffer Anyway, I'll be emailing net-dev tomorrow with what we've found. |
@6by9 Looking at the prototype of do_csum()
you can see that this function just receives the buffer location and length, but not the relative offset from the start of the packet. When working with (non-final) fragments of odd length, you would need both to correctly compute the checksum update:
Note that len is not memory address related (as the check involving buff in the first code fragment above was), but instead contains the required information of the offset of the fragment's start from the start of the entire packet. (BTW, alignment of the packet or fragments is irrelevant to our discussion here; do_csum() will deal correctly with arbitrary alignment of the fragment's data buffer, and your code does not have to care at all, because it works with logical packet related offsets only. The only influence alignment has is efficiency related, because 32 bit alignment of fragments makes it easier for do_csum() to run with native 32 bit operations.) IMO you should ask on net-dev to get a version of your patch (maybe excluding the debugging output) merged. Also provide a capture of the problematic packet, with non-zero padding and odd lengths of both initial fragment and padding -- if I am not mistaken, this effect should also be observable on other architectures as well (x86_64). |
I can confirm that the |
@MilhouseVH Dimitris' patch from net-dev looks like an equivalent, but cleaner solution to what @6by9 had found above. It changes the call to csum_sub() into a call to csum_block_sub(), which gets len as additional argument and is therefore able to detect that an additional byte swap of the value returned from skb_checksum() is required in certain cases. @6by9 It seems that this is now on its way to the stable kernel series, see LKML: https://lkml.org/lkml/2018/10/21/80. That means we might just have to wait for 4.14.79 to be released to get the full fix (until then, the revert 2a2fbe1 in our 4.14 tree should already avoid these error messages, with somewhat reduced efficiency due to discarding the hardware checksum). (As indicated by @MilhouseVH, 2a2fbe1 (and e5c9741?) should be reverted once we have Dimitris' patch.) |
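To illustrate the difference between the two calls, here is a userspace model of a plain subtraction versus an offset-aware one on 32-bit partial sums. It is a sketch under assumptions: the `model_` functions are hypothetical stand-ins written for this example, and the 8-bit rotation for odd offsets reflects my understanding of how the kernel helper compensates, not a copy of the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit one's-complement add with end-around carry. */
uint32_t model_csum_add(uint32_t csum, uint32_t addend)
{
    uint32_t res = csum + addend;
    return res + (res < addend);
}

/* Plain subtraction: no knowledge of where the block sat in the packet. */
uint32_t model_csum_sub(uint32_t csum, uint32_t csum2)
{
    return model_csum_add(csum, ~csum2);
}

/*
 * Offset-aware subtraction: if the block started at an odd packet offset,
 * rotate its (complemented) sum by 8 bits so high and low bytes line up.
 */
uint32_t model_csum_block_sub(uint32_t csum, uint32_t csum2, int offset)
{
    uint32_t sub = ~csum2;

    assert(offset >= 0);
    if (offset & 1)
        sub = (sub >> 8) | (sub << 24);
    return model_csum_add(csum, sub);
}

/* Fold a 32-bit partial sum to 16 bits (no final complement). */
uint16_t model_fold(uint32_t sum)
{
    sum = (sum & 0xffffu) + (sum >> 16);
    sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}
```

Using the values discussed earlier in the thread, `model_fold(model_csum_block_sub(0x4b70, 0x475a, 1))` gives 0xf128, whereas `model_fold(model_csum_sub(0x4b70, 0x475a))` gives 0x0416. Any odd offset behaves the same way, since only the low bit is inspected.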
@MilhouseVH Huge thanks for all your testing. That commit from Dimitris is spot on with where we were heading (it helps to know the calls that are available!) @catschulze I'd been looking at the do_csum code and seeing the use of I'm going to close this PR and update the other issues relating to this. Thanks again for the assistance. |
Thanks @6by9 and @catschulze for all your time digging into this! |
For discussion. I'll see if the person on #2713 who has been reproducing this reliably can get some more logging for us before suggesting we merge this generally.
This will have a small performance hit as it s/w checksums every packet.