Network driver on RPi3 B Plus causing hung tasks when working on an NFS mount #2482
Comments
An easy way to trigger this bug (if you don't want to try compiling the kernel package) is to simply use
Now, if I swap out the micro SD and boot into a RPi 2 I have lying around, same network cable, same power supply, and repeat the commands, everything works as expected. I think that helps to rule out the NFS server, network hardware etc. as potentially to blame.
|
Does disabling Energy Efficient Ethernet make a difference? Add But before trying that you can confirm whether EEE is active using |
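The command referred to in the comment above did not survive in this copy of the thread. As a rough sketch only, one way to check EEE state on Linux is to parse the output of `ethtool --show-eee`. This assumes `ethtool` is installed and prints an `EEE status:` line; the interface name `eth0` and both function names are assumptions made for illustration, not something stated in the thread.

```python
import subprocess


def parse_eee_status(ethtool_output):
    """Return the text after 'EEE status:' in `ethtool --show-eee` output, or None."""
    for line in ethtool_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "EEE status":
            return value.strip()
    return None


def eee_status(iface="eth0"):
    """Run `ethtool --show-eee IFACE` (iface name is an assumption) and parse it."""
    result = subprocess.run(
        ["ethtool", "--show-eee", iface],
        capture_output=True, text=True, check=True,
    )
    return parse_eee_status(result.stdout)
```

Calling `eee_status()` on the Pi would return whatever status string the installed `ethtool` reports, or `None` if no such line is present.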
Great suggestion, @pelwell. I got some very encouraging results using the Before:
After:
The test with dd:
Unfortunately, when compiling, which as you can recognize writes out data much less frequently than
|
I combined a few replies into one (above) and tried to make it a bit more concise. TL;DR version is that disabling EEE does not help. |
If possible, and if it isn't already on, can you enable flow control on the switch port connected to the Pi? |
@pelwell - All the wired connections go through an unmanaged switch. No settings to tweak :/ |
I had similar issues on a SAMBA mount, but currently I cannot run tests again because I sent back my Pi3B+. IMO the current revision of the Pi3B+ has serious hardware issues and I don't believe that they can be solved via software (in the end, I was never able to play a video longer than 15 mins without a Kodi crash, kernel Oops, or freeze). @pelwell and co: Which Pi3B+ (revision) are you currently using? Parts of the 0-series, or parts from the current production line, which customers are using now? I still can't believe that you guys never had such issues before |
Current production I believe. Although I don't think there have been
many/any changes since the prototypes.
It appears that the issues are erratic, and depend on the capabilities of
the network the device is attached to. We are trying to figure out the
exact circumstances. I suspect there are a number of issues being seen, as
often happens when a previously working driver suddenly becomes used by
250k extra people over a weekend, in all sorts of new and unpredictable
ways. I have high hopes there is a software solution to this, we've always
been able to find them in the past.
--
James Hughes
Principal Software Engineer,
Raspberry Pi (Trading) Ltd
|
Personally, I have gone back to a Pi 2B ... @mkreisl, when your Pi comes back, if it works normally I will send mine back too |
+1 I am noticing this issue as well when reading off samba mount. Brand new RPI 3B+. |
@pelwell - From @popcornmix's advice in #2442, I built:
I automated that dd test I described above in a simple script that repeats the writing out of 1G worth of zero filled file over an NFS share 32 times. I then used histogram.py to compute the stats. With the
When I removed that line (reverting to the default state of it being on), 1 of the 32 runs was really long:
Since using dd is going to max out the bus, I will try compiling the kernel which is much more gentle to the network IO and much more prone to errors in my experience. Thoughts? |
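The original script and its output are not preserved in this copy of the thread. The replicate test described above (write a zero-filled file to the share repeatedly, then look at the spread of run times) could be sketched roughly as follows. This is a stand-in written in Python rather than the `dd` script that was actually used, and the function names and the chunk size are assumptions for illustration.

```python
import os
import statistics
import time


def timed_write(path, total_bytes, chunk_bytes=1024 * 1024):
    """Write total_bytes of zeros to path, fsync, and return elapsed seconds."""
    chunk = bytes(chunk_bytes)  # a buffer of zero bytes
    remaining = total_bytes
    start = time.monotonic()
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(remaining, chunk_bytes)
            f.write(chunk[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data out before timing stops
    return time.monotonic() - start


def run_replicates(path, total_bytes, runs=32):
    """Repeat the timed write, removing the file after each run."""
    times = []
    for _ in range(runs):
        times.append(timed_write(path, total_bytes))
        os.remove(path)
    return times


def summarize(times):
    """Return min, median and max so that a single stalled run stands out."""
    return {"min": min(times), "median": statistics.median(times), "max": max(times)}
```

For the case in this thread, something like `summarize(run_replicates("/scratch/zero.bin", 1024**3))` would write 1G per run to the NFS mount; the file name is hypothetical. A maximum far above the median is the kind of outlier described above.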
OK... still experiencing the timeouts when compiling to the NFS share with eee enabled despite the successful replicates of using dd above. I am currently building c2eb306 and will test it by compiling the kernel to NFS with eee enabled and with it disabled. For reference, here is the script to automate the replicate compile jobs. |
@pelwell - I am still getting network timeouts... below is with
|
Same here, nothing changed. Still absolutely unstable, unreliable and completely unusable, the Pi3B+ |
Odd, got one on my desk that is working fine. I think you forgot to add "In the circumstances I am using it". Anyway, issues are still being looked at both here and at Microchip. There was a patch on the linux netdev list today for this chip's driver (lan78xx) for EEE which may well help; that will need to be tried. It's not like we are just sitting here twiddling our thumbs. |
Seems you're getting fire under your a.. now 😄 IMO you're looking in the wrong place. LAN issues are only the tip of the iceberg. I already reported that the system is still unstable after that dumb Microchip part is powered off and all traffic is going over the wlan device. The system still freezes randomly. So, until I'm better informed, I would say the whole Pi3B+ design is a huge issue |
Some users who reported problems (and there honestly haven't been that many, but they are shouting loudly) have had success with adding |
What's the default for Pi3B+? Can't find it here |
500 turbo, 400 normal |
@pelwell - I have some hard data now. I ran the make benchmark writing out to the NFS share under 2 conditions, once with eee disabled and once with it enabled. There is a clear trend: eee is causing problems. Running
|
@mkreisl - Please keep this issue on task... it's scoped for network writes not for general stability. Open a new task for that. |
@graysky2 Oops, sorry for tainting your thread |
A potential work-around: don't totally disable EEE, but set Again, values reported are compile times in minutes.
|
All those EEE settings don't help for me, because my router/switch does not support EEE (most routers with an integrated switch do not support it), and I'm still getting NFS timeouts even if EEE is completely disabled, or I'm getting
if using a SAMBA mount instead of an NFS mount, and after some time the process that writes to the share gets stuck and stays in uninterruptible 'D' state forever |
@mkreisl - Are you booted into the same kernel and are you using the same firmware commit that I am? |
@graysky2 |
@mkreisl - not sure what to say then.... perhaps you have a different issue. As a control, have you tried the same stuff with another older RPi? Like a 2 or 3? |
@graysky2 Sure, I have been running the same procedure on Pi1, 2 and 3 (without +) for years without any problem. |
@graysky2 In short, I can explain what it does
Steps 1 to 4 always work, and within step 5 it always gets stuck, but not on the same subvolume |
TSO seems to be having issues when packets are dropped and the remote end uses Selective Acknowledge (SACK) to denote that data is missing. The missing data is never resent, so the connection eventually stalls. There is a module parameter of enable_tso added to allow further debugging without forcing a rebuild of the kernel. #2449 #2482 Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
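For anyone wanting to check how the `enable_tso` parameter is currently set, a minimal sketch is given here. It assumes the driver module is named `lan78xx` and that the parameter is exported under `/sys/module/<module>/parameters/`; whether it is exported depends on the permissions used in the patch, which this thread does not state. The helper name `read_module_param` is hypothetical.

```python
from pathlib import Path


def read_module_param(module, param, sysfs_root="/sys/module"):
    """Return the parameter value as a stripped string, or None if not readable."""
    path = Path(sysfs_root) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except (FileNotFoundError, PermissionError):
        # Parameter not exported to sysfs, module not loaded, or no read access.
        return None
```

On a Pi running a kernel with this patch, `read_module_param("lan78xx", "enable_tso")` would return the current value if it is exposed, and `None` otherwise.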
commit 6fa07a5a10e7724c34d51b12187a138e85775433 from https://github.com/raspberrypi/linux.git rpi-6.6.y TSO seems to be having issues when packets are dropped and the remote end uses Selective Acknowledge (SACK) to denote that data is missing. The missing data is never resent, so the connection eventually stalls. There is a module parameter of enable_tso added to allow further debugging without forcing a rebuild of the kernel. raspberrypi/linux#2449 raspberrypi/linux#2482 Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org> Signed-off-by: Rajeshkumar Ramasamy <rajeshkumar.ramasamy@windriver.com>
Platform/Distro: RPi 3B+ running Arch ARM (armv7h).
Kernel version: 4.14.31 (b36f4e9)
Firmware version: latest as I write this (raspberrypi/firmware@c14a903)
Bug: Frequent kernel oops due to blocked tasks when writing files to NFS mount.
Details: When compiling, dmesg is full of kernel oops like the below when doing so on an NFS mount. Compiling to the micro SD card is fine. I believe that the software (distro) on the micro SD card is NOT to blame... if I put the same micro SD card into a RPi3 or RPi2, I can compile without error.
Again, I am using an NFS mounted partition (/scratch) on which to compile, so I'm hypothesizing that these problems are related to the network driver.
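The dmesg excerpt from the original report is not preserved in this copy. As a rough aid for anyone trying to reproduce the problem, blocked-task warnings can be counted in a saved log with a sketch like the one below. It assumes the log contains the kernel's standard hung-task line of the form `INFO: task <name>:<pid> blocked for more than <N> seconds`; the function name is hypothetical.

```python
import re

# Matches the kernel hung-task warning line:
# "INFO: task <comm>:<pid> blocked for more than <N> seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (comm, pid, seconds) tuples, one per hung-task warning."""
    hits = []
    for line in log_text.splitlines():
        m = HUNG_TASK_RE.search(line)
        if m:
            hits.append((m.group("comm"), int(m.group("pid")), int(m.group("secs"))))
    return hits
```

Feeding it the text of `dmesg` captured during a compile on the NFS mount gives a quick count of how often, and for which processes, the warning fired.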