Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110 #3088

Open
GeorgeSapkin opened this issue Jul 19, 2019 · 16 comments
Open
Labels
Waiting for external input Waiting for a comment from the originator of the issue, or a collaborator.

Comments

@GeorgeSapkin
Copy link

GeorgeSapkin commented Jul 19, 2019

Describe the bug
Network connectivity drops out at random on CM3 with a custom carrier board with LAN9512. Device becomes completely unreachable over any network going through that chip (e.g. Ethernet, LTE).

To reproduce
Unknown.

Expected behaviour
Network connectivity doesn't randomly drop out.

Actual behaviour
Network connectivity drops out.

System

  • Which model of Raspberry Pi? CM3 with a custom carrier board with LAN9512
  • Which OS and version (cat /etc/rpi-issue)? Raspberry Pi reference 2018-06-27
  • Which firmware version (vcgencmd version)? VCHI initialization failed
  • Which kernel version (uname -a)? 4.19.42-v7+ system freeze while unplug r8152 usb2network card #1219 SMP Tue May 14 21:20:58 BST 2019 armv7l GNU/Linux

Logs

Jul 18 20:53:50 data kernel: [277401.663895] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
Jul 18 20:53:50 data kernel: [277401.663906] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
Jul 18 20:53:50 data kernel: [277401.663916] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
Jul 18 20:53:50 data kernel: [277401.663926] smsc95xx 1-1.1:1.0 eth0: Failed to read MII_BMSR
Jul 18 20:54:56 data kernel: [277466.864971] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x294/0x298
Jul 18 20:54:56 data kernel: [277466.864977] NETDEV WATCHDOG: eth0 (smsc95xx): transmit queue 0 timed out
Jul 18 20:54:56 data kernel: [277466.864982] Modules linked in: tun cdc_ether cdc_acm rtc_ds1307 brcmfmac brcmutil sha256_generic cfg80211 rfkill raspberrypi_hwmon hwmon bcm2835_codec(C) bcm2835_v4l2(C) v4l2_mem2mem bcm2835_mmal_vchiq(C) v4l2_common videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev i2c_bcm2835 media vc_sm_cma(C) uio_pdrv_genirq uio fixed i2c_dev ip_table
s x_tables ipv6
Jul 18 20:54:56 data kernel: [277466.865151] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C        4.19.42-v7+ #1219
Jul 18 20:54:56 data kernel: [277466.865154] Hardware name: BCM2835
Jul 18 20:54:56 data kernel: [277466.865173] [<80111eac>] (unwind_backtrace) from [<8010d430>] (show_stack+0x20/0x24)
Jul 18 20:54:56 data kernel: [277466.865184] [<8010d430>] (show_stack) from [<8080ec80>] (dump_stack+0xd4/0x118)
Jul 18 20:54:56 data kernel: [277466.865196] [<8080ec80>] (dump_stack) from [<80120904>] (__warn+0x104/0x11c)
Jul 18 20:54:56 data kernel: [277466.865204] [<80120904>] (__warn) from [<80120974>] (warn_slowpath_fmt+0x58/0x74)
Jul 18 20:54:56 data kernel: [277466.865212] [<80120974>] (warn_slowpath_fmt) from [<807355b8>] (dev_watchdog+0x294/0x298)
Jul 18 20:54:56 data kernel: [277466.865223] [<807355b8>] (dev_watchdog) from [<801965a0>] (call_timer_fn+0x3c/0x198)
Jul 18 20:54:56 data kernel: [277466.865233] [<801965a0>] (call_timer_fn) from [<801967e8>] (expire_timers+0xec/0x14c)
Jul 18 20:54:56 data kernel: [277466.865242] [<801967e8>] (expire_timers) from [<80196900>] (run_timer_softirq+0xb8/0x1ec)
Jul 18 20:54:56 data kernel: [277466.865251] [<80196900>] (run_timer_softirq) from [<80102388>] (__do_softirq+0x190/0x3f0)
Jul 18 20:54:56 data kernel: [277466.865260] [<80102388>] (__do_softirq) from [<801268dc>] (irq_exit+0xfc/0x120)
Jul 18 20:54:56 data kernel: [277466.865271] [<801268dc>] (irq_exit) from [<8017ef24>] (__handle_domain_irq+0x70/0xc4)
Jul 18 20:54:56 data kernel: [277466.865281] [<8017ef24>] (__handle_domain_irq) from [<801021b4>] (bcm2836_arm_irqchip_handle_irq+0x60/0xa4)
Jul 18 20:54:56 data kernel: [277466.865288] [<801021b4>] (bcm2836_arm_irqchip_handle_irq) from [<801019bc>] (__irq_svc+0x5c/0x7c)
Jul 18 20:54:56 data kernel: [277466.865292] Exception stack(0x80d01ea0 to 0x80d01ee8)
Jul 18 20:54:56 data kernel: [277466.865299] 1ea0: bc56838c 00000040 00000000 00000000 bc568380 bc568380 00000040 bc56838c
Jul 18 20:54:56 data kernel: [277466.865305] 1ec0: 00000001 be5ff9c0 80c63a38 80d01efc 80d01f00 80d01ef0 8069fc9c 8082bdf0
Jul 18 20:54:56 data kernel: [277466.865309] 1ee0: a0000013 ffffffff
Jul 18 20:54:56 data kernel: [277466.865318] [<801019bc>] (__irq_svc) from [<8082bdf0>] (_raw_read_lock+0x18/0x38)
Jul 18 20:54:56 data kernel: [277466.865332] [<8082bdf0>] (_raw_read_lock) from [<8069fc9c>] (led_trigger_event+0x30/0x8c)
Jul 18 20:54:56 data kernel: [277466.865341] [<8069fc9c>] (led_trigger_event) from [<806a1664>] (ledtrig_cpu+0xd4/0xf0)
Jul 18 20:54:56 data kernel: [277466.865350] [<806a1664>] (ledtrig_cpu) from [<80109a78>] (arch_cpu_idle_exit+0x1c/0x20)
Jul 18 20:54:56 data kernel: [277466.865361] [<80109a78>] (arch_cpu_idle_exit) from [<801524d0>] (do_idle+0x100/0x17c)
Jul 18 20:54:56 data kernel: [277466.865371] [<801524d0>] (do_idle) from [<8015280c>] (cpu_startup_entry+0x28/0x2c)
Jul 18 20:54:56 data kernel: [277466.865379] [<8015280c>] (cpu_startup_entry) from [<808253e4>] (rest_init+0xbc/0xc0)
Jul 18 20:54:56 data kernel: [277466.865389] [<808253e4>] (rest_init) from [<80c00fb0>] (start_kernel+0x484/0x4b4)

Additional context
I've already checked issues #60, #149, #552 and #746 but none of them really contain anything useful besides mentioning that the issue is already fixed.

nRESET pin on LAN9512 is pulled high and not connected to a GPIO pin.

@lategoodbye
Copy link
Contributor

In order to exclude a possible hardware issue with your custom carrier board, it would be helpful to reproduce this issue on a Raspberry Pi board.

@GeorgeSapkin
Copy link
Author

I cannot reproduce it with the official I/O board since it doesn't have Ethernet. And reproducing this on a full Pi doesn't seem relevant (different hardware). What else can I do to debug this?

@lategoodbye
Copy link
Contributor

lategoodbye commented Jul 22, 2019

What's the difference between your custom IO board and a Raspberry Pi 3 regarding SMSC95xx wiring?
How about providing your device tree source?

@JamesH65 JamesH65 added the Waiting for external input Waiting for a comment from the originator of the issue, or a collaborator. label Jul 30, 2019
@ckamas
Copy link

ckamas commented Jul 30, 2019

I too am seeing this on a CM3 with an SMSC95xx. I too have looked at the other bug reports and was not able to find a solution to this issue.

I am confused about what in the device tree source you are looking for?

Jul 22 16:21:58 RevPi11734 kernel: [ 0.000000] Linux version 4.9.76-rt60-v7+ (admin@kunbus.de) (gcc version 4.8.3 20140303 (prerelease) (crosstool-NG linaro-1.13.1+bzr2650 - Linaro GCC 2014.03) ) #1 SMP PREEMPT RT Tue, 17 Jul 2018 14:20:12 +0200
...
Jul 18 12:21:52 RevPi11734 kernel: [ 3614.344611] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
Jul 18 12:21:52 RevPi11734 kernel: [ 3614.344620] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
Jul 18 12:21:52 RevPi11734 kernel: [ 3614.344626] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
Jul 18 12:21:52 RevPi11734 kernel: [ 3614.344632] smsc95xx 1-1.1:1.0 eth0: Failed to read MII_BMSR
...
Jul 18 23:19:11 RevPi11734 kernel: [43053.607537] smsc95xx 1-1.5.1:1.0 eth1: Failed to read reg index 0x00000114: -110
Jul 18 23:19:11 RevPi11734 kernel: [43053.607547] smsc95xx 1-1.5.1:1.0 eth1: Error reading MII_ACCESS
Jul 18 23:19:11 RevPi11734 kernel: [43053.607553] smsc95xx 1-1.5.1:1.0 eth1: MII is busy in smsc95xx_mdio_read
Jul 18 23:19:11 RevPi11734 kernel: [43053.607559] smsc95xx 1-1.5.1:1.0 eth1: Failed to read MII_BMSR

@pelwell
Copy link
Contributor

pelwell commented Jul 30, 2019

-110 is a timeout, but I don't think I can help much more than that. Raspberry Pi boards don't exhibit this problem, and the rare and random nature of the errors makes it look much more like a hardware issue than software (unless it is a very subtle configuration error).

@ckamas
Copy link

ckamas commented Aug 9, 2019

I found a quick test for this issue using iperf. I have a custom board PI CM3L with dual ethernet. I can use iperf on one as a master and one as a slave. See here:
https://serverfault.com/questions/127636/force-local-ip-traffic-to-an-external-interface/861465#861465

Using this the kernel/driver from Rasbian with mods by Kunbus errors out very quickly (<15 mins) with the timeout error as described above, but the version I built with Yocto has been running for several days now with no errors.

The good Yocto version of the kernel is:
Linux raspberrypi-cm3 4.19.57 #1 SMP Thu Jul 11 23:40:23 UTC 2019 armv7l armv7l armv7l GNU/Linux

The Rasbian/Kunbus version is:
Linux RevPi8592 4.9.76-rt60-v7+ #1 SMP PREEMPT RT Tue, 17 Jul 2018 14:20:12 +0200 armv7l GNU/Linux

@GeorgeSapkin
Copy link
Author

GeorgeSapkin commented Aug 30, 2019

We've checked the wiring on our board. We don't have full schematic for RPi3B (AFAIK they are not open-sourced), so we're basing this on Pi1B 2.1 schematic. The difference is that nRESET pin on LAN9512 (we don't need 4 USB ports so we're not using LAN9514) is not wired to any GPIO pin and is pulled high instead. We haven't changed anything in the Devicetree regarding this pin (i.e. LAN_RUN) either. This seems in accordance with LAN9512 specs, Table 2-3. However, according to @pelwell this is an issue. Is this the case and are we misreading LAN9512 specs?

@lategoodbye
Copy link
Contributor

In order to answer your question, we need to know how do you provide the 25 MHz clock source (e.g. via GPIO of BCM2835).

@GeorgeSapkin
Copy link
Author

XI and XO pins of LAN9512 are wired same as in RPiB 2.1 schematic to an external 25 MHz clock source. I'm going to try to connect an unused GPIO pin from CM3 to nRESET on LAN9512, configure it as LAN_RUN in DT and see if I can still reproduce the issue.

@GeorgeSapkin
Copy link
Author

GeorgeSapkin commented Oct 4, 2019

We've made a few test boards with nRESET connected to GPIO pin 13 and configured that pin as LAN_RUN in dt-blob.bin. The issue still occurs. Interestingly, when this happens, Wi-Fi also becomes unresponsive. Wi-Fi chip is connected over SDIO so it doesn't go through LAN9512.

We're currently investigating whether this issue could be related to crosstalk from LED traces.

@xyklex
Copy link

xyklex commented Oct 5, 2019

@GeorgeSapkin I'm having this same issue, I'm building my self the Raspbian operating system with pi-gen with some custom configurations, my last build was by Oct 2, my entire setup is the RPi connected to a ST32 board (it has a serial connection and a storage drive inside it), for testing there is no sensors/gpio connections to the ST32 boards, and for production it has several others devices connected to it.

Testing case works ok, no issues, I am able to mount the storage into linux and all the applications that interact with it just works.

On the other side, production does not works, when I plug in the ST32 board to RPi it gives me this:

[  163.934268] usb 1-1.2: new full-speed USB device number 5 using dwc_otg
[  164.067904] usb 1-1.2: New USB device found, idVendor=0483, idProduct=374b, bcdDevice= 1.00
[  164.067923] usb 1-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  164.067932] usb 1-1.2: Product: STM32 STLink
[  164.067942] usb 1-1.2: Manufacturer: STMicroelectronics
[  164.067953] usb 1-1.2: SerialNumber: 066CFF383930434B43152720
[  164.075944] usb-storage 1-1.2:1.1: USB Mass Storage device detected
[  164.082453] scsi host0: usb-storage 1-1.2:1.1
[  164.142860] usbcore: registered new interface driver uas
[  164.163874] cdc_acm 1-1.2:1.2: ttyACM0: USB ACM device
[  164.167005] usbcore: registered new interface driver cdc_acm
[  164.167017] cdc_acm: USB Abstract Control Model driver for USB modems and ISDN adapters
[  165.124578] ERROR::assign_and_init_hc:1408: assign_and_init_hc: Failed to allocate memory to handle non-dword aligned buffer case

[  170.714285] cdc_acm 1-1.2:1.2: failed to set dtr/rts
[  175.195608] scsi 0:0:0:0: Direct-Access     MBED     microcontroller  1.0  PQ: 0 ANSI: 2
[  175.197179] sd 0:0:0:0: [sda] 4168 512-byte logical blocks: (2.13 MB/2.04 MiB)
[  175.197653] sd 0:0:0:0: [sda] Write Protect is off
[  175.197667] sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
[  175.198153] sd 0:0:0:0: [sda] No Caching mode page found
[  175.198164] sd 0:0:0:0: [sda] Assuming drive cache: write through
[  175.234518] sd 0:0:0:0: Attached scsi generic sg0 type 0
[  175.286560] sd 0:0:0:0: [sda] Attached SCSI removable disk
[  200.074285] cdc_acm 1-1.2:1.2: failed to set dtr/rts
[  205.114283] cdc_acm 1-1.2:1.2: failed to set dtr/rts
[  212.074314] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
[  212.074331] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
[  212.074341] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
[  212.074351] smsc95xx 1-1.1:1.0 eth0: Failed to read MII_BMSR

Two ST32 boards has the same firmware, RPi is the same, I tested different power sources (thinking it was that because prod ST32 board has other devices connected).

I have a previous RPi image version that is working on prod though, and it is almost one year old.

This is my report of the situation, I will see if I can rollback to another kernel version and see how it goes.

@GeorgeSapkin
Copy link
Author

Looks like it might be related to #2924
We do have multiple Ethernet adapters: 1 from LAN9512 and 2 from the modem.

@xyklex
Copy link

xyklex commented Oct 8, 2019

@GeorgeSapkin thanks for point me out that issue, my solution right now was to compile a 4.14.107 kernel and modules and that solved the issue I mentioned and another issue I had with the HDMI screen being yellow when the kernel starts. But for now this was my workaround.

@GeorgeSapkin
Copy link
Author

@xyklex I'm glad to hear it solved your issue. We're stress-testing 4.14.x branch as well to see if it's an improvement.

@EloyCrespo
Copy link

@GeorgeSapkin could you tell me if you solved the problem with version 4.14.x?

@wickywaka
Copy link

wickywaka commented Jul 11, 2022

@ckamas We are facing this or very similar issue on our kunbus revpi as well. Did you manage to find a solution?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Waiting for external input Waiting for a comment from the originator of the issue, or a collaborator.
Projects
None yet
Development

No branches or pull requests

8 participants