ALPHA250 intermittent USB failure, potential devicetree configuration issue #618
Comments
Thanks a lot Chris for your report. On the ALPHA250, the USB phy reset pin is unfortunately not exposed to the FPGA. It would be nice if we could reproduce your issue. What do you connect to the USB port? |
Thanks for the reply @jeanminet. For now I've removed the We have an LCD screen attached to the USB port. On another system that occasionally experiences the same issue we have an FTDI USB-to-RS485 adaptor. Earlier this year, when I first encountered the issue, I tested with a USB-to-ethernet adaptor and a USB thumb drive. All these devices work at least some of the time, which shows that it's not just e.g. missing drivers (we've modified the kernel build config to include the relevant drivers). We do have an expansion board on the expansion connector that is powered by the 5V, but we have also seen the issue on systems without anything connected to the expansion connector. It could obviously still be shifting the probabilities. When there's next some downtime for the system that demonstrates the issue most often, I plan to look at U-Boot's output over the serial monitor at startup, and to try the approach here of modifying the FSBL to change the order PHY pins are initialised: I actually don't know what USB PHY the ALPHA250 is using. Is it also a USB3320? |
Just an update on this: the system that had a high failure rate (>50%) now has the USB controller working after every power cycle (probably 15 or so of them) since I made the devicetree change below, except for the very first power cycle after the change:
diff -rupN devicetree.orig/zynq-7000.dtsi devicetree/zynq-7000.dtsi
--- devicetree.orig/zynq-7000.dtsi
+++ devicetree/zynq-7000.dtsi
@@ -405,7 +405,6 @@
interrupt-parent = <&intc>;
interrupts = <0 21 4>;
reg = <0xe0002000 0x1000>;
- phy_type = "ulpi";
};
usb1: usb@e0003000 {
USB still not working after that first power cycle is disconcerting, but if we never see the problem again I'll chalk it up to the power cycle maybe being too rapid for the capacitors in our power supply to discharge (the power cycle was done via a switch for power to the whole system, not the switch on the ALPHA250). I didn't really expect the devicetree change to resolve the problem, so I'll still keep an eye out for it. It's always possible that there was some minor unrelated change to the electrical environment this ALPHA250 is seeing that is just hiding the problem for now. |
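As a rough sanity check on how much weight a run of about 15 working power cycles carries, the sketch below computes the chance of seeing such a run by luck alone. It assumes the power cycles are independent and that the per-cycle failure probability is fixed. The value 0.5 used in the note afterwards is only the lower bound quoted above (>50%), and `prob_all_succeed` is a hypothetical helper name.

```c
#include <assert.h>

/* Probability that n independent power cycles all succeed when each one
 * fails with probability p_fail (ASSUMPTION: p_fail is constant and the
 * cycles do not influence each other). */
double prob_all_succeed(double p_fail, unsigned n)
{
    assert(p_fail >= 0.0 && p_fail <= 1.0);

    double p = 1.0;
    for (unsigned i = 0; i < n; i++)
        p *= 1.0 - p_fail;   /* one more consecutive success */
    return p;
}
```

Calling `prob_all_succeed(0.5, 15)` gives 1/32768, roughly 0.003%. Under these assumptions a run like this would be very unlikely if the failure rate had stayed above 50%, but the number says nothing about which change (devicetree, electrical environment, or something else) is responsible.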
Another update: the issue has occurred again, so it looks like the devicetree change wasn't a fix after all. |
Tentatively, I think the FSBL change actually fixes the problem! The suggestion there is to modify the order in which two pins connected to the USB PHY are initialised by the FSBL. I've built a FSBL and
--- fsbl.orig/ps7_init.c 2024-12-11 20:06:44.943600527 +1100
+++ fsbl/ps7_init.c 2024-12-11 20:07:17.018891393 +1100
@@ -2893,35 +2893,6 @@
// .. ==> MASK : 0x00002000U VAL : 0x00000000U
// ..
EMIT_MASKWRITE(0XF8000774, 0x00003FFFU ,0x00001205U),
- // .. TRI_ENABLE = 0
- // .. ==> 0XF8000778[0:0] = 0x00000000U
- // .. ==> MASK : 0x00000001U VAL : 0x00000000U
- // .. L0_SEL = 0
- // .. ==> 0XF8000778[1:1] = 0x00000000U
- // .. ==> MASK : 0x00000002U VAL : 0x00000000U
- // .. L1_SEL = 1
- // .. ==> 0XF8000778[2:2] = 0x00000001U
- // .. ==> MASK : 0x00000004U VAL : 0x00000004U
- // .. L2_SEL = 0
- // .. ==> 0XF8000778[4:3] = 0x00000000U
- // .. ==> MASK : 0x00000018U VAL : 0x00000000U
- // .. L3_SEL = 0
- // .. ==> 0XF8000778[7:5] = 0x00000000U
- // .. ==> MASK : 0x000000E0U VAL : 0x00000000U
- // .. Speed = 0
- // .. ==> 0XF8000778[8:8] = 0x00000000U
- // .. ==> MASK : 0x00000100U VAL : 0x00000000U
- // .. IO_Type = 1
- // .. ==> 0XF8000778[11:9] = 0x00000001U
- // .. ==> MASK : 0x00000E00U VAL : 0x00000200U
- // .. PULLUP = 1
- // .. ==> 0XF8000778[12:12] = 0x00000001U
- // .. ==> MASK : 0x00001000U VAL : 0x00001000U
- // .. DisableRcvr = 0
- // .. ==> 0XF8000778[13:13] = 0x00000000U
- // .. ==> MASK : 0x00002000U VAL : 0x00000000U
- // ..
- EMIT_MASKWRITE(0XF8000778, 0x00003FFFU ,0x00001204U),
// .. TRI_ENABLE = 1
// .. ==> 0XF800077C[0:0] = 0x00000001U
// .. ==> MASK : 0x00000001U VAL : 0x00000001U
@@ -3184,6 +3155,35 @@
// ..
EMIT_MASKWRITE(0XF800079C, 0x00003FFFU ,0x00001204U),
// .. TRI_ENABLE = 0
+ // .. ==> 0XF8000778[0:0] = 0x00000000U
+ // .. ==> MASK : 0x00000001U VAL : 0x00000000U
+ // .. L0_SEL = 0
+ // .. ==> 0XF8000778[1:1] = 0x00000000U
+ // .. ==> MASK : 0x00000002U VAL : 0x00000000U
+ // .. L1_SEL = 1
+ // .. ==> 0XF8000778[2:2] = 0x00000001U
+ // .. ==> MASK : 0x00000004U VAL : 0x00000004U
+ // .. L2_SEL = 0
+ // .. ==> 0XF8000778[4:3] = 0x00000000U
+ // .. ==> MASK : 0x00000018U VAL : 0x00000000U
+ // .. L3_SEL = 0
+ // .. ==> 0XF8000778[7:5] = 0x00000000U
+ // .. ==> MASK : 0x000000E0U VAL : 0x00000000U
+ // .. Speed = 0
+ // .. ==> 0XF8000778[8:8] = 0x00000000U
+ // .. ==> MASK : 0x00000100U VAL : 0x00000000U
+ // .. IO_Type = 1
+ // .. ==> 0XF8000778[11:9] = 0x00000001U
+ // .. ==> MASK : 0x00000E00U VAL : 0x00000200U
+ // .. PULLUP = 1
+ // .. ==> 0XF8000778[12:12] = 0x00000001U
+ // .. ==> MASK : 0x00001000U VAL : 0x00001000U
+ // .. DisableRcvr = 0
+ // .. ==> 0XF8000778[13:13] = 0x00000000U
+ // .. ==> MASK : 0x00002000U VAL : 0x00000000U
+ // ..
+ EMIT_MASKWRITE(0XF8000778, 0x00003FFFU ,0x00001204U),
+ // .. TRI_ENABLE = 0
// .. ==> 0XF80007A0[0:0] = 0x00000000U
// .. ==> MASK : 0x00000001U VAL : 0x00000000U
// .. L0_SEL = 0
@@ -7223,35 +7223,6 @@
// .. ==> MASK : 0x00002000U VAL : 0x00000000U
// ..
EMIT_MASKWRITE(0XF8000774, 0x00003FFFU ,0x00001205U),
- // .. TRI_ENABLE = 0
- // .. ==> 0XF8000778[0:0] = 0x00000000U
- // .. ==> MASK : 0x00000001U VAL : 0x00000000U
- // .. L0_SEL = 0
- // .. ==> 0XF8000778[1:1] = 0x00000000U
- // .. ==> MASK : 0x00000002U VAL : 0x00000000U
- // .. L1_SEL = 1
- // .. ==> 0XF8000778[2:2] = 0x00000001U
- // .. ==> MASK : 0x00000004U VAL : 0x00000004U
- // .. L2_SEL = 0
- // .. ==> 0XF8000778[4:3] = 0x00000000U
- // .. ==> MASK : 0x00000018U VAL : 0x00000000U
- // .. L3_SEL = 0
- // .. ==> 0XF8000778[7:5] = 0x00000000U
- // .. ==> MASK : 0x000000E0U VAL : 0x00000000U
- // .. Speed = 0
- // .. ==> 0XF8000778[8:8] = 0x00000000U
- // .. ==> MASK : 0x00000100U VAL : 0x00000000U
- // .. IO_Type = 1
- // .. ==> 0XF8000778[11:9] = 0x00000001U
- // .. ==> MASK : 0x00000E00U VAL : 0x00000200U
- // .. PULLUP = 1
- // .. ==> 0XF8000778[12:12] = 0x00000001U
- // .. ==> MASK : 0x00001000U VAL : 0x00001000U
- // .. DisableRcvr = 0
- // .. ==> 0XF8000778[13:13] = 0x00000000U
- // .. ==> MASK : 0x00002000U VAL : 0x00000000U
- // ..
- EMIT_MASKWRITE(0XF8000778, 0x00003FFFU ,0x00001204U),
// .. TRI_ENABLE = 1
// .. ==> 0XF800077C[0:0] = 0x00000001U
// .. ==> MASK : 0x00000001U VAL : 0x00000001U
@@ -7514,6 +7485,35 @@
// ..
EMIT_MASKWRITE(0XF800079C, 0x00003FFFU ,0x00001204U),
// .. TRI_ENABLE = 0
+ // .. ==> 0XF8000778[0:0] = 0x00000000U
+ // .. ==> MASK : 0x00000001U VAL : 0x00000000U
+ // .. L0_SEL = 0
+ // .. ==> 0XF8000778[1:1] = 0x00000000U
+ // .. ==> MASK : 0x00000002U VAL : 0x00000000U
+ // .. L1_SEL = 1
+ // .. ==> 0XF8000778[2:2] = 0x00000001U
+ // .. ==> MASK : 0x00000004U VAL : 0x00000004U
+ // .. L2_SEL = 0
+ // .. ==> 0XF8000778[4:3] = 0x00000000U
+ // .. ==> MASK : 0x00000018U VAL : 0x00000000U
+ // .. L3_SEL = 0
+ // .. ==> 0XF8000778[7:5] = 0x00000000U
+ // .. ==> MASK : 0x000000E0U VAL : 0x00000000U
+ // .. Speed = 0
+ // .. ==> 0XF8000778[8:8] = 0x00000000U
+ // .. ==> MASK : 0x00000100U VAL : 0x00000000U
+ // .. IO_Type = 1
+ // .. ==> 0XF8000778[11:9] = 0x00000001U
+ // .. ==> MASK : 0x00000E00U VAL : 0x00000200U
+ // .. PULLUP = 1
+ // .. ==> 0XF8000778[12:12] = 0x00000001U
+ // .. ==> MASK : 0x00001000U VAL : 0x00001000U
+ // .. DisableRcvr = 0
+ // .. ==> 0XF8000778[13:13] = 0x00000000U
+ // .. ==> MASK : 0x00002000U VAL : 0x00000000U
+ // ..
+ EMIT_MASKWRITE(0XF8000778, 0x00003FFFU ,0x00001204U),
+ // .. TRI_ENABLE = 0
// .. ==> 0XF80007A0[0:0] = 0x00000000U
// .. ==> MASK : 0x00000001U VAL : 0x00000000U
// .. L0_SEL = 0
@@ -11490,35 +11490,6 @@
// .. ==> MASK : 0x00002000U VAL : 0x00000000U
// ..
EMIT_MASKWRITE(0XF8000774, 0x00003FFFU ,0x00001205U),
- // .. TRI_ENABLE = 0
- // .. ==> 0XF8000778[0:0] = 0x00000000U
- // .. ==> MASK : 0x00000001U VAL : 0x00000000U
- // .. L0_SEL = 0
- // .. ==> 0XF8000778[1:1] = 0x00000000U
- // .. ==> MASK : 0x00000002U VAL : 0x00000000U
- // .. L1_SEL = 1
- // .. ==> 0XF8000778[2:2] = 0x00000001U
- // .. ==> MASK : 0x00000004U VAL : 0x00000004U
- // .. L2_SEL = 0
- // .. ==> 0XF8000778[4:3] = 0x00000000U
- // .. ==> MASK : 0x00000018U VAL : 0x00000000U
- // .. L3_SEL = 0
- // .. ==> 0XF8000778[7:5] = 0x00000000U
- // .. ==> MASK : 0x000000E0U VAL : 0x00000000U
- // .. Speed = 0
- // .. ==> 0XF8000778[8:8] = 0x00000000U
- // .. ==> MASK : 0x00000100U VAL : 0x00000000U
- // .. IO_Type = 1
- // .. ==> 0XF8000778[11:9] = 0x00000001U
- // .. ==> MASK : 0x00000E00U VAL : 0x00000200U
- // .. PULLUP = 1
- // .. ==> 0XF8000778[12:12] = 0x00000001U
- // .. ==> MASK : 0x00001000U VAL : 0x00001000U
- // .. DisableRcvr = 0
- // .. ==> 0XF8000778[13:13] = 0x00000000U
- // .. ==> MASK : 0x00002000U VAL : 0x00000000U
- // ..
- EMIT_MASKWRITE(0XF8000778, 0x00003FFFU ,0x00001204U),
// .. TRI_ENABLE = 1
// .. ==> 0XF800077C[0:0] = 0x00000001U
// .. ==> MASK : 0x00000001U VAL : 0x00000001U
@@ -11781,6 +11752,35 @@
// ..
EMIT_MASKWRITE(0XF800079C, 0x00003FFFU ,0x00001204U),
// .. TRI_ENABLE = 0
+ // .. ==> 0XF8000778[0:0] = 0x00000000U
+ // .. ==> MASK : 0x00000001U VAL : 0x00000000U
+ // .. L0_SEL = 0
+ // .. ==> 0XF8000778[1:1] = 0x00000000U
+ // .. ==> MASK : 0x00000002U VAL : 0x00000000U
+ // .. L1_SEL = 1
+ // .. ==> 0XF8000778[2:2] = 0x00000001U
+ // .. ==> MASK : 0x00000004U VAL : 0x00000004U
+ // .. L2_SEL = 0
+ // .. ==> 0XF8000778[4:3] = 0x00000000U
+ // .. ==> MASK : 0x00000018U VAL : 0x00000000U
+ // .. L3_SEL = 0
+ // .. ==> 0XF8000778[7:5] = 0x00000000U
+ // .. ==> MASK : 0x000000E0U VAL : 0x00000000U
+ // .. Speed = 0
+ // .. ==> 0XF8000778[8:8] = 0x00000000U
+ // .. ==> MASK : 0x00000100U VAL : 0x00000000U
+ // .. IO_Type = 1
+ // .. ==> 0XF8000778[11:9] = 0x00000001U
+ // .. ==> MASK : 0x00000E00U VAL : 0x00000200U
+ // .. PULLUP = 1
+ // .. ==> 0XF8000778[12:12] = 0x00000001U
+ // .. ==> MASK : 0x00001000U VAL : 0x00001000U
+ // .. DisableRcvr = 0
+ // .. ==> 0XF8000778[13:13] = 0x00000000U
+ // .. ==> MASK : 0x00002000U VAL : 0x00000000U
+ // ..
+ EMIT_MASKWRITE(0XF8000778, 0x00003FFFU ,0x00001204U),
+ // .. TRI_ENABLE = 0
// .. ==> 0XF80007A0[0:0] = 0x00000000U
// .. ==> MASK : 0x00000001U VAL : 0x00000000U
// .. L0_SEL = 0 |
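To make the moved block easier to read, here is a small sketch that unpacks the value written to `0xF8000778`, using the bit ranges listed in the generated comments of the diff. The struct and function names are hypothetical. Two things are assumptions not stated in this thread: that a mask write behaves as `(old & ~mask) | (val & mask)`, and that `MIO_PIN_00` sits at `0xF8000700` with one 32-bit register per pin (my understanding of the Zynq-7000 SLCR map).

```c
#include <assert.h>
#include <stdint.h>

/* Field layout copied from the comments in the ps7_init.c diff. */
struct mio_pin_cfg {
    unsigned tri_enable;   /* bit  0    */
    unsigned l0_sel;       /* bit  1    */
    unsigned l1_sel;       /* bit  2    */
    unsigned l2_sel;       /* bits 4:3  */
    unsigned l3_sel;       /* bits 7:5  */
    unsigned speed;        /* bit  8    */
    unsigned io_type;      /* bits 11:9 */
    unsigned pullup;       /* bit  12   */
    unsigned disable_rcvr; /* bit  13   */
};

/* Extract bits [hi:lo] of a 32-bit register value. */
static uint32_t bits(uint32_t v, unsigned hi, unsigned lo)
{
    assert(hi >= lo && hi < 32);
    return (v >> lo) & ((UINT32_C(2) << (hi - lo)) - 1);
}

/* Split a raw MIO_PIN register value into its named fields. */
struct mio_pin_cfg mio_decode(uint32_t v)
{
    struct mio_pin_cfg c;
    c.tri_enable   = bits(v, 0, 0);
    c.l0_sel       = bits(v, 1, 1);
    c.l1_sel       = bits(v, 2, 2);
    c.l2_sel       = bits(v, 4, 3);
    c.l3_sel       = bits(v, 7, 5);
    c.speed        = bits(v, 8, 8);
    c.io_type      = bits(v, 11, 9);
    c.pullup       = bits(v, 12, 12);
    c.disable_rcvr = bits(v, 13, 13);
    return c;
}

/* ASSUMPTION: a mask write only replaces the bits selected by mask. */
uint32_t mask_write(uint32_t old, uint32_t mask, uint32_t val)
{
    return (old & ~mask) | (val & mask);
}

/* ASSUMPTION: MIO_PIN_00 is at 0xF8000700, pins 0..53, 4 bytes apart.
 * Returns the pin number, or -1 if addr is not one of those registers. */
int mio_pin_index(uint32_t addr)
{
    const uint32_t base = UINT32_C(0xF8000700);
    if (addr < base || addr > base + 53 * 4 || (addr - base) % 4 != 0)
        return -1;
    return (int)((addr - base) / 4);
}
```

Decoding `0x00001204` this way gives `L1_SEL = 1`, `IO_Type = 1` and `PULLUP = 1` with every other field zero, which matches the comments in the diff. Under the address assumption, `0xF8000778` would correspond to MIO pin 30.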
We're seeing an issue with the ALPHA250 where the USB2 port sometimes is useable and sometimes is not, probabilistically, upon power cycling. Software reboots don't appear to matter, but a hard power cycle rolls the dice again as to whether the USB port can be used. If the USB port is working after a hard power cycle then it continues to work indefinitely, and if it doesn't work after a hard power cycle, then I haven't figured out a way to get it up (via various USB controller software resets etc) other than hard power cycling until it works.
We always see the controller in dmesg:
But then may or may not see our connected USB devices appear in dmesg. We've seen this over a wide range of devices and multiple ALPHA250s. It may depend on electrical environment, as it seems sometimes things work for a long time, and then maybe we move the device to a different setup and it fails more often. But I can't be sure - it's pretty easy to trick yourself into thinking a random event happens more or less often when the probability hasn't changed.
I've been looking into the problem and wonder if there is an issue with the way Koheron patches the Vivado-generated device tree. The devicetree patch for the ALPHA250 contains the following two USB-relevant bits:
pcw.dtsi:
system-top.dts:
However, the patch doesn't touch zynq-7000.dtsi, which contains the following node:
After patching and building (rendering? compiling?) the devicetree, this all results in the following devicetree node:
i.e. it seems that Koheron's intent was to remove phy_type = "ulpi";, and yet it is still there in the final result.
Maybe it's a long shot but I'm wondering if this could be the cause of the intermittent USB failures we're seeing. I plan to patch the devicetree further to remove phy_type = "ulpi" from zynq-7000.dtsi and see if it helps (I can't try this quite now), but thought I'd post here since the modifications made to the devicetree seem deliberate enough that I hope someone who knows more than me might immediately see whether there's a problem.
Any help appreciated.
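One way to confirm what the running kernel actually received is to read the property back from the live devicetree on the board. The sketch below is a minimal reader, assuming the kernel exposes the tree as files (for example under `/proc/device-tree`). `read_dt_string` is a hypothetical helper, and the node path mentioned afterwards is a guess, since the name of the parent bus node varies between trees.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read a string-valued devicetree property from its property file.
 * Returns 1 and fills out on success. Returns 0 and sets out to ""
 * if the file cannot be opened, which most likely means the property
 * is absent from the running tree. */
int read_dt_string(const char *path, char *out, size_t out_len)
{
    assert(path != NULL && out != NULL && out_len > 0);

    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        out[0] = '\0';
        return 0;
    }

    /* String properties are stored NUL-terminated; terminate again
     * anyway in case the value was truncated to fit the buffer. */
    size_t n = fread(out, 1, out_len - 1, f);
    fclose(f);
    out[n] = '\0';
    return 1;
}
```

On the board this might be called with a path such as `/proc/device-tree/axi/usb@e0002000/phy_type` (hypothetical; adjust to the actual node path). A return value of 1 with the string `ulpi` would mean the property survived the patching, and 0 would suggest it was removed.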