WR - known issues (all issues combined) #309

alyxazon · 2022-07-19T11:52:07Z

Issue #266: WR node hangs in endless BOOTP loop, when WR PTP link is not established

dietrichb commented on Feb 2, 2021

In rare cases it might happen that a WR node is unable to establish a WR PTP link. In this case the following symptoms are commonly observed.

   node issues a BOOTP request via the network ~once every second
   (BOOTP server replies with IP)
   when reading the relevant register of the WR core, the IP matches the one from the BOOTP server reply: it seems the reply from the BOOTP server has been received by the node
   (I believe the console claims 'in training' for IP)
   the node continues issuing BOOTP requests until the WR PTP link is successfully established
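
To make the pattern above easier to spot in a packet log, here is a minimal sketch. The event format (timestamp plus "request"/"reply") and the threshold `min_repeats` are assumptions for illustration; they are not part of any WR or BOOTP tooling.

```python
from typing import Iterable, Tuple

# Hypothetical event format: (timestamp_seconds, kind), where kind is
# "request" (BOOTP request sent by the node) or "reply" (BOOTP server reply).
Event = Tuple[float, str]


def looks_like_bootp_loop(events: Iterable[Event], min_repeats: int = 5) -> bool:
    """Return True if the node keeps requesting after a reply was already seen."""
    reply_seen = False
    requests_after_reply = 0
    for _, kind in sorted(events):  # process events in time order
        if kind == "reply":
            reply_seen = True
        elif kind == "request" and reply_seen:
            requests_after_reply += 1
    return requests_after_reply >= min_repeats
```

A node that sends one request, gets one reply and then goes quiet is not flagged; a node that still requests roughly once per second after the first reply is.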

I am not sure whether this is an issue with the way the WR core is instantiated on Arria II devices or an issue of the WR core itself. Should we file this issue on OHWR?

Issue #256: connection between White Rabbit node and switch unreliable after reboot of WRS

dietrichb commented on Feb 2, 2021

A variety of symptoms is observed when rebooting a WRS to which a WR node is connected. This can happen during maintenance, when the WRS is rebooted on purpose, or when recovering from a power cut.

   1. no White Rabbit lock; occasionally; WRS port claims 'WA_MSG' (waiting for message); node is accessible via the network
   2. no Ethernet link; rarely; WRS port claims 'link down'; node inaccessible
   3. 'hang up'; WRS port claims 'WA_MSG' and node MAC is detected by the WRS; node inaccessible via the network

In all cases, power-cycling the WR node helps.
In cases '1' and '2', it is usually possible to recover by 'eb-reset' of the node.
In case '1', forcing a sequence port up->down->up on the WRS helps in some cases.
In case '2', forcing a sequence port up->down->up on the WRS does not help.
In case '3', the node seems to be almost dead. Access to the node is possible neither from the timing network nor from the host system (no chance for eb-reset). Forcing a sequence port up->down->up on the WRS does not help. Autorecovery of the WR node via the 'watchdog' implemented on the SCU does not work. A power cycle helps.
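
The recovery options per case can be summarized as a small lookup. This is only a sketch of the observations above; the enum and function names are made up for illustration, and the order (least disruptive first) is my own choice.

```python
from enum import Enum
from typing import List


class Symptom(Enum):
    NO_WR_LOCK = 1        # case 1: WRS port claims WA_MSG, node accessible
    NO_ETHERNET_LINK = 2  # case 2: WRS port claims 'link down', node inaccessible
    HANG_UP = 3           # case 3: WA_MSG, MAC detected, node inaccessible


def recovery_options(symptom: Symptom) -> List[str]:
    """List recovery actions reported to help, least disruptive first."""
    options = []
    if symptom in (Symptom.NO_WR_LOCK, Symptom.NO_ETHERNET_LINK):
        options.append("eb-reset of the node (usually works)")
    if symptom is Symptom.NO_WR_LOCK:
        options.append("force port up->down->up on the WRS (helps in some cases)")
    # Power-cycling the node helps in all three cases.
    options.append("power-cycle the WR node")
    return options
```

For case '3' the only entry is the power cycle, since neither eb-reset, the port sequence nor the SCU watchdog works there.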

Issue #111: WR port not reachable after power cycle of WR switch

dietrichb commented on Dec 15, 2018

symptoms

WRS

   port shows MAC and PTP state 6 (looks good)

node

   eb-mon shows LINK_UP and TRACKING (looks good)
   node not reachable via timing network (all EB requests time out)

when

   after a reboot or power cycle of the WRS
   it may take a few power cycles of the WRS to trigger the bug

workaround

power cycle or restart FPGA using eb-reset

dietrichb commented on Aug 20, 2019

solved for Arria5 based platforms

requires major work (PHY control update) for Arria II based devices (SCU and VETAR)

Issue #51: WR port of node remains down after power cycle of node AND WR switch

dietrichb commented on Oct 23, 2017
 
There seems to be an annoying bug that occurs when a node (SCU) and a WRS are switched on simultaneously after a power cut.

The symptoms are the following

    PPS LED not blinking, activity LED not blinking, link LED off
    eb-mon -v dev/wbm0 shows "LINK_DOWN" and "NO_SYNC"
    eb-console dev/wbm0 causes freezing of the ssh shell
    node fails to get an IP via BOOTP
    (but the WRS shows both "link up" and "activity" LEDs)
    node is not accessible via the WR network
    resetting the FPGA of the node via its Reset controller is possible and cures the symptom.
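
A check for the eb-mon symptom could look roughly like this. It is a sketch under assumptions: it only relies on the two strings quoted above appearing somewhere in the output, not on the exact eb-mon output format, and the helper names and the 10 s timeout are hypothetical.

```python
import subprocess


def read_eb_mon(device: str = "dev/wbm0", timeout_s: float = 10.0) -> str:
    """Run 'eb-mon -v <device>' (command as quoted in the symptom list).

    A timeout is used so that a hanging tool does not block the caller.
    """
    result = subprocess.run(
        ["eb-mon", "-v", device],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.stdout


def node_trapped_in_link_down(eb_mon_output: str) -> bool:
    """True if the output shows both LINK_DOWN and NO_SYNC."""
    return "LINK_DOWN" in eb_mon_output and "NO_SYNC" in eb_mon_output
```

If `node_trapped_in_link_down(read_eb_mon())` is true after a power cut, resetting the FPGA via the reset controller is the cure reported above.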

Suspicion: The FPGA of the node boots much faster than the WRS. It somehow fails to detect "link up" after the WRS starts and remains trapped in the "link down" state.

This issue is a real annoyance in cases where major parts of the facility need to be recovered after a major power cut.

Maybe this is linked to another issue:

dietrichb commented on Aug 20, 2019

solved for Arria 5
not solved for Arria II (SCU and Vetar)

a fix for Arria II would require a major effort

dietrichb commented on Feb 2, 2021

update (January 2021): in rare cases this is also observed with fallout gateware

dietrichb · 2024-03-13T20:23:37Z

There is another issue. If a 'fallout' node locks to White Rabbit, it might lock at different 'positions' within a 4 ns window. Once locked, it will always remain locked at its initial position.

This issue becomes obvious if one compares two timing receivers (time stamping or digital output) with the same signal. The time difference will remain identical as long as neither of the two timing receivers is restarted. But after a restart, the time difference might have a different value. This issue seems to be present for all form factors.
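
To check measurements for this behaviour, something like the following sketch could be used. It assumes the measured time differences between the two receivers (in ns) are grouped into 'runs', one run per period without a restart; the function name and the jitter tolerance are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def summarize_runs(runs: List[List[float]],
                   jitter_tol_ns: float = 0.1) -> Tuple[bool, float]:
    """Summarize time differences between two timing receivers.

    runs: one list of measured differences (ns) per period between restarts.
    Returns (stable_within_runs, spread_between_runs_ns).
    """
    # Within a run the difference should stay (nearly) identical.
    stable = all(max(r) - min(r) <= jitter_tol_ns for r in runs)
    # Between runs the mean difference may jump to another value.
    offsets = [mean(r) for r in runs]
    spread = max(offsets) - min(offsets)
    return stable, spread
```

A result such as `(True, 2.0)` would match the description: constant difference within each run, but a shift of 2 ns between runs after a restart.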

alyxazon mentioned this issue Mar 14, 2024

SERDES - 4 ns random window #357

Open
