AFCv4 - power-up sequence problem #111

Closed
kaolpr opened this issue May 18, 2021 · 6 comments

@kaolpr
Contributor

kaolpr commented May 18, 2021

Description

When the board is hot and is power cycled, the MMC cannot complete the power-up procedure (see movie). After cycling for a while the power supplies become stable, but the FPGA is neither booted nor enabled.

Steps to reproduce

  1. Power on
  2. Heat the board above 50 deg. C
  3. Power cycle

The effect is intermittent: it does not happen on every power cycle, but it happens fairly often.

Additional information

XADC reported temperature: 62.6 deg. C

Based on: e95e8d4 (afcv4-port)
Hot-swap handle reporting was removed to keep the logs readable:

diff --git a/modules/sensors/hotswap.c b/modules/sensors/hotswap.c
index 262aef7..f2b993a 100644
--- a/modules/sensors/hotswap.c
+++ b/modules/sensors/hotswap.c
@@ -138,11 +138,11 @@ void vTaskHotSwap( void *Parameters )
         }
 
         if ( new_state_amc ^ old_state_amc ) {
-            if ( new_state_amc == 0 ) {
-                printf("AMC Hotswap handle pressed!\n");
-            } else {
-                printf("AMC Hotswap handle released!\n");
-            }
+            // if ( new_state_amc == 0 ) {
+            //     printf("AMC Hotswap handle pressed!\n");
+            // } else {
+            //     printf("AMC Hotswap handle released!\n");
+            // }
             if ( hotswap_send_event( hotswap_amc_sensor, new_state_amc ) == ipmb_error_success ) {
                 hotswap_set_mask_bit( HOTSWAP_AMC, 1 << new_state_amc );
                 hotswap_clear_mask_bit( HOTSWAP_AMC, 1 << (!new_state_amc) );

Serial log:

openMMC Starting!
Build date: May 18 2021 15:56:31
Version: v1.4.1-65-ge95e8d4
SHA1: e95e8d40947b9a3998c1cb4954b8de422d6b8531
[FRU][AMC] Asserting FRU information integrity
[FRU][AMC] Error in COMMON HEADER checksum
Could not find a valid FRU information in EEPROM, building a runtime info...
>AMC FRU Information:
        -Board info area:
                -Language Code: 0
                -Manuf time: 13076640
                -Manufacturer: Creotech
                -Name: AMC-FMC-Carrier
                -Serial Number: CNxxxxx
                -Part Number: AFC
                -File ID: AFCFRU
No Chassis info area
No internal use area
        -Product info area:
                -Language Code: 0
                -Manufacturer: Creotech
                -Name: AFC
                -Part Number: AFC:4.0
                -Version: 4.0
                -Asset Tag: Generic FRU
                -Serial Number: CNxxxxx
                -File ID: AFCFRU
        -Multirecord Area: 
                -Module Current: 4 A
                -Zone3 Compatibility code: 0x55667788
>AMC FRU total size: 236 bytes
Enable Power
Enable Power
...
Enable Power

"Enable Power" is emitted continuously, which suggests that setDC_DC_ConvertersON is called repeatedly and the MMC is stuck in the PAYLOAD_POWER_GOOD_WAIT state.
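A minimal sketch of why a stuck wait state would produce this output, assuming (as the log suggests) that the payload task calls setDC_DC_ConvertersON(true) on every pass through PAYLOAD_POWER_GOOD_WAIT and only leaves that state once a DC/DC power-good signal is seen. The other state names, `dcdc_good`, `payload_step` and `run_passes` are hypothetical and are not taken from the openMMC sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, reduced payload states. Only PAYLOAD_POWER_GOOD_WAIT is a
 * name taken from the issue; the others are placeholders. */
typedef enum {
    PAYLOAD_NO_POWER,
    PAYLOAD_POWER_GOOD_WAIT,
    PAYLOAD_FPGA_SETUP
} payload_state_t;

/* Number of times the stand-in enable routine has been called. */
static int enable_calls = 0;

/* Stand-in for setDC_DC_ConvertersON(true): logs and counts each call. */
static void dc_dc_converters_on(void)
{
    enable_calls++;
    printf("Enable Power\n");
}

/* One pass of the hypothetical payload task. */
static payload_state_t payload_step(payload_state_t state, bool dcdc_good)
{
    assert(state <= PAYLOAD_FPGA_SETUP);

    switch (state) {
    case PAYLOAD_NO_POWER:
        return PAYLOAD_POWER_GOOD_WAIT;
    case PAYLOAD_POWER_GOOD_WAIT:
        /* Enable is re-issued on every pass, so if power-good never
         * asserts, "Enable Power" is printed over and over. */
        dc_dc_converters_on();
        return dcdc_good ? PAYLOAD_FPGA_SETUP : PAYLOAD_POWER_GOOD_WAIT;
    default:
        return state;
    }
}

/* Runs `passes` iterations from PAYLOAD_NO_POWER with a fixed power-good
 * input and returns the final state. */
static payload_state_t run_passes(int passes, bool dcdc_good)
{
    payload_state_t state = PAYLOAD_NO_POWER;
    for (int i = 0; i < passes; i++) {
        state = payload_step(state, dcdc_good);
    }
    return state;
}
```

With `dcdc_good` held false, `run_passes(5, false)` ends in PAYLOAD_POWER_GOOD_WAIT after four "Enable Power" lines, which matches the shape of the serial log above.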

Observations and possible explanations

  1. It seems that the problem involves the GPIO expander that enables the power supplies. I added monitoring of the MCP23016 GP and OLAT register values:
diff --git a/port/board/afc-v4/payload.c b/port/board/afc-v4/payload.c
index 0bdbe5b..bb54556 100644
--- a/port/board/afc-v4/payload.c
+++ b/port/board/afc-v4/payload.c
@@ -106,14 +106,23 @@ uint8_t setDC_DC_ConvertersON(bool on)
     };
 
     uint8_t pin;
+    uint16_t readout_gp[sizeof(power_pins)], readout_olat[sizeof(power_pins)];
     if (on) {
         printf("Enable Power\n");
 
         for (uint8_t i = 0; i < (sizeof(power_pins) / sizeof(power_pins[0])); i++) {
             pin = power_pins[i];
             mcp23016_write_pin( ext_gpios[pin].port_num, ext_gpios[pin].pin_num, true );
+            mcp23016_read_reg_pair(MCP23016_GP_REG, &readout_gp[i]);
+            mcp23016_read_reg_pair(MCP23016_OLAT_REG, &readout_olat[i]);
             vTaskDelay(10);
         }
+
+        for (uint8_t i = 0; i < (sizeof(power_pins) / sizeof(power_pins[0])); i++) {
+            pin = power_pins[i];
+            printf("(%d) GP:%x PIN:%d/%d OLAT:%x\n", i, readout_gp[i], ext_gpios[pin].port_num, ext_gpios[pin].pin_num, readout_olat[i]);
+        }
+
     } else {
         printf("Disable Power\n");

Serial log:

openMMC Starting!
Build date: May 18 2021 15:56:31
Version: v1.4.1-65-ge95e8d4
SHA1: e95e8d40947b9a3998c1cb4954b8de422d6b8531
[FRU][AMC] Asserting FRU information integrity
[FRU][AMC] Error in COMMON HEADER checksum
Could not find a valid FRU information in EEPROM, building a runtime info...
>AMC FRU Information:
        -Board info area:
                -Language Code: 0
                -Manuf time: 13076640
                -Manufacturer: Creotech
                -Name: AMC-FMC-Carrier
                -Serial Number: CNxxxxx
                -Part Number: AFC
                -File ID: AFCFRU
No Chassis info area
No internal use area
        -Product info area:
                -Language Code: 0
                -Manufacturer: Creotech
                -Name: AFC
                -Part Number: AFC:4.0
                -Version: 4.0
                -Asset Tag: Generic FRU
                -Serial Number: CNxxxxx
                -File ID: AFCFRU
        -Multirecord Area: 
                -Module Current: 4 A
                -Zone3 Compatibility code: 0x55667788
>AMC FRU total size: 236 bytes
Enable Power
(0) GP:0 PIN:0/6 OLAT:4040
(1) GP:0 PIN:1/6 OLAT:0
(2) GP:0 PIN:0/5 OLAT:2020
(3) GP:0 PIN:1/2 OLAT:0
(4) GP:0 PIN:1/1 OLAT:404
(5) GP:0 PIN:1/7 OLAT:606
(6) GP:0 PIN:1/5 OLAT:8686
(7) GP:0 PIN:1/4 OLAT:a6a6
(8) GP:0 PIN:1/0 OLAT:b6b6
(9) GP:0 PIN:1/3 OLAT:b7b7
(10) GP:0 PIN:0/7 OLAT:8080

...

Enable Power
(0) GP:0 PIN:0/6 OLAT:4040
(1) GP:0 PIN:1/6 OLAT:0
(2) GP:0 PIN:0/5 OLAT:2020
(3) GP:0 PIN:1/2 OLAT:0
(4) GP:0 PIN:1/1 OLAT:404
(5) GP:0 PIN:1/7 OLAT:606
(6) GP:0 PIN:1/5 OLAT:8686
(7) GP:0 PIN:1/4 OLAT:a6a6
(8) GP:0 PIN:1/0 OLAT:b6b6
(9) GP:0 PIN:1/3 OLAT:b7b7
(10) GP:0 PIN:0/7 OLAT:8080
Enable Power
(0) GP:0 PIN:0/6 OLAT:4040
(1) GP:0 PIN:1/6 OLAT:0
(2) GP:0 PIN:0/5 OLAT:2020
(3) GP:0 PIN:1/2 OLAT:0
(4) GP:0 PIN:1/1 OLAT:404
(5) GP:0 PIN:1/7 OLAT:606
(6) GP:0 PIN:1/5 OLAT:8686
(7) GP:0 PIN:1/4 OLAT:a6a6
(8) GP:0 PIN:1/0 OLAT:b6b6
(9) GP:bf PIN:1/3 OLAT:b7bf
(10) GP:0 PIN:0/7 OLAT:8080
Enable Power
(0) GP:0 PIN:0/6 OLAT:4040
(1) GP:0 PIN:1/6 OLAT:0
(2) GP:40 PIN:0/5 OLAT:2020
(3) GP:44 PIN:1/2 OLAT:0
(4) GP:46 PIN:1/1 OLAT:46
(5) GP:c6 PIN:1/7 OLAT:c6
(6) GP:e6 PIN:1/5 OLAT:e6
(7) GP:f6 PIN:1/4 OLAT:f6
(8) GP:f7 PIN:1/0 OLAT:f7
(9) GP:ff PIN:1/3 OLAT:ff
(10) GP:ff PIN:0/7 OLAT:80ff
Enable Power
(0) GP:ff PIN:0/6 OLAT:40ff
(1) GP:ff PIN:1/6 OLAT:ff
(2) GP:ff PIN:0/5 OLAT:20ff
(3) GP:ff PIN:1/2 OLAT:ff
(4) GP:ff PIN:1/1 OLAT:ff
(5) GP:ff PIN:1/7 OLAT:ff
(6) GP:ff PIN:1/5 OLAT:ff
(7) GP:ff PIN:1/4 OLAT:ff
(8) GP:ff PIN:1/0 OLAT:ff
(9) GP:ff PIN:1/3 OLAT:ff
(10) GP:ff PIN:0/7 OLAT:80ff
Enable Power
(0) GP:ff PIN:0/6 OLAT:40ff
(1) GP:ff PIN:1/6 OLAT:ff
(2) GP:ff PIN:0/5 OLAT:20ff
(3) GP:ff PIN:1/2 OLAT:ff
(4) GP:ff PIN:1/1 OLAT:ff
(5) GP:ff PIN:1/7 OLAT:ff
(6) GP:ff PIN:1/5 OLAT:ff
(7) GP:ff PIN:1/4 OLAT:ff
(8) GP:ff PIN:1/0 OLAT:ff
(9) GP:ff PIN:1/3 OLAT:ff
(10) GP:ff PIN:0/7 OLAT:80ff
Enable Power
(0) GP:ff PIN:0/6 OLAT:40ff
(1) GP:ff PIN:1/6 OLAT:ff
(2) GP:ff PIN:0/5 OLAT:20ff
(3) GP:ff PIN:1/2 OLAT:ff
(4) GP:ff PIN:1/1 OLAT:ff
(5) GP:ff PIN:1/7 OLAT:ff
(6) GP:ff PIN:1/5 OLAT:ff
(7) GP:ff PIN:1/4 OLAT:ff
(8) GP:ff PIN:1/0 OLAT:ff
(9) GP:ff PIN:1/3 OLAT:ff
(10) GP:ff PIN:0/7 OLAT:80ff
Enable Power
(0) GP:ff PIN:0/6 OLAT:40ff
(1) GP:ff PIN:1/6 OLAT:ff
(2) GP:ff PIN:0/5 OLAT:20ff
(3) GP:ff PIN:1/2 OLAT:ff
(4) GP:ff PIN:1/1 OLAT:ff
(5) GP:ff PIN:1/7 OLAT:ff
(6) GP:ff PIN:1/5 OLAT:ff
(7) GP:ff PIN:1/4 OLAT:ff
(8) GP:ff PIN:1/0 OLAT:ff
(9) GP:ff PIN:1/3 OLAT:ff
(10) GP:ff PIN:0/7 OLAT:80ff
  2. Adding a delay before mcp23016_set_port_dir does not help (in case anyone wonders whether the MCP23016 start-up conditions are met).
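To make the GP (port level) and OLAT (output latch) dumps easier to read, each 16-bit register-pair readout can be split into its two port bytes and checked for the pin that was just written. This is a minimal sketch, assuming that mcp23016_read_reg_pair packs port 0 into the high byte and port 1 into the low byte; that byte order is only inferred from steady-state lines such as `(10) GP:ff PIN:0/7 OLAT:80ff` and is not confirmed by the driver source. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: the register pair is packed as (port0 << 8) | port1. */
static uint8_t mcp_pair_port_byte(uint16_t pair, uint8_t port)
{
    assert(port <= 1);
    return (port == 0) ? (uint8_t)(pair >> 8) : (uint8_t)(pair & 0xFFu);
}

/* True if bit `pin` of the given port is set in a register-pair readout. */
static bool mcp_pair_pin_is_set(uint16_t pair, uint8_t port, uint8_t pin)
{
    assert(pin <= 7);
    return (mcp_pair_port_byte(pair, port) >> pin) & 1u;
}
```

Applied to the first dump, `(1) PIN:1/6 OLAT:0` reports port 1 bit 6 as cleared right after it was written, whereas in the final dumps every port-1 pin reads back as set (under the byte-order assumption above).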
@kaolpr kaolpr changed the title from "AFCv4 power-up sequence problem" to "AFCv4 - power-up sequence problem" May 18, 2021
@augustofg augustofg added the bug label May 18, 2021
@augustofg
Member

We are experiencing the same issue here.

@kaolpr
Contributor Author

kaolpr commented May 20, 2021

@Palmitoxico Could you please try this with your setup?

@augustofg
Member

This is a little tricky to reproduce; it seems to happen more often when the board is inserted into our 2U MicroTCA crate. I'm busy right now, but I will try to reproduce it as soon as I can.

@kaolpr
Contributor Author

kaolpr commented May 21, 2021

Have you monitored the board temperature in the crate? A 2U crate is fairly compact, so the board may simply get hot more easily there.

@augustofg
Member

I didn't, but I'm having difficulty reproducing this right now (with the older firmware). Even heating the MCP23016 directly with a hot-air station doesn't trigger the issue.

@augustofg
Member

Well, this seems to fix our problems. I will close the issue for now, but feel free to reopen it if you experience the same problem again.
