Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UBLOX_EVK_ODIN_W2: network-wifi test crashes #10815

Closed
kimlep01 opened this issue Jun 12, 2019 · 46 comments · Fixed by #11983
Closed

UBLOX_EVK_ODIN_W2: network-wifi test crashes #10815

kimlep01 opened this issue Jun 12, 2019 · 46 comments · Fixed by #11983

Comments

@kimlep01
Copy link

kimlep01 commented Jun 12, 2019

Description

UBLOX_EVK_ODIN_W2 target crashes in our nightly CI when running network-wifi test. We have not seen this earlier as the test was running inside of a rf shielded box. Now the tests are running in a very noisy environment which most probably causes these crashes when the Odin tries to connect to the network. I can reproduce the crash from line 1826 when I re-run the test inside the test farm. Does not reproduce on my desk (not enough noise I guess).

More common crash:

[1560302096.85][CONN][RXD] >>> Running case #9: 'WIFI-CONNECT-PARAMS-VALID-UNSECURE'...
[1560302096.98][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-PARAMS-VALID-UNSECURE}}, queued...
[1560302134.33][CONN][RXD] 
[1560302134.33][CONN][RXD] 
[1560302134.33][CONN][RXD] ++ MbedOS Error Info ++
[1560302134.43][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1560302134.43][CONN][RXD] Error Message: Assertion failed: msg != NULL
[1560302134.53][CONN][RXD] Location: 0x0
[1560302134.63][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+1826
[1560302134.63][CONN][RXD] Error Value: 0x0
[1560302134.83][CONN][RXD] Current Thread: application_unnamed_thread  Id: 0x20010E08 Entry: 0x80E44C1 StackSize: 0x1000 StackMem: 0x20010E60 SP: 0x20011D08 
[1560302134.94][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&osver=999999&core=0x410FC241&comp=3&ver=8032001&tgt=UBLOX_EVK_...
[1560302134.94][CONN][RXD] 
[1560302134.94][CONN][RXD] -- MbedOS Error Info --
[1560302134.94][CONN][RXD] 
[1560302135.03][CONN][RXD] = System will be rebooted due to a fatal error =

This crash has happened only in CI runs:

[1560301680.70][CONN][RXD] >>> Running case #9: 'WIFI-CONNECT-PARAMS-VALID-UNSECURE'...
[1560301680.70][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-PARAMS-VALID-UNSECURE}}, queued...
[1560301708.56][CONN][RXD] 
[1560301708.56][CONN][RXD] 
[1560301708.56][CONN][RXD] ++ MbedOS Error Info ++
[1560301708.66][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1560301708.66][CONN][RXD] Error Message: Assertion failed: msg != NULL
[1560301708.66][CONN][RXD] Location: 0x801A1A3
[1560301708.86][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+1596
[1560301708.86][CONN][RXD] Error Value: 0x0
[1560301708.96][CONN][RXD] Current Thread: application_unnamed_thread  Id: 0x20010890 Entry: 0x8010799 StackSize: 0x1000 StackMem: 0x200108E8 SP: 0x20011490 
[1560301709.16][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x8003010F&osver=999999&core=0x410FC241&comp=1&ver=6110004&tgt=UBLOX_EVK_...
[1560301709.16][CONN][RXD] 
[1560301709.16][CONN][RXD] -- MbedOS Error Info --
[1560301709.16][CONN][RXD] 
[1560301709.16][CONN][RXD] = System will be rebooted due to a fatal error =

Edit - found one more crash:

[1560935800.92][CONN][RXD] >>> Running case #8: 'WIFI-GET-RSSI'...
[1560935801.02][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-GET-RSSI}}, queued...
[1560935823.68][CONN][RXD] 
[1560935823.68][CONN][RXD] 
[1560935823.76][CONN][RXD] ++ MbedOS Error Info ++
[1560935823.76][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1560935823.86][CONN][RXD] Error Message: Assertion failed: msg->type == ODIN_WIFI_MSG_USER_DISCONNECT
[1560935823.86][CONN][RXD] Location: 0x801A1A3
[1560935824.06][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+463
[1560935824.06][CONN][RXD] Error Value: 0x0
[1560935824.16][CONN][RXD] Current Thread: main  Id: 0x2000BA60 Entry: 0x801A791 StackSize: 0x1000 StackMem: 0x200071D8 SP: 0x20008088 
[1560935824.26][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&osver=999999&core=0x410FC241&comp=1&ver=6110004&tgt=UBLOX_EVK_...
[1560935824.26][CONN][RXD] 
[1560935824.36][CONN][RXD] -- MbedOS Error Info --
[1560935824.36][CONN][RXD] 
[1560935824.36][CONN][RXD] = System will be rebooted due to a fatal error =

mbed-os version (master):

commit fc645bc3ed357743d4acf6714564b7f6291c6016 (HEAD -> master, origin/master, origin/HEAD)
Merge: ed69fc1 6df6dec
Author: Anna Bridge <Anna.Bridge@arm.com>
Date:   Tue Jun 11 15:07:43 2019 +0100

    Merge pull request #10805 from kjbracey-arm/armc5_blinky

    Fix ARMC5 compilation

The crashes don't seem to be compiler or test case specific. I have seen these crashes with several different wifi tests cases from this same test suite.

How to reproduce:
Use a VERY noisy test environment for Wifi
Edit tools/test_configs/WiFiInterface.json and give proper values for:
wifi-secure-ssid
wifi-unsecure-ssid
wifi-password
wifi-ch-secure
wifi-ch-unsecure

mbed test --compile -t GCC_ARM -m UBLOX_EVK_ODIN_W2 -n tests-network-wifi --app-config=tools/test_configs/WiFiInterface.json -DMBED_HEAP_STATS_ENABLED=1 -DMBED_STACK_STATS_ENABLED=1 -DMBED_TRAP_ERRORS_ENABLED=1 -DMBED_ALL_STATS_ENABLED -DSKIP_TIME_DRIFT_TESTS=1

mbedgt -m UBLOX_EVK_ODIN_W2 -n tests-network-wifi -V
(crashes 2/3 times in our environment)

Issue request type

[ ] Question
[ ] Enhancement
[X] Bug
@kimlep01
Copy link
Author

cc @ARMmbed/team-ublox

@ciarmcom
Copy link
Member

Internal Jira reference: https://jira.arm.com/browse/MBOCUSTRIA-1279

@aqib-ublox
Copy link
Contributor

aqib-ublox commented Jun 24, 2019

@kimlep01 wifi logs.txt

I am not able to reproduce above scnerio even removed two antennas of AP out of three no crash seen

@aqib-ublox
Copy link
Contributor

@kimlep01 have you tried with ARM and IAR also?

I coundn't see it

@kimlep01
Copy link
Author

Yes we run those tests with GCC, ARM and IAR binaries. I have seen crashes with all compilers. I was hoping that those line numbers would help tracking down why those assert failures happen. We have similar issues open with mtb ublox odin w2 also. My guess is that something goes wrong when the board tries to connect to a heavily utilised access point.

@kimlep01
Copy link
Author

kimlep01 commented Aug 26, 2019

Hi @ARMmbed/team-ublox

This issue is still visible in our nightly tests. I have been able to reproduce this issue manually on my desk also:

  • I set my phone (old Samsung S6) as an unsecure Wi-Fi hotspot
  • Place the phone next to the board
  • Comment out all other wifi tests except WIFI-CONNECT-DISCONNECT-REPEAT from /TESTS/network/wifi/main.cpp
  • Compiled tests with IAR 8.32.1

I ran the test a few times and got these three different crashes:

[1566820165.73][CONN][RXD] >>> Running 1 test cases...
[1566820165.79][CONN][RXD]
[1566820165.79][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1566820165.85][CONN][RXD] >>> Running case #1: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1566820165.90][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1566820228.01][CONN][RXD]
[1566820228.01][CONN][RXD]
[1566820228.03][CONN][RXD] ++ MbedOS Error Info ++
[1566820228.08][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1566820228.15][CONN][RXD] Error Message: Assertion failed: error_code == cbSTATUS_OK
[1566820228.16][CONN][RXD] Location: 0x0
[1566820228.31][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+761
[1566820228.32][CONN][RXD] Error Value: 0x0
[1566820228.47][CONN][RXD] Current Thread: application_unnamed_thread  Id: 0x20008818 Entry: 0x80E3869 StackSize: 0x1000 StackMem: 0x20011990 SP: 0x200128F0
[1566820228.55][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=UBLOX_EVK_ODIN_W2
[1566820228.58][CONN][RXD] -- MbedOS Error Info --
[1566820228.58][CONN][RXD]
[1566820228.63][CONN][RXD] = System will be rebooted due to a fatal error =

----

[1566821982.25][CONN][RXD] >>> Running 1 test cases...
[1566821982.30][CONN][RXD]
[1566821982.30][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1566821982.36][CONN][RXD] >>> Running case #1: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1566821982.42][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1566821994.93][CONN][RXD]
[1566821994.93][CONN][RXD]
[1566821994.96][CONN][RXD] ++ MbedOS Error Info ++
[1566821995.00][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1566821995.08][CONN][RXD] Error Message: Assertion failed: msg->type == ODIN_WIFI_MSG_USER_DISCONNECT
[1566821995.10][CONN][RXD] Location: 0x0
[1566821995.25][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+463
[1566821995.27][CONN][RXD] Error Value: 0x0
[1566821995.37][CONN][RXD] Current Thread: main  Id: 0x2000BA5C Entry: 0x80E91C9 StackSize: 0x1000 StackMem: 0x200052B0 SP: 0x200061B0
[1566821995.46][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=UBLOX_EVK_ODIN_W2
[1566821995.49][CONN][RXD] -- MbedOS Error Info --
[1566821995.49][CONN][RXD]
[1566821995.55][CONN][RXD] = System will be rebooted due to a fatal error =

----

[1566822170.93][CONN][RXD] >>> Running 1 test cases...
[1566822170.99][CONN][RXD]
[1566822170.99][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1566822171.05][CONN][RXD] >>> Running case #1: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1566822171.10][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1566822187.21][CONN][RXD]
[1566822187.21][CONN][RXD]
[1566822187.24][CONN][RXD] ++ MbedOS Error Info ++
[1566822187.28][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1566822187.34][CONN][RXD] Error Message: Assertion failed: error_code == cbSTATUS_OK
[1566822187.36][CONN][RXD] Location: 0x0
[1566822187.51][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+782
[1566822187.53][CONN][RXD] Error Value: 0x0
[1566822187.66][CONN][RXD] Current Thread: application_unnamed_thread  Id: 0x20008818 Entry: 0x80E3869 StackSize: 0x1000 StackMem: 0x20011990 SP: 0x20012908
[1566822187.75][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=UBLOX_EVK_ODIN_W2
[1566822187.78][CONN][RXD] -- MbedOS Error Info --
[1566822187.78][CONN][RXD]
[1566822187.83][CONN][RXD] = System will be rebooted due to a fatal error =

mbed-os commit:

commit 02d1873c06255220faa0c1cabfc3b14696181c89 (HEAD -> master, origin/master, origin/HEAD)
Merge: b005bf2 1ab75d3
Author: Martin Kojtal <martin.kojtal@arm.com>
Date:   Mon Aug 26 11:08:47 2019 +0200
    Merge pull request #11266 from u-blox/feature_adc_temp_conf
    ADC internal temperature support and EMAC header fix

Contents of my mbed_app.json used (just password and unsecure AP name changed from our CI mbed_app.json):

{
 "config": {
   "wifi-secure-ssid": {
     "help": "WiFi SSID for WPA2 secured network",
     "value": "\"4tousnet\""
   },
   "wifi-unsecure-ssid": {
     "help": "WiFi SSID for unsecure netwrok",
     "value": "\"qwerty123\""
   },
   "wifi-password": {
     "help": "WiFi Password",
     "value": "\"secret\""
   },
   "wifi-secure-protocol": {
     "help": "WiFi security protocol, valid values are WEP, WPA, WPA2, WPA/WPA2",
     "value": "\"WPA2\""
   },
   "wifi-ch-secure": {
     "help": "Channel number of secure SSID",
     "value": 10
   },
   "wifi-ch-unsecure": {
     "help": "Channel number of unsecure SSID",
     "value": 2
   },
   "ap-mac-secure": {
     "help": "BSSID of secure AP in form of AA:BB:CC:DD:EE:FF",
     "value": "\"80:2a:a8:8a:e9:13\""
   },
   "ap-mac-unsecure": {
     "help": "BSSID of unsecure AP in form of \"AA:BB:CC:DD:EE:FF\"",
     "value": "\"58:8b:f3:99:c2:08\""
   },
   "max-scan-size": {
     "help": "How many networks may appear in Wifi scan result",
     "value": 30
   },
   "echo-server-addr": {
     "help": "IP address of echo server",
     "value": "\"echo.mbedcloudtesting.com\""
   },
   "echo-server-port": {
     "help": "Port of echo server",
     "value": "7"
   },
   "echo-server-discard-port": {
     "help": "Discard port of echo server",
     "value": "9"
   },
   "echo-server-port-tls": {
     "help": "Port of echo server for TLS",
     "value": "2007"
   },
   "echo-server-discard-port-tls": {
     "help": "Discard port of echo server for TLS",
     "value": "2009"
   }
 },
 "target_overrides": {
   "*": {
     "lwip.ipv4-enabled": true,
     "lwip.ipv6-enabled": false,
     "nsapi.default-wifi-ssid": "MBED_CONF_APP_WIFI_SECURE_SSID",
     "nsapi.default-wifi-password": "MBED_CONF_APP_WIFI_PASSWORD",
     "nsapi.default-wifi-security": "WPA_WPA2",
     "target.components_add": ["FPGA_CI_TEST_SHIELD"]
   },
   "K66F": {
     "events.shared-stacksize": 2048,
     "esp8266.provide-default": true,
     "target.network-default-interface-type": "WIFI",
     "esp8266.tx": "PTC4",
     "esp8266.rx": "PTC3",
     "esp8266.rts": "PTC1",
     "esp8266.cts": "PTC2",
     "esp8266.rst": "D8"
   },
   "NUCLEO_F401RE": {
     "idw0xx1.provide-default": true,
     "idw0xx1.tx": "D8",
     "idw0xx1.rx": "D2",
     "drivers.uart-serial-txbuf-size": 512,
     "drivers.uart-serial-rxbuf-size": 512
   },
   "DISCO_L475VG_IOT01A": {
     "ism43362.provide-default": true,
     "target.network-default-interface-type": "WIFI"
   },
   "SDT64B": {
     "target.components_add": ["SD"],
     "sd.SPI_MOSI": "PTB16",
     "sd.SPI_MISO": "PTB17",
     "sd.SPI_CLK": "PTB11",
     "sd.SPI_CS": "PTB10"
   },
   "UBLOX_EVK_ODIN_W2": {
     "target.network-default-interface-type": "WIFI"
   },
   "GR_LYCHEE": {
     "target.network-default-interface-type": "WIFI"
   }
 }
}

@aqib-ublox
Copy link
Contributor

aqib-ublox commented Aug 26, 2019

@kimlep01 Ok we will generate fix in next PR

@michalpasztamobica
Copy link
Contributor

The error codes identical as in #9621 . The code line has changed. Perhaps that will help?

@aqib-ublox
Copy link
Contributor

aqib-ublox commented Aug 29, 2019

@kimlep01 #11370 generate please verify

@kimlep01
Copy link
Author

kimlep01 commented Aug 29, 2019

Still managed to reproduce the crash. Same mbed_app.json used as above, edited main.cpp to run CONNECT-DISCONNECT-REPEAT 20 times in a row, compiled with IAR 8.32.1 and ran on my desk with my phone as a Wi-Fi hotspot. Crash happened on the third time.

Logs:

[1567072488.16][HTST][INF] CALLBACKs updated
[1567072488.16][HTST][INF] host test detected: default_auto
[1567072488.19][CONN][RXD] >>> Running 20 test cases...
[1567072488.25][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.30][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.36][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.41][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.46][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.52][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.58][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.63][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.68][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.73][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.79][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.85][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.90][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.95][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072489.00][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072489.06][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072489.12][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072489.17][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072489.22][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072489.28][CONN][RXD]
[1567072489.28][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072489.33][CONN][RXD] >>> Running case #1: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1567072489.40][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072509.63][CONN][INF] found KV pair in stream: {{__testcase_finish;WIFI-CONNECT-DISCONNECT-REPEAT;1;0}}, queued...
[1567072509.68][CONN][RXD] >>> 'WIFI-CONNECT-DISCONNECT-REPEAT': 1 passed, 0 failed
[1567072509.68][CONN][RXD]
[1567072509.75][CONN][RXD] >>> Running case #2: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1567072509.81][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072525.13][CONN][INF] found KV pair in stream: {{__testcase_finish;WIFI-CONNECT-DISCONNECT-REPEAT;1;0}}, queued...
[1567072525.19][CONN][RXD] >>> 'WIFI-CONNECT-DISCONNECT-REPEAT': 1 passed, 0 failed
[1567072525.19][CONN][RXD]
[1567072525.25][CONN][RXD] >>> Running case #3: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1567072525.30][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072594.28][CONN][RXD]
[1567072594.28][CONN][RXD]
[1567072594.32][CONN][RXD] ++ MbedOS Error Info ++
[1567072594.36][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1567072594.42][CONN][RXD] Error Message: Assertion failed: error_code == cbSTATUS_OK
[1567072594.44][CONN][RXD] Location: 0x0
[1567072594.58][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+761
[1567072594.59][CONN][RXD] Error Value: 0x0
[1567072594.74][CONN][RXD] Current Thread: application_unnamed_thread  Id: 0x20008A5C Entry: 0x80DFAD5 StackSize: 0x1000 StackMem: 0x20012300 SP: 0x20013260
[1567072594.82][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=UBLOX_EVK_ODIN_W2
[1567072594.85][CONN][RXD] -- MbedOS Error Info --
[1567072594.85][CONN][RXD]
[1567072594.90][CONN][RXD] = System will be rebooted due to a fatal error =
^Cmbedgt: ctrl+c keyboard interrupt!
kimlep01@e109827-lin:~/dev/git_repos/mbed-os$ git log -1
commit 7f36fbc86ee54ddcd3e37e7a55681b89409d7533 (HEAD -> ublox-pr-29-08)
Author: aqib-ublox <aqib.nasim@u-blox.com>
Date:   Thu Aug 29 11:33:29 2019 +0500

    Driver Updates + ARMC6 driver support + wifi fixes
kimlep01@e109827-lin:~/dev/git_repos/mbed-os$

@aqib-ublox
Copy link
Contributor

@kimlep01 yes let me reproduce it on my desk and see if can give some fix.
just on side note if you just comment this assertion and one more assertion after cbWLAN_disconnect() in same file to see any more crash occurring.It will helps us, as i can see disconnection is made but wrong status is returning. Because its happening in congested environment where multiple re-connection attempts are made or station not properly disconnected.

@kimlep01
Copy link
Author

Retested the PR with the following changes:

--- i/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp
+++ w/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp
@@ -758,7 +758,7 @@ OdinWiFiInterface::OdinWifiState OdinWiFiInterface::entry_connect_fail_wait_disc
     error_code = cbWLAN_disconnect(handle);
     cbMAIN_driverUnlock();
 
-    MBED_ASSERT(error_code == cbSTATUS_OK);
+    //MBED_ASSERT(error_code == cbSTATUS_OK);
 
     return S_STA_CONNECTION_FAIL_WAIT_DISCONNECT;
 }
@@ -779,7 +779,7 @@ OdinWiFiInterface::OdinWifiState OdinWiFiInterface::entry_wait_disconnect()
     error_code = cbWLAN_disconnect(handle);
     cbMAIN_driverUnlock();
 
-    MBED_ASSERT(error_code == cbSTATUS_OK);
+    //MBED_ASSERT(error_code == cbSTATUS_OK);
 
     return S_STA_WAIT_DISCONNECT;
 }

I could not see any more crashes. Tests would start failing pretty fast sometimes but no more crashing:

[1567154258.51][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567154258.51][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567154258.60][CONN][RXD] >>> Running case #1: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1567154258.60][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567154277.97][CONN][RXD] >>> 'WIFI-CONNECT-DISCONNECT-REPEAT': 1 passed, 0 failed
[1567154277.97][CONN][RXD]
[1567154277.97][CONN][INF] found KV pair in stream: {{__testcase_finish;WIFI-CONNECT-DISCONNECT-REPEAT;1;0}}, queued...
[1567154277.97][CONN][RXD] >>> Running case #2: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1567154278.09][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
mbedgt: :38::FAIL: Expected 0 Was -3004
[1567154355.53][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567154355.53][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3002
[1567154355.53][CONN][RXD] :38::FAIL: Expected 0 Was -3002
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567154355.71][CONN][RXD] :40::FAIL: Expected 0 Was -3004
[1567154355.71][CONN][INF] found KV pair in stream: {{__testcase_finish;WIFI-CONNECT-DISCONNECT-REPEAT;0;4}}, queued...
[1567154355.77][CONN][RXD] >>> 'WIFI-CONNECT-DISCONNECT-REPEAT': 0 passed, 4 failed with reason 'Test Cases Failed'
[1567154355.77][CONN][RXD]
[1567154355.77][CONN][RXD] >>> Running case #3: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1567154355.77][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
mbedgt: :38::FAIL: Expected 0 Was -3002
[1567154355.90][CONN][RXD] :38::FAIL: Expected 0 Was -3002
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567154355.90][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3002
[1567154355.90][CONN][RXD] :38::FAIL: Expected 0 Was -3002
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567154356.15][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3002
[1567154356.15][CONN][RXD] :38::FAIL: Expected 0 Was -3002
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567154356.15][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3002
[1567154356.15][CONN][RXD] :38::FAIL: Expected 0 Was -3002
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567154356.15][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3002
[1567154356.15][CONN][RXD] :38::FAIL: Expected 0 Was -3002
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567154356.19][CONN][RXD] :40::FAIL: Expected 0 Was -3004
...all rest of the tests till 20 fail fast to these errors

@aqib-ublox
Copy link
Contributor

aqib-ublox commented Aug 30, 2019

@kimlep01 tests starts failing because of weak signal strength returning no connection status.
Thanks for feedback ,let me find a fix and generate PR soon

@aqib-ublox
Copy link
Contributor

Hi @kimlep01

can u add below hack

`void OdinWiFiInterface::handle_wlan_status_connected(wlan_status_connected_s *wlan_connect)
{
nsapi_error_t error_code;

MBED_ASSERT(wlan_connect != 0);

switch(_state_sta) {
    case S_STA_CONNECTED:
    case S_STA_WAIT_CONNECT:
        _timer.stop();

        if(_debug) {
            printf("MBED_IPSTACK_ \r\n");
        }
        if(_state_sta != S_STA_CONNECTED){
            error_code = _interface->bringup(_dhcp,
                        _ip_address[0] ? _ip_address : 0,
                        _netmask[0] ? _netmask : 0,
                        _gateway[0] ? _gateway : 0,
                        DEFAULT_STACK);
       }

`

in OdinWifiInterface.cpp and test once again it may solve problem, please give feedback i would generate PR after that

@kimlep01
Copy link
Author

kimlep01 commented Sep 6, 2019

Retested with latest mbed-os master with this change:

+++ w/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp
@@ -1305,13 +1305,13 @@ void OdinWiFiInterface::handle_wlan_status_connected(wlan_status_connected_s *wl
             if(_debug) {
                 printf("MBED_IPSTACK_ \r\n");
             }
-
-            error_code = _interface->bringup(_dhcp,
+            if(_state_sta != S_STA_CONNECTED){
+                error_code = _interface->bringup(_dhcp,
                         _ip_address[0] ? _ip_address : 0,
                         _netmask[0] ? _netmask : 0,
                         _gateway[0] ? _gateway : 0,
                         DEFAULT_STACK);
-
+            }

I first made sure I could reproduce the issue still with clean master. I couldn't do that on my desk for some reason so I ran the test in the lab where our nightly tests run also. Test crashed on clean master and with the above change also. The line number was the same in both cases.

Error message:

[1567762678.59][CONN][RXD] :40::FAIL: Expected 0 Was -3004
[1567762678.73][CONN][INF] found KV pair in stream: {{__testcase_finish;WIFI-CONNECT-DISCONNECT-REPEAT;0;20}}, queued...
[1567762678.87][CONN][RXD] >>> 'WIFI-CONNECT-DISCONNECT-REPEAT': 0 passed, 20 failed with reason 'Test Cases Failed'
[1567762678.87][CONN][RXD]
[1567762678.87][CONN][RXD] >>> Running case #2: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1567762678.89][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
mbedgt: :38::FAIL: Expected 0 Was -3004
[1567762704.00][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567762704.18][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3004
[1567762729.13][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567762729.13][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3004
[1567762754.25][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567762754.25][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3004
[1567762779.41][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567762779.41][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3004
[1567762804.26][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567762804.26][CONN][RXD] :40::FAIL: Expected 0 Was -3004
[1567762834.14][CONN][RXD]
[1567762834.28][CONN][RXD]
[1567762834.28][CONN][RXD] ++ MbedOS Error Info ++
[1567762834.28][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1567762834.38][CONN][RXD] Error Message: Assertion failed: msg != NULL
[1567762834.38][CONN][RXD] Location: 0x0
[1567762834.49][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+1834
[1567762834.54][CONN][RXD] Error Value: 0x0
[1567762834.67][CONN][RXD] Current Thread: application_unnamed_thread  Id: 0x200112A8 Entry: 0x80DFAD5 StackSize: 0x1000 StackMem: 0x200112F8 SP: 0x20012190
[1567762834.83][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=UBLOX_EVK_ODIN_W2
[1567762834.83][CONN][RXD] -- MbedOS Error Info --
[1567762834.83][CONN][RXD]
[1567762834.83][CONN][RXD] = System will be rebooted due to a fatal error =
^Cmbedgt: ctrl+c keyboard interrupt!

@aqib-ublox
Copy link
Contributor

Hi @kimlep01 i am sharing private branch for testing please verify above issue:
https://github.com/u-blox/mbed-os/tree/ublox_odin_driver_os_5_v3.7.1_rc2

@kimlep01
Copy link
Author

Can be reproduced with the above mentioned branch:

[1568712135.32][CONN][RXD] >>> Running case #1: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1568712135.32][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
mbedgt: :38::FAIL: Expected 0 Was -3004
[1568712161.18][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1568712161.18][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3004
[1568712185.93][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1568712185.93][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3004
[1568712210.70][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1568712210.70][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3004
[1568712235.32][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1568712235.32][CONN][RXD] :40::FAIL: Expected 0 Was -3004
[1568712276.31][CONN][RXD]
[1568712276.31][CONN][RXD]
[1568712276.53][CONN][RXD] ++ MbedOS Error Info ++
[1568712276.53][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1568712276.53][CONN][RXD] Error Message: Assertion failed: msg != NULL
[1568712276.53][CONN][RXD] Location: 0x0
[1568712276.70][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+1834
[1568712276.70][CONN][RXD] Error Value: 0x0
[1568712276.90][CONN][RXD] Current Thread: application_unnamed_thread Id: 0x200112A8 Entry: 0x80DFAD5 StackSize: 0x1000 StackMem: 0x200112F8 SP: 0x20012190
[1568712277.13][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=UBLOX_EVK_ODIN_W2
[1568712277.13][CONN][RXD] -- MbedOS Error Info --
[1568712277.13][CONN][RXD]
[1568712277.13][CONN][RXD] = System will be rebooted due to a fatal error =
^Cmbedgt: ctrl+c keyboard interrupt!
kimlep01@e109827-lin:~/dev/git_repos/U-BLOX/mbed-os$ git log -1
commit 9fbe9716b8ac26df73d990db70c252da446f8355 (HEAD -> ublox_odin_driver_os_5_v3.7.1_rc2, origin/ublox_odin_driver_os_5_v3.7.1_rc2)
Author: aqin <aqib.nasim@u-blox.com>
Date:   Thu Sep 12 09:53:38 2019 +0500

    Update driver to avoid locking issue

@aqib-ublox
Copy link
Contributor

@kimlep01 please see logs below, i am not getting any crash tested it more than 10 times, can u please see if J21 is short and secondly try with replacing other odin_w2 board.
log_try_1.txt

@kimlep01
Copy link
Author

kimlep01 commented Sep 19, 2019

We have 4 boards running in the lab. I tested all of them:
Board 1: asserted at line 1834
Board 2: asserted at line 761
Board 3: asserted at line 1834
Board 4: asserted at line 1834

The boards have jumper J21 shorted. I made a request to the infrastructure team to replace them just to be sure which will take a few days as I don't have access there.

@kimlep01
Copy link
Author

Changing the J21 jumpers did not help and I still could reproduce this both in CI and locally on my desk using the private branch.

I ran the same test (20x connect-disconnect-repeat) with pyOCD debugger attached but all I could see is that on line 758 of OdinWiFiInterface.cpp cbWLAN_disconnect(handle) returned cbSTATUS_ERROR. No visibility beyond that.

[1569409542.72][CONN][RXD] 
[1569409542.74][CONN][RXD] ++ MbedOS Error Info ++
[1569409542.79][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1569409542.85][CONN][RXD] Error Message: Assertion failed: error_code == cbSTATUS_OK
[1569409542.87][CONN][RXD] Location: 0x80272E7
[1569409543.03][CONN][RXD] File: ./mbed-os/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+762
[1569409543.05][CONN][RXD] Error Value: 0x0
[1569409543.18][CONN][RXD] Current Thread: application_unnamed_thread Id: 0x200064F4 Entry: 0x802B1E9 StackSize: 0x1000 StackMem: 0x20012D40 SP: 0x20013C68 
[1569409543.27][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=UBLOX_EVK_ODIN_W2
[1569409543.29][CONN][RXD] -- MbedOS Error Info --
[1569409543.30][CONN][RXD] 
[1569409543.34][CONN][RXD] = System will be rebooted due to a fatal error =

This might feel tricky to reproduce sometimes but here is what I suggest:

  • Setup unsecure AP so the board can connect to it normally (phone etc). Place it close to the board.
  • Modify test case to have 20x WIFI-CONNECT-DISCONNECT-REPEAT in a row, nothing else needed.
  • Write a small script to keep running the test case until it asserts. I will eat my hat if it doesn't happen :)

The other way would be to provide a debug version of the driver for us with enough traces so we could help more.

(I will be out of office until 7.10. Team ARMmbed/mbed-os-wan can assist until then if required)

@hmxmghl
Copy link
Member

hmxmghl commented Oct 3, 2019

Hi @kimlep01

Thank you for writing the detailed procedure to reproduce the issue. We were in fact able to reproduce it and have hopefully fixed it.

Hereby sharing our private branch with you: ublox_odin_driver_os_5_v3.7.1_rc3

Can you kindly test and let us know about your test results?

@kimlep01
Copy link
Author

Hello again,

I retested with the new branch and unfortunately I could still reproduce the issue. I tested that I could reproduce the issue with all compilers and using two different boards. I don't get any test case errors prior to assert - it passes just fine until mbed assert happens.

IAR8:

[1570702801.50][CONN][RXD] >>> 'WIFI-CONNECT-DISCONNECT-REPEAT': 1 passed, 0 failed
[1570702801.50][CONN][RXD]
[1570702801.56][CONN][RXD] >>> Running case #8: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1570702801.62][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1570702805.34][CONN][RXD]
[1570702805.34][CONN][RXD]
[1570702805.36][CONN][RXD] ++ MbedOS Error Info ++
[1570702805.41][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1570702805.47][CONN][RXD] Error Message: Assertion failed: error_code == cbSTATUS_OK
[1570702805.48][CONN][RXD] Location: 0x0
[1570702805.64][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+782
[1570702805.65][CONN][RXD] Error Value: 0x0
[1570702805.79][CONN][RXD] Current Thread: application_unnamed_thread  Id: 0x20008A60 Entry: 0x80DFADD StackSize: 0x1000 StackMem: 0x20012310 SP: 0x20013280
[1570702805.87][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=UBLOX_EVK_ODIN_W2
[1570702805.91][CONN][RXD] -- MbedOS Error Info --
[1570702805.91][CONN][RXD]
[1570702805.96][CONN][RXD] = System will be rebooted due to a fatal error =
^Cmbedgt: ctrl+c keyboard interrupt!
kimlep01@e109827-lin:~/dev/git_repos/U-BLOX/mbed-os$ git log -1
commit cecde0ec81394d5bf8a058b23175f521a9642c8f (HEAD -> ublox_odin_driver_os_5_v3.7.1_rc3, origin/ublox_odin_driver_os_5_v3.7.1_rc3)
Author: Hamza Rizwan <hamza.rizwan66@gmail.com>
Date:   Thu Oct 3 15:15:55 2019 +0500

    Handle Syncronization between EMAC & WLAN Driver + Message Pool Overflow fix

ARMC6:

[1570704403.09][CONN][RXD] >>> 'WIFI-CONNECT-DISCONNECT-REPEAT': 1 passed, 0 failed
[1570704403.09][CONN][RXD]
[1570704403.15][CONN][RXD] >>> Running case #20: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1570704403.20][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1570704466.18][CONN][RXD]
[1570704466.18][CONN][RXD]
[1570704466.20][CONN][RXD] ++ MbedOS Error Info ++
[1570704466.26][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1570704466.32][CONN][RXD] Error Message: Assertion failed: error_code == cbSTATUS_OK
[1570704466.34][CONN][RXD] Location: 0x80363F1
[1570704466.48][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+761
[1570704466.50][CONN][RXD] Error Value: 0x0
[1570704466.64][CONN][RXD] Current Thread: application_unnamed_thread  Id: 0x200042B8 Entry: 0x800FE99 StackSize: 0x1000 StackMem: 0x20011AF0 SP: 0x20012A28
[1570704466.73][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=UBLOX_EVK_ODIN_W2
[1570704466.75][CONN][RXD] -- MbedOS Error Info --
[1570704466.75][CONN][RXD]
[1570704466.80][CONN][RXD] = System will be rebooted due to a fatal error =
^Cmbedgt: ctrl+c keyboard interrupt!
kimlep01@e109827-lin:~/dev/git_repos/U-BLOX/mbed-os$ git log -1
commit cecde0ec81394d5bf8a058b23175f521a9642c8f (HEAD -> ublox_odin_driver_os_5_v3.7.1_rc3, origin/ublox_odin_driver_os_5_v3.7.1_rc3)
Author: Hamza Rizwan <hamza.rizwan66@gmail.com>
Date:   Thu Oct 3 15:15:55 2019 +0500

    Handle Syncronization between EMAC & WLAN Driver + Message Pool Overflow fix

GCC_ARM8:

[1570708858.78][CONN][RXD] >>> Running case #2: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1570708858.83][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1570708882.66][CONN][INF] found KV pair in stream: {{__testcase_finish;WIFI-CONNECT-DISCONNECT-REPEAT;1;0}}, queued...
[1570708882.72][CONN][RXD] >>> 'WIFI-CONNECT-DISCONNECT-REPEAT': 1 passed, 0 failed
[1570708882.72][CONN][RXD]
[1570708882.77][CONN][RXD] >>> Running case #3: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1570708882.83][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1570708950.71][CONN][RXD]
[1570708950.71][CONN][RXD]
[1570708950.74][CONN][RXD] ++ MbedOS Error Info ++
[1570708950.79][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1570708950.85][CONN][RXD] Error Message: Assertion failed: error_code == cbSTATUS_OK
[1570708950.87][CONN][RXD] Location: 0x8020365
[1570708951.02][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+761
[1570708951.04][CONN][RXD] Error Value: 0x0
[1570708951.18][CONN][RXD] Current Thread: application_unnamed_thread  Id: 0x20006408 Entry: 0x80239E9 StackSize: 0x1000 StackMem: 0x20012A40 SP: 0x20013968
[1570708951.26][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=UBLOX_EVK_ODIN_W2
[1570708951.29][CONN][RXD] -- MbedOS Error Info --
[1570708951.29][CONN][RXD]
[1570708951.34][CONN][RXD] = System will be rebooted due to a fatal error =
^Cmbedgt: ctrl+c keyboard interrupt!
kimlep01@e109827-lin:~/dev/git_repos/U-BLOX/mbed-os$ git log -1
commit cecde0ec81394d5bf8a058b23175f521a9642c8f (HEAD -> ublox_odin_driver_os_5_v3.7.1_rc3, origin/ublox_odin_driver_os_5_v3.7.1_rc3)
Author: Hamza Rizwan <hamza.rizwan66@gmail.com>
Date:   Thu Oct 3 15:15:55 2019 +0500

    Handle Syncronization between EMAC & WLAN Driver + Message Pool Overflow fix

@hmxmghl
Copy link
Member

hmxmghl commented Oct 11, 2019

Hi @kimlep01

That is strange. After the fix, we ran the test overnight and there was no assert.

Can you please tell us how many times you tries running it?

Can you again try by building clean, hard resetting the board, shorting the J21 jumper, hard resetting the access point (if possible)?

@kimlep01
Copy link
Author

kimlep01 commented Oct 15, 2019

I hit the issue on the first or second run. Now it seems to assert reliably on line 761. I did a clean clone and boards are powered off before test runs. I tested locally with two boards (W260 & W262), changed different jumpers to J21, restarted my phone (S10e) before tests which acted as an unsecure AP.

I also tested inside a RF-BOX with the same outcome (couldn't restart AP there). The mbed_app.json used can be found from my fourth comment (just change unsecure AP name).

Here is the modified test case content, I seem to have increased the timeout also:

diff --git i/TESTS/network/wifi/main.cpp w/TESTS/network/wifi/main.cpp
index 73859eb..73d5a9f 100644
--- i/TESTS/network/wifi/main.cpp
+++ w/TESTS/network/wifi/main.cpp
@@ -54,36 +54,55 @@ utest::v1::status_t test_setup(const size_t number_of_cases)
 #ifdef MBED_GREENTEA_TEST_WIFI_TIMEOUT_S
     GREENTEA_SETUP(MBED_GREENTEA_TEST_WIFI_TIMEOUT_S, "default_auto");
 #else
-    GREENTEA_SETUP(360, "default_auto");
+    GREENTEA_SETUP(4200, "default_auto");
 #endif
     return verbose_test_setup_handler(number_of_cases);
 }

 // Test cases
 Case cases[] = {
-    Case("WIFI-CONSTRUCTOR", wifi_constructor),
-    Case("WIFI-CONNECT-NOCREDENTIALS", wifi_connect_nocredentials),
-    Case("WIFI-SET-CREDENTIAL", wifi_set_credential),
-    Case("WIFI-SET-CHANNEL", wifi_set_channel),
-    Case("WIFI-CONNECT-PARAMS-NULL", wifi_connect_params_null),
-    Case("WIFI-SCAN-NULL", wifi_scan_null),
+    //Case("WIFI-CONSTRUCTOR", wifi_constructor),
+    //Case("WIFI-CONNECT-NOCREDENTIALS", wifi_connect_nocredentials),
+    //Case("WIFI-SET-CREDENTIAL", wifi_set_credential),
+    //Case("WIFI-SET-CHANNEL", wifi_set_channel),
+    //Case("WIFI-CONNECT-PARAMS-NULL", wifi_connect_params_null),
+    //Case("WIFI-SCAN-NULL", wifi_scan_null),
 #if defined(MBED_CONF_APP_WIFI_SECURE_SSID) || defined(MBED_CONF_APP_WIFI_UNSECURE_SSID)
-    Case("WIFI-SCAN", wifi_scan),
+    //Case("WIFI-SCAN", wifi_scan),
 #endif
 #if defined(MBED_CONF_APP_WIFI_UNSECURE_SSID)
-    Case("WIFI-GET-RSSI", wifi_get_rssi),
-    Case("WIFI-CONNECT-PARAMS-VALID-UNSECURE", wifi_connect_params_valid_unsecure),
-    Case("WIFI-CONNECT", wifi_connect),
+    //Case("WIFI-GET-RSSI", wifi_get_rssi),
+    //Case("WIFI-CONNECT-PARAMS-VALID-UNSECURE", wifi_connect_params_valid_unsecure),
+    //Case("WIFI-CONNECT", wifi_connect),
     //Most boards are not passing this test, but they should if they support non-blocking API.
     //Case("WIFI-CONNECT-NONBLOCKING", wifi_connect_nonblock),
     Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
 #endif
 #if defined(MBED_CONF_APP_WIFI_SECURE_SSID)
-    Case("WIFI-CONNECT-PARAMS-VALID-SECURE", wifi_connect_params_valid_secure),
-    Case("WIFI-CONNECT-PARAMS-CHANNEL", wifi_connect_params_channel),
-    Case("WIFI-CONNECT-PARAMS-CHANNEL-FAIL", wifi_connect_params_channel_fail),
-    Case("WIFI-CONNECT-SECURE", wifi_connect_secure),
-    Case("WIFI-CONNECT-SECURE-FAIL", wifi_connect_secure_fail),
+    //Case("WIFI-CONNECT-PARAMS-VALID-SECURE", wifi_connect_params_valid_secure),
+    //Case("WIFI-CONNECT-PARAMS-CHANNEL", wifi_connect_params_channel),
+    //Case("WIFI-CONNECT-PARAMS-CHANNEL-FAIL", wifi_connect_params_channel_fail),
+    //Case("WIFI-CONNECT-SECURE", wifi_connect_secure),
+    //Case("WIFI-CONNECT-SECURE-FAIL", wifi_connect_secure_fail),
 #endif
 };

@hmxmghl
Copy link
Member

hmxmghl commented Oct 16, 2019

Thank you for these insights.

If you hit this on first or second run, that's worrisome. We are barely able to reproduce it anymore. And even if we get the assert, it is after 700-800 connect/disconnect sequences.

We are trying to create the scenario here which causes this assert to occur (like breaking the link in between handshake). So far we have observed that if the Bluetooth of the phone is turned on as well while phone is acting as an AP, it becomes unstable and there are a lot more failed connect attempts (not asserts).

Can you help us identify a low-level scenario in which this can be reproduced? Like sharing the filtered Wireshark logs of packets to and from ODIN on the channel it tries to connect/disconnect.

If you have any other observations that can help us reproduce this will be helpful.

@michalpasztamobica
Copy link
Contributor

Hi @hamza-ubx . Thanks a lot for your efforts so far and the helpful PR, which improved the situation.
@kimlep01 was however still able to spot the assertions (although less often) and I was just checking your suggestion about the LWIP's failure.

I added some more logging, modified the wifi-connect-disconnect-repeat test as Kimmo suggested (2000 loops) and used the unreliable non-RF-BOX RAAS environment to reproduce the issue.

From time to time connect fails with AUTH_FAILURE. I can see this log line in the logs like this:

[1577774895.36][CONN][RXD] TIMEOUT: 20000 ms, ACTUAL 20200 ms
mbedgt: :48::FAIL: Expected 0 Was -3011
[1577774895.46][CONN][RXD] :48::FAIL: Expected 0 Was -3011
mbedgt: :50::FAIL: Expected 0 Was -3004
[1577774895.56][CONN][RXD] :50::FAIL: Expected 0 Was -3004
[1577774916.30][CONN][RXD] TIMEOUT: 20000 ms, ACTUAL 20100 ms
mbedgt: :48::FAIL: Expected 0 Was -3011
[1577774916.30][CONN][RXD] :48::FAIL: Expected 0 Was -3011
mbedgt: :50::FAIL: Expected 0 Was -3004
[1577774916.40][CONN][RXD] :50::FAIL: Expected 0 Was -3004
[1577774940.59][CONN][RXD] TIMEOUT: 20000 ms, ACTUAL 20011 ms
mbedgt: :48::FAIL: Expected 0 Was -3011
[1577774940.59][CONN][RXD] :48::FAIL: Expected 0 Was -3011
mbedgt: :50::FAIL: Expected 0 Was -3004
[1577774940.59][CONN][RXD] :50::FAIL: Expected 0 Was -3004

-3011 is NSAPI_ERROR_AUTH_FAILURE, so I would say it is worth checking what the actual problem is: timeout or authentication failure. In a polluted network both are credible. But I cannot look into the driver to see what exactly happened and they both originate from the driver: the error reason is set here and the timeout callback obviously comes from driver. I would suspect that the translation to mbed's error code is incorrect at some stage.
On each timeout entry_connect_fail_wait_disconnect is entered, but the cbWLAN_disconnect() always returns cbSTATUS_OK and assertion does not trigger.
After that wifi->disconnect() fails with NSAPI_ERROR_NO_CONNECTION, which is reasonable.
This may happen a few times in a row with no assertion, and then the test may continue successfully for many loops more.

Usually however, the next time we try to connect, after previous timeout/authentication error the connect goes into the assertion in entry_connect_fail_wait_disconnect() (entered from the timeout handler, like previously). I do not see why this time cbWLAN_disconnect() returns cbSTATUS_ERROR and I need to seek your help @hamza-ubx to figure this out (or I would need access to driver's code).

To sum up:

  1. I can't see why cbWLAN_disconnect() usually (not always) returns cbSTATUS_ERROR leading to assertion in entry_connect_fail_wait_disconnect().
  2. I also don't see why we can see timeout callback and NSAPI_ERROR_AUTH_FAILURE being reported at the same time. They both somehow originate from driver's internals, so your help would be much appreciated.

I don't think it is LWIP not being able to set ip address that is the root cause. It is OK for the module to sometimes fail to connect to AP in a busy network and this should not lead to assertion.

@hmxmghl
Copy link
Member

hmxmghl commented Jan 1, 2020

Hi @michalpasztamobica

Thank you for the detailed analysis. We are continuing the investigation at our end and will share any information that we discover.

@0xc0170 0xc0170 reopened this Jan 2, 2020
@hmxmghl
Copy link
Member

hmxmghl commented Jan 10, 2020

Hi @michalpasztamobica

So we have investigated the issue in more detail. For the first part where you mentioned that may be translation from driver to Mbed's error code is not correct. When driver returns one of the following, only then it is translated to NSAPI_ERROR_AUTH_FAILURE:

  • cbWLAN_STATUS_DISCONNECTED_AUTH_FAILURE
  • cbWLAN_STATUS_DISCONNECTED_ASSOC_FAILURE
  • cbWLAN_STATUS_DISCONNECTED_MIC_FAILURE

For the connection timeout, its not handled at driver level instead it is being handled at interface level. See here. Hence, we can correct this error translation at interface level.

For the second part where you mentioned that assert is still reproducible but relatively rare now and it mostly occurs when the previous iteration of connect/disconnect fails with -3011. We tried to reproduce this scenario but were unable to replicate. Whenever -3011 occurred, next iteration succeeded usually.

However, assert did occur sometimes around consecutive 800 to 1200 connect/disconnect cycles. But the previous iteration after which the assert occurred was always successful connection/disconnection.

For assert part, we still haven't found a concrete scenario under which it is absolutely reproducible. It just occurs randomly sometimes in 100's and sometimes in 1000's connect/disconnect cycle and that too with only some of the APs (mobile phone access points mostly). Other APs work just fine and no assert have been observed upto 10,000 connect/disconnect cycles. Have you found the concrete scenario under which it can be reproduced?

@michalpasztamobica
Copy link
Contributor

Hi @hamza-ubx . Not sure I got it right, but indeed, sounds like only cbWLAN_STATUS_DISCONNECTED_AUTH_FAILURE should be translated to NSAPI_ERROR_AUTH_FAILURE in here.

What about the two other errors: cbWLAN_STATUS_DISCONNECTED_ASSOC_FAILURE and cbWLAN_STATUS_DISCONNECTED_MIC_FAILURE? Should they be translated to NSAPI_ERROR_TIMEOUT? I don't see any documentation on what they mean... ASSOC could stand for "assosciation" as "access point could not have been associated in time", but the MIC doesn't ring any bell to me... If you have an idea on how to best translate them, would you be able to prepare a PR with a fix? If not, can you please help me understand what these to errors mean and I'll find a NSAPI_ERROR_X code for them.

Regarding the original crash issue...
I haven't tried any more since my last comment. Back then I could reproduce the issue within about an hour using the crowded RAAS lab. It also took me about a 1000 loops (sometimes less sometimes more). I could not crash the board when using my phone as an AP (Samsung Galaxy S7).
The issue somewhat resembles this one: #9621. @aqib-ublox managed to reproduce that issue by rapidly switching his AP on and off. Perhaps that helps to simulate poor connectivity/crowded network?

@hmxmghl
Copy link
Member

hmxmghl commented Jan 13, 2020

I guess we can correct the error translation after we come up with appropriate meaning for each error code and find relevant NSAPI_ERROR_X

For cbWLAN_STATUS_DISCONNECTED_MIC_FAILURE, I think MIC stands for Message Integrity Check. So that is something that we can not categorize in NSAPI_ERROR_AUTH_FAILURE or NSAPI_ERROR_CONNECTION_TIMEOUT, in my opinion. What do you think the correct NSAPI_ERROR_X translation for this would be?

For cbWLAN_STATUS_DISCONNECTED_ASSOC_FAILURE, in connection procedure, as association is done after the authentication step is complete, I don't think cbWLAN_STATUS_DISCONNECTED_ASSOC_FAILURE should be categorized under NSAPI_ERROR_AUTH_FAILURE. But we can argue to put it under NSAPI_ERROR_CONNECTION_TIMEOUT as during association procedure, IP address is obtained so if it fails to obtain an IP within specified time, association failure will occur. What do you think?

For cbWLAN_STATUS_DISCONNECTED_AUTH_FAILURE, it can be put under NSAPI_ERROR_AUTH_FAILURE. Even though, timeout can occur during authentication procedure as well, I think it is still better to categorize it under NSAPI_ERROR_AUTH_FAILURE.

Regarding original crash issue, we are trying different things to reproduce the crash including the one mentioned in #9621. Hopefully this translation correction will also help analyze it better.

@michalpasztamobica

@michalpasztamobica
Copy link
Contributor

Hi @hamza-ubx .
I fully agree with these:
cbWLAN_STATUS_DISCONNECTED_ASSOC_FAILURE -> NSAPI_ERROR_TIMEOUT
cbWLAN_STATUS_DISCONNECTED_AUTH_FAILURE -> NSAPI_ERROR_AUTH_FAILURE

Regarding cbWLAN_STATUS_DISCONNECTED_MIC_FAILURE (Message Integrity Check). Normally if the code doesn't fit anything known, we would translate it to NSAPI_ERROR_DEVICE_ERROR and just leave it to the application to decide how to handle it. But if this issue is not that critical and if it only occurs on (dis)connection, as the handler's name suggests, we could translate it to NSAPI_ERROR_NO_CONNECTION (same as cbWLAN_STATUS_DISCONNECTED_UNKNOWN).

While we are at this point, I am also thinking if cbWLAN_STATUS_DISCONNECTED_NO_BSSID_FOUND should not be translated to NSAPI_ERROR_NO_SSID?

It's also wise to ask @kjbracey-arm for an opinion to make sure we don't get something horribly wrong. Kevin, would you please let us know if these error code translations mentioned above make sense to you?

@hamza-ubx , fingers crossed for the crash fix. If there is anything we can help with - let us know. For the reproduction scenario, I guess we are on the same page - the busier network, the weaker signal and the more extreme its handling and location, the more likely the crash becomes. I guess once per hour is not that bad...

@hmxmghl
Copy link
Member

hmxmghl commented Jan 13, 2020

Great! We will look forward to hear from Kevin and then make the corrections to the translation as discussed.

About the cbWLAN_STATUS_DISCONNECTED_NO_BSSID_FOUND, I think user will see it as misleading if it is translated to NSAPI_ERROR_NO_SSID because it kind of depicts that SSID wasn't found in scan. Right now it is mapped to NSAPI_ERROR_NO_CONNECTION, which in my opinion is more appropriate.

And yeah I guess we are on the same page for the crash issue reproduction. Hoping to find concrete scenario to reproduce and then find a fix.

@michalpasztamobica
Copy link
Contributor

Hi @hamza-ubx , any success regarding the crash issue? @kimlep01 , I looked briefly at the nightly results, but couldn't see the crash in there. Have you applied some workaround? Is the issue still visible?

@kimlep01
Copy link
Author

Hi, the original issue is still visible in our CI. Happened twice last night in GCC_ARM-tests-netsocket-dns and in GCC_ARM tests-network-wifi.

[1584754808.89][CONN][RXD] 
[1584754808.99][CONN][RXD] ++ MbedOS Error Info ++
[1584754808.99][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1584754809.07][CONN][RXD] Error Message: Assertion failed: error_code == cbSTATUS_OK
[1584754809.07][CONN][RXD] Location: 0x802268D
[1584754809.23][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+733
[1584754809.27][CONN][RXD] Error Value: 0x0
[1584754809.39][CONN][RXD] Current Thread: application_unnamed_thread Id: 0x200065D4 Entry: 0x8026509 StackSize: 0x1000 StackMem: 0x200122F8 SP: 0x20013240 
[1584754809.45][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&osver=999999&core=0x410FC241&comp=2&ver=90200&tgt=UBLOX_EVK_OD...
[1584754809.45][CONN][RXD] 
[1584754809.45][CONN][RXD] -- MbedOS Error Info --
[1584754809.45][CONN][RXD] 
[1584754809.62][CONN][RXD] = System will be rebooted due to a fatal error =

@hmxmghl
Copy link
Member

hmxmghl commented Mar 24, 2020

Hi @michalpasztamobica

We are seeing this issue occur relatively less frequently now since the last optimizations have been applied. However, issue is still there and we would like to properly fix it. This is a tricky one and requires dedicated effort. We are prioritizing it and will keep you posted as soon as we have a fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants