Skip to content

Ublox odin - Wi-Fi interface fails endurance testing/jams #12497

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
JanneKiiskila opened this issue Feb 24, 2020 · 13 comments
Closed

Ublox odin - Wi-Fi interface fails endurance testing/jams #12497

JanneKiiskila opened this issue Feb 24, 2020 · 13 comments

Comments

@JanneKiiskila
Copy link
Contributor

JanneKiiskila commented Feb 24, 2020

Description of defect

Try running mbed-os-example-pelion with UBLOX_EVK_ODIN_W2 target, leave it running for a longer time. Once tick count on /3200/0/5501 reaches around 1900-2200 - the connection will fail and all DNS queries against that interface will fail. Seems the Wi-Fi stack gets somehow stuck and it never recovers.

With K66F/Ethernet - you can plug out the whole cable for a long time, then plug it back in and everything recovers nicely.

Target(s) affected by this defect ?

UBLOX_EVK_ODIN_W2

Toolchain(s) (name and version) displaying this defect ?

gcc-arm-none-eabi-9-2019-q4-major

What version of Mbed-os are you using (tag or sha) ?

mbed-os.5.15.1

What version(s) of tools are you using. List all that apply (E.g. mbed-cli)

mbed-cli 1.10.1

How is this defect reproduced ?

  1. Use the repository mbed-os-example-pelion or clone it, if you do not have it yet.
  2. Get your developer certificate from Pelion portal.
  3. Configure Wi-Fi credentials to mbed_app.json, optinally turn traces on.
  4. Compile the target - `mbed compile -t GCC_ARM - m UBLOX_EVK_ODIN_W2'
  5. Flash in the binary
  6. Reset board and start following the progress.

-> At some point of time you will have network error/lose Wi-Fi connectivity.

@JanneKiiskila
Copy link
Contributor Author

Attn: @ARMmbed/team-ublox @MarceloSalazar

@JanneKiiskila
Copy link
Contributor Author

JanneKiiskila commented Feb 24, 2020

Log example below;

Counter 283
[INFO][COAP]: ======== Incoming CoAP package ========
[INFO][COAP]: Status:		COAP_STATUS_OK
[INFO][COAP]: Code:		COAP_MSG_CODE_EMPTY
[INFO][COAP]: Type:		COAP_MSG_TYPE_ACKNOWLEDGEMENT
[INFO][COAP]: Id:		25
[INFO][COAP]: Payload len:	0
[INFO][COAP]: ======== End of CoAP package ========
[INFO][mClt]: M2MNsdlInterface::send_resource_observation - uri 3200/0/5501
[INFO][COAP]: ======== Outgoing CoAP package ========
[INFO][COAP]: Status:		COAP_STATUS_OK
[INFO][COAP]: Code:		COAP_MSG_CODE_RESPONSE_CONTENT
[INFO][COAP]: Type:		COAP_MSG_TYPE_CONFIRMABLE
[INFO][COAP]: Id:		26
[INFO][COAP]: Token:		2d:a1:0f:0b
[INFO][COAP]: Content-type:	0
[INFO][COAP]: Payload len:	3
[INFO][COAP]: Max-age:		0
[INFO][COAP]: Observe:		247
[INFO][COAP]: ======== End of CoAP package ========
Counter 284
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 285
[INFO][COAP]: ======== Outgoing CoAP package ========
[INFO][COAP]: Status:		COAP_STATUS_OK
[INFO][COAP]: Code:		COAP_MSG_CODE_RESPONSE_CONTENT
[INFO][COAP]: Type:		COAP_MSG_TYPE_CONFIRMABLE
[INFO][COAP]: Id:		26
[INFO][COAP]: Token:		2d:a1:0f:0b
[INFO][COAP]: Content-type:	0
[INFO][COAP]: Payload len:	3
[INFO][COAP]: Max-age:		0
[INFO][COAP]: Observe:		247
[INFO][COAP]: ======== End of CoAP package ========
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 286
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 287
[INFO][COAP]: ======== Outgoing CoAP package ========
[INFO][COAP]: Status:		COAP_STATUS_OK
[INFO][COAP]: Code:		COAP_MSG_CODE_RESPONSE_CONTENT
[INFO][COAP]: Type:		COAP_MSG_TYPE_CONFIRMABLE
[INFO][COAP]: Id:		26
[INFO][COAP]: Token:		2d:a1:0f:0b
[INFO][COAP]: Content-type:	0
[INFO][COAP]: Payload len:	3
[INFO][COAP]: Max-age:		0
[INFO][COAP]: Observe:		247
[INFO][COAP]: ======== End of CoAP package ========
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 288
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 289
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 290
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 291
[INFO][COAP]: ======== Outgoing CoAP package ========
[INFO][COAP]: Status:		COAP_STATUS_OK
[INFO][COAP]: Code:		COAP_MSG_CODE_RESPONSE_CONTENT
[INFO][COAP]: Type:		COAP_MSG_TYPE_CONFIRMABLE
[INFO][COAP]: Id:		26
[INFO][COAP]: Token:		2d:a1:0f:0b
[INFO][COAP]: Content-type:	0
[INFO][COAP]: Payload len:	3
[INFO][COAP]: Max-age:		0
[INFO][COAP]: Observe:		247
[INFO][COAP]: ======== End of CoAP package ========
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 292
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 293
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 294
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 295
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 296
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 297
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 298
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 299
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 300
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 301
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 302
[INFO][mClt]: M2MNsdlInterface::received_from_server_callback - message sending failed, id 26
[ERR ][mClt]: M2MInterfaceImpl::registration_error code [6]
[ERR ][mClt]: M2MInterfaceImpl::socket_error: (-8), retry (1), reconnecting (0), reconnection_state (1)
[INFO][mClt]: M2MInterfaceImpl::create_random_initial_reconnection_time() initial random time 34

[INFO][mClt]: M2MInterfaceImpl::socket_error - reconnecting in 34(s)
[ERR ][mClt]: ConnectorClient::error() - error: 6
[INFO][fcc ]: storage_pal_sst.c:602:storage_item_get_data:===> item name = mbed.UseBootstrap len = 17, data max size = 4
[INFO][fcc ]: storage_pal_sst.c:640:storage_item_get_data:<=== 
[ERR ][mClt]: ServiceClient::connector_error() error 6
[ERR ][mClt]: MbedCloudClient::error code (6)
client_error(6) -> Client in reconnection mode NetworkError
[ERR ][mClt]: ServiceClient::state_failure()
[INFO][mClt]: MbedCloudClient::complete status (-1)

I.e. the notification sending fails - client tries to reconnect, but it just does not help and connection never recovers. Counter was only 283 in this case, so it happened quite quicly that time (I had a total of 4 test runs over the weekend, every one of them failed).

For reference: NXP K66F worked flawlessly the whole weekend, even with the Ethernet cable intentionally plugged out for a while.

@JanneKiiskila
Copy link
Contributor Author

Same story with mbed-cloud-client-example, log from that, too.

Counter 283
[INFO][COAP]: ======== Incoming CoAP package ========
[INFO][COAP]: Status:		COAP_STATUS_OK
[INFO][COAP]: Code:		COAP_MSG_CODE_EMPTY
[INFO][COAP]: Type:		COAP_MSG_TYPE_ACKNOWLEDGEMENT
[INFO][COAP]: Id:		25
[INFO][COAP]: Payload len:	0
[INFO][COAP]: ======== End of CoAP package ========
[INFO][mClt]: M2MNsdlInterface::send_resource_observation - uri 3200/0/5501
[INFO][COAP]: ======== Outgoing CoAP package ========
[INFO][COAP]: Status:		COAP_STATUS_OK
[INFO][COAP]: Code:		COAP_MSG_CODE_RESPONSE_CONTENT
[INFO][COAP]: Type:		COAP_MSG_TYPE_CONFIRMABLE
[INFO][COAP]: Id:		26
[INFO][COAP]: Token:		2d:a1:0f:0b
[INFO][COAP]: Content-type:	0
[INFO][COAP]: Payload len:	3
[INFO][COAP]: Max-age:		0
[INFO][COAP]: Observe:		247
[INFO][COAP]: ======== End of CoAP package ========
Counter 284
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 285
[INFO][COAP]: ======== Outgoing CoAP package ========
[INFO][COAP]: Status:		COAP_STATUS_OK
[INFO][COAP]: Code:		COAP_MSG_CODE_RESPONSE_CONTENT
[INFO][COAP]: Type:		COAP_MSG_TYPE_CONFIRMABLE
[INFO][COAP]: Id:		26
[INFO][COAP]: Token:		2d:a1:0f:0b
[INFO][COAP]: Content-type:	0
[INFO][COAP]: Payload len:	3
[INFO][COAP]: Max-age:		0
[INFO][COAP]: Observe:		247
[INFO][COAP]: ======== End of CoAP package ========
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 286
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 287
[INFO][COAP]: ======== Outgoing CoAP package ========
[INFO][COAP]: Status:		COAP_STATUS_OK
[INFO][COAP]: Code:		COAP_MSG_CODE_RESPONSE_CONTENT
[INFO][COAP]: Type:		COAP_MSG_TYPE_CONFIRMABLE
[INFO][COAP]: Id:		26
[INFO][COAP]: Token:		2d:a1:0f:0b
[INFO][COAP]: Content-type:	0
[INFO][COAP]: Payload len:	3
[INFO][COAP]: Max-age:		0
[INFO][COAP]: Observe:		247
[INFO][COAP]: ======== End of CoAP package ========
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 288
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 289
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 290
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 291
[INFO][COAP]: ======== Outgoing CoAP package ========
[INFO][COAP]: Status:		COAP_STATUS_OK
[INFO][COAP]: Code:		COAP_MSG_CODE_RESPONSE_CONTENT
[INFO][COAP]: Type:		COAP_MSG_TYPE_CONFIRMABLE
[INFO][COAP]: Id:		26
[INFO][COAP]: Token:		2d:a1:0f:0b
[INFO][COAP]: Content-type:	0
[INFO][COAP]: Payload len:	3
[INFO][COAP]: Max-age:		0
[INFO][COAP]: Observe:		247
[INFO][COAP]: ======== End of CoAP package ========
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 292
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 293
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 294
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 295
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 296
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 297
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 298
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 299
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 300
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 301
[INFO][mClt]: M2MNsdlInterface::observation_to_be_sent() - send already in progress
Counter 302
[INFO][mClt]: M2MNsdlInterface::received_from_server_callback - message sending failed, id 26
[ERR ][mClt]: M2MInterfaceImpl::registration_error code [6]
[ERR ][mClt]: M2MInterfaceImpl::socket_error: (-8), retry (1), reconnecting (0), reconnection_state (1)
[INFO][mClt]: M2MInterfaceImpl::create_random_initial_reconnection_time() initial random time 34

[INFO][mClt]: M2MInterfaceImpl::socket_error - reconnecting in 34(s)
[ERR ][mClt]: ConnectorClient::error() - error: 6
[INFO][fcc ]: storage_pal_sst.c:602:storage_item_get_data:===> item name = mbed.UseBootstrap len = 17, data max size = 4
[INFO][fcc ]: storage_pal_sst.c:640:storage_item_get_data:<=== 
[ERR ][mClt]: ServiceClient::connector_error() error 6
[ERR ][mClt]: MbedCloudClient::error code (6)
client_error(6) -> Client in reconnection mode NetworkError

Counter is at that same 283, so that seems like too much of a co-incidence?

@ciarmcom
Copy link
Member

Internal Jira reference: https://jira.arm.com/browse/MBOTRIAGE-2559

@se7ensong
Copy link

I have seen this issue too. Thank you @JanneKiiskila goes into the detail.

I actually have seen this issue on mebd-os-5.14.2 and it seems that ARP packet cannot be received after running for a while.

By connecting the ublox to the wifi hotspot on my laptop and use Wireshark for packet capture, I traced things back to the Ublox driver. However, since it's closed source, I cannot proceed anymore. The LWIP trace shows that Ublox requires ARP at 270 seconds (30 seconds before ARP table expires) and it usually gets a response to renew the table. However, after a while, the packet capture shows the gateway(my laptop) provided an ARP response but it wasn't received by the LWIP stack although SSDP packet was received without any issue. Hence, no packet can be sent.

I temporarily goes around the issue by enabling static ARP but would like to see a much more reliable ublox driver.

@JanneKiiskila
Copy link
Contributor Author

@ARMmbed/team-ublox - please comment / analyse this issue.

@aqib-ublox
Copy link
Contributor

This issue is just seems duplicate of
#10815

we would resume analysis on it and would see how we may fix in driver.

for wifi we have analyzed and turning off DHCP and switching to static IP's avoid a crash.

@MarceloSalazar
Copy link

Closing as it's a duplicate of #10815

@JanneKiiskila
Copy link
Contributor Author

This is not a crash, this is a jam. So, I don't see the problem symptom being the same necessarily - there is no Mbed OS crash.

@JanneKiiskila
Copy link
Contributor Author

Reproducing this issue not require any noisy Wi-Fi environment either, just enough time (2-3 hours).

@MarceloSalazar
Copy link

@JanneKiiskila thanks for clarifying this.
I'm reopening for @ARMmbed/team-ublox to investigate if they can.

@MarceloSalazar MarceloSalazar reopened this Apr 8, 2020
@aqib-ublox
Copy link
Contributor

@JanneKiiskila Are you getting device running into JAM condition without DHCP if practice for 2-3 hours?

@JanneKiiskila
Copy link
Contributor Author

I have not tried with static IP addresses, it's not a realistic scenario for many setups.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants