This repository was archived by the owner on Apr 24, 2019. It is now read-only.

[ST - NUCLEO_F429 + wifi] : not able to connect to device connector when trace is on #223

Closed
adustm opened this issue Mar 29, 2017 · 17 comments

Comments

@adustm
Member

adustm commented Mar 29, 2017

Description

  • Type: Bug

Bug

NUCLEO_F429ZI is able to connect to https://connector.mbed.com/ with a Wifi-ESP module.
But if I enable the trace in mbed_app.json, it no longer connects to Device Connector.

The trace log appears to get stuck here:

[DBG ][mClt]: resolve_server_address - Using TCP

[DBG ][mClt]: send_receive_event : _socket_state: ESocketStateConnectBeingCalled, ignoring event
RTX error code: 0x00000002, task ID: 0x2001211C

Version information
mbed-os-example-client (e656db9)
|- easy-connect (6fb5842becae)
| |- atmel-rf-driver (57f22763f4d3)
| |- esp8266-driver (4ed87bf7fe37)
| | - ESP8266\ATParser (269f14532b98)
| |- mcr20a-rf-driver (d8810e105d7d)
| - stm-spirit1-rf-driver (ac7a4f477222)
|- mbed-client (52e65b46dff7)
| |- mbed-client-c (c739b8cbcc57)
| |- mbed-client-classic (b9a521dcd0fc)
| - mbed-client-mbed-tls (7e1b6d815038)
|- mbed-os (f4864dc6429e)
|- pal (4e46c0ea8706)

Environment details
Describe against which environment you are testing.
I am using GCC_ARM; I did not test the other toolchains.

Expected Behavior
Device can be registered even with trace enabled
Actual Behavior
device doesn't register
Steps to Reproduce
Set "mbed-trace.enable": 1 in mbed_app.json

@MarceloSalazar
Contributor

@yogpan01 @JanneKiiskila could you please have a look?
We are working with @adustm and are able to see the problem

@ciarmcom
Member

ARM Internal Ref: IOTCLT-1622

@JanneKiiskila
Contributor

Looks like we are somehow running out of memory now.

@yogpan01
Contributor

@MarceloSalazar @0xc0170 Do you know what this error means: RTX error code: 0x00000002? It's coming from mbed OS. Knowing this would help us narrow down the issue.

@JanneKiiskila
Contributor

JanneKiiskila commented Mar 30, 2017

/* Error Codes */
#define OS_ERR_STK_OVF          1
#define OS_ERR_FIFO_OVF         2
#define OS_ERR_MBX_OVF          3

It's a FIFO overflow, I think (file: RTX_Conf.h in mbed OS).

That seems to be triggered from one place only:

/*--------------------------- rt_psq_enq ------------------------------------*/

void rt_psq_enq (OS_ID entry, U32 arg) {
  /* Insert post service request "entry" into ps-queue. */
  U32 idx;

  idx = rt_inc_qi (os_psq->size, &os_psq->count, &os_psq->first);
  if (idx < os_psq->size) {
    os_psq->q[idx].id  = entry;
    os_psq->q[idx].arg = arg;
  }
  else {
    os_error (OS_ERR_FIFO_OVF);
  }
}

This is a very common list service, so unfortunately we would need some stack backtraces and debugging to see which list is actually running out of space. Reducing the number of log messages would probably already help, though.
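
To make the overflow condition above concrete, here is a minimal sketch of a fixed-size queue that is filled from interrupt context and drained later. It is not the RTX implementation: `PostServiceQueue`, `simulate_overflows` and the capacity of 4 are hypothetical names and values chosen for illustration. Where this sketch only counts a dropped request, the real kernel calls `os_error (OS_ERR_FIFO_OVF)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a post-service queue; not the RTX data structure.
struct PostServiceQueue {
    static constexpr std::size_t kSize = 4;  // assumed capacity, for illustration only
    std::uint32_t slots[kSize] = {};
    std::size_t head = 0;       // next slot to write
    std::size_t tail = 0;       // next slot to read
    std::size_t count = 0;      // entries currently pending
    std::size_t overflows = 0;  // requests rejected because the queue was full

    // In a real kernel this would run in interrupt context.
    bool enqueue(std::uint32_t request) {
        if (count == kSize) {
            ++overflows;  // the real kernel reports OS_ERR_FIFO_OVF here
            return false;
        }
        slots[head] = request;
        head = (head + 1) % kSize;
        ++count;
        return true;
    }

    // In a real kernel this would run when pending requests are processed.
    bool dequeue(std::uint32_t &request) {
        if (count == 0) {
            return false;
        }
        request = slots[tail];
        tail = (tail + 1) % kSize;
        --count;
        return true;
    }
};

// Each round posts `posted` requests, then drains at most `drained` of them.
std::size_t simulate_overflows(std::size_t rounds, std::size_t posted, std::size_t drained) {
    PostServiceQueue queue;
    for (std::size_t round = 0; round < rounds; ++round) {
        for (std::size_t i = 0; i < posted; ++i) {
            queue.enqueue(static_cast<std::uint32_t>(i));
        }
        for (std::size_t i = 0; i < drained; ++i) {
            std::uint32_t request = 0;
            if (!queue.dequeue(request)) {
                break;
            }
        }
    }
    assert(queue.count <= PostServiceQueue::kSize);
    return queue.overflows;
}
```

With equal posting and draining rates the sketch never overflows; once requests are posted faster than they are drained, the queue fills after a few rounds regardless of how much heap is free. That is why the error can look like "out of memory" while really being a rate problem.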

@adustm
Member Author

adustm commented Mar 30, 2017

Hello,
For your information, if I use a UDP socket instead of a TCP socket, it works again.
In mbed-os-example-client/simpleclient.h, I simply change:
//Select binding mode: UDP or TCP -- note - Mesh networking is IPv6 UDP ONLY
#ifdef MESH
M2MInterface::BindingMode SOCKET_MODE = M2MInterface::UDP;
#else
// WiFi or Ethernet supports both - TCP by default to avoid
// NAT problems, but UDP will also work - IF you configure
// your network right.
// M2MInterface::BindingMode SOCKET_MODE = M2MInterface::TCP;
M2MInterface::BindingMode SOCKET_MODE = M2MInterface::UDP;

#endif

This modification was suggested by @RonEld. He says that if we do DTLS, we should use a UDP socket. Could you please align internally on this?
Also, @MarceloSalazar does not understand why this works on other platforms in that case. There may be a bug somewhere, but there is certainly a mismatch in how TLS is used.

Kind regards

@MarceloSalazar
Contributor

Thanks for the feedback @adustm
Just wanted to clarify that the info you provided in your last comment is not directly related to the issue originally reported on this ticket. We'll look into this on a separate thread.

@MarceloSalazar
Contributor

@JanneKiiskila thanks for the comments.
We'll investigate and check how we could fix the error or improve the message shown in the console.

@JanneKiiskila
Contributor

JanneKiiskila commented Mar 31, 2017

TLS/DTLS: I doubt there's anything "mixed" there. If you use UDP, you use DTLS. If you use TCP, you use TLS. mbedTLS does this automatically and I doubt we can influence that in any way.

In any case, the memory theory still holds: TCP is a heavier protocol and uses more memory for the re-transmission buffers etc., so it might be a simple case of running out of memory (as the error note also suggests).
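
The pairing described above can be pictured as a small mapping from binding mode to secure transport. This is only a sketch: `BindingMode`, `SecureTransport` and `secure_transport_for` are hypothetical names, not the mbed-client API. In mbed TLS itself, the stream or datagram choice is expressed through the transport argument of `mbedtls_ssl_config_defaults` (`MBEDTLS_SSL_TRANSPORT_STREAM` or `MBEDTLS_SSL_TRANSPORT_DATAGRAM`); how mbed-client derives that argument is not shown in this thread.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical mirror of the UDP/TCP binding choice made in simpleclient.h.
enum class BindingMode { UDP, TCP };

// Hypothetical label for the secure transport that goes with each binding.
enum class SecureTransport { DTLS, TLS };

// UDP is datagram-based, so it pairs with DTLS; TCP is stream-based, so it pairs with TLS.
constexpr SecureTransport secure_transport_for(BindingMode mode) {
    return mode == BindingMode::UDP ? SecureTransport::DTLS : SecureTransport::TLS;
}

// Human-readable name, e.g. for a trace line.
inline const char *transport_name(SecureTransport transport) {
    return transport == SecureTransport::DTLS ? "DTLS" : "TLS";
}
```

Under this picture, switching `SOCKET_MODE` from TCP to UDP changes the secure transport along with the socket type, so there is no combination in which DTLS runs over TCP.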

@anttiylitokola
Contributor

I did some further investigation, and it looks like this also happens with the K64F + ESP8266 combination. Disabling traces in the mbed-client security implementation solves the problem. This needs more study; it might be something on the ESP driver side.

@adustm
Member Author

adustm commented Apr 28, 2017

Hello,

The ESP8266 driver uses a UART to communicate with the wifi module.

I just checked on NUCLEO_F429ZI that the ESP pins (PG_9 and PG_14) are not the same pins as the printf pins (PD_8 and PD_9). I've also verified that they are not connected to the same UART instance.

Even so, there must be some conflict between the trace using a UART and the ESP using a UART.

@anttiylitokola
Contributor

Hi, this now looks like a timing issue, most probably on the driver side. With the following patch, the client gets stuck even when traces are disabled. With a 5 ms delay the problem occurs every time; with the delay changed to 2 ms the client works fine.
delay.zip
Can someone on the driver side look into this a bit more?
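
To get a feel for why a few milliseconds could matter, here is a rough sketch of how many bytes can arrive on a UART while the code that should be servicing it is stalled. Everything here is an assumption for illustration: `bytes_arriving` is a hypothetical helper, the framing is taken as 8N1 (10 bits on the wire per byte), and the 115200 baud rate is assumed rather than read from the driver.

```cpp
#include <cassert>
#include <cstdint>

// Number of whole bytes that can arrive during a stall of `stall_us` microseconds.
// Assumes 8N1 framing: 1 start bit + 8 data bits + 1 stop bit = 10 bits per byte.
constexpr std::uint32_t bytes_arriving(std::uint32_t baud, std::uint32_t stall_us) {
    return static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(baud) * stall_us) / (10ULL * 1000000ULL));
}
```

At the assumed 115200 baud, a 5 ms stall corresponds to about 57 bytes and a 2 ms stall to about 23 bytes. Whether either figure exceeds what the receive path can buffer depends on the driver's buffer size, which is not stated in this thread.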

@anttiylitokola
Contributor

I will close this now. Let's follow this in the ESP driver repo: ARMmbed/esp8266-driver#28

@geky
Contributor

geky commented May 8, 2017

@adustm and @anttiylitokola, if you get a chance, could you check whether commenting out this line solves the issue?
https://github.com/ARMmbed/mbed-client-classic/blob/6b5141649ffb109e5f0a201b0866b6aff795e6d5/source/m2mconnectionhandlerpimpl.cpp#L86

It seems to resolve the problem on my side, although I can't really find any relationship to @anttiylitokola's patch

@adustm
Member Author

adustm commented May 9, 2017

Hello @geky
It works for me: NUCLEO_F429ZI + mbed-trace.enable + commenting out line 86 of m2mconnectionhandlerpimpl.cpp.

Isn't it a problem to call tr_debug in a callback function?
Could it be that we are writing to UART6 for the ESP8266 commands (and therefore blocking the UART) while, at the same time, the callback function writes the trace debug output to UART3?

Same with the patch from @anttiylitokola ...

I'm not familiar with the ESP driver, but it looks like it uses the serial API in interrupt mode. In that case, making many tr_debug calls in the m2m driver may cause the wifi driver to miss some UART interrupts.
What do you think?
[edit] I just read the comments you wrote in ARMmbed/esp8266-driver#28. I think we arrived at the same conclusion!
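
One common way to keep slow trace output out of a callback is to record the event cheaply and print it later from normal thread context. The sketch below illustrates that idea only; `DeferredEventLog` is a hypothetical class, it uses `std::printf` in place of tr_debug, and it is not claimed to be what the patch referenced in this thread does.

```cpp
#include <atomic>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: counts events in the callback, prints them later.
class DeferredEventLog {
public:
    // Cheap enough for a callback: one atomic increment, no I/O.
    void note_event() {
        pending_.fetch_add(1, std::memory_order_relaxed);
    }

    // Call from normal thread context; this is where the slow printing happens.
    // Returns the number of events recorded since the previous flush.
    std::uint32_t flush() {
        const std::uint32_t events = pending_.exchange(0, std::memory_order_relaxed);
        if (events > 0) {
            std::printf("socket events since last flush: %u\n",
                        static_cast<unsigned>(events));
        }
        return events;
    }

private:
    std::atomic<std::uint32_t> pending_{0};
};
```

The callback would call `note_event()` and return immediately, and the main loop would call `flush()` when it is not in the middle of time-critical UART traffic. The trade-off is that the trace line no longer appears at the exact moment of the event.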

@geky
Contributor

geky commented May 9, 2017

@adustm, good to know! Sorry about the split in the issue discussion.

Looks like the client guys have a patch up to fix this: ARMmbed/mbed-client-classic#74

@yogpan01, I'll leave it up to you to get the patch merged all the way up here : )

@teetak01
Contributor

teetak01 commented May 9, 2017

Disabling the traces seems to be a partial fix, but it does not address the root cause.


8 participants