Updates to network code to improve performance/robustness #32

adamgreen · 2013-08-15T10:12:42Z

I started out looking at some UDP receive code that was only able to
handle 3 inbound 550 byte datagrams out of 16 when sent in quick
succession. I stepped through the ethernet driver code and it
seemed to work as expected but it just couldn't queue up more than
3 PBUFs for each burst. It was almost like it was being starved of
CPU cycles. Based on that observation, I looked up the thread
priorities for the receive ethernet thread and found the following
close to the top of the lpc17_emac.c source file:
#define RX_PRIORITY (osPriorityNormal)
This got me thinking: what is the priority of the tcp thread? It
turns out that it gets its priority from the following line in
lwipopts.h:
#define TCPIP_THREAD_PRIO 1
Interesting! What priority is 1? It turns out that it corresponds
to osPriorityAboveNormal. This means that while the tcp thread is
handling one packet that has been posted to its mailbox from the
ethernet receive thread, the receive thread is starved from processing
any more inbound ethernet packets.
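
As a rough illustration of the priority relationship described above (not code from this pull request), the sketch below mirrors the CMSIS-RTOS v1 osPriority values under demo_ names so it builds without the RTOS headers. The helper tcpip_preempts_rx() and the DEMO_ macros are hypothetical names introduced only for this example.

```c
#include <assert.h>

/* Mirror of the CMSIS-RTOS v1 osPriority values (cmsis_os.h), renamed
   with a demo_ prefix so this compiles without the RTOS headers. */
typedef enum {
    demo_osPriorityIdle        = -3,
    demo_osPriorityLow         = -2,
    demo_osPriorityBelowNormal = -1,
    demo_osPriorityNormal      =  0,
    demo_osPriorityAboveNormal = +1,
    demo_osPriorityHigh        = +2,
    demo_osPriorityRealtime    = +3
} demo_osPriority;

#define DEMO_RX_PRIORITY       (demo_osPriorityNormal) /* value quoted from lpc17_emac.c */
#define DEMO_TCPIP_THREAD_PRIO 1                       /* value quoted from lwipopts.h */

/* Hypothetical helper: returns 1 when the tcpip thread outranks the
   receive thread. Under a priority-based preemptive scheduler the
   receive thread then cannot run while the tcpip thread is ready. */
int tcpip_preempts_rx(int tcpip_prio, int rx_prio)
{
    return tcpip_prio > rx_prio;
}
```

With the values quoted above, tcpip_preempts_rx(DEMO_TCPIP_THREAD_PRIO, DEMO_RX_PRIORITY) returns 1. Passing demo_osPriorityNormal for both threads returns 0, which is the configuration tried next.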

What happens if we set TCPIP_THREAD_PRIO to osPriorityNormal? Crash!
The ethernet driver ends up crashing in lpc_low_level_input() when
it tries to set p->len on a NULL p pointer. The p pointer ended up
being NULL because an earlier call to pbuf_alloc() in lpc_rx_queue()
failed its allocation (I will have more to say about this failed
allocation later since that is caused by yet another bug). I pulled a
fix from http://lpcware.com/content/bugtrackerissue/lpc17xx-mac-bugs to
remedy this issue. When the pbuf allocation fails, it discards the
inbound packet in the pbuf and just puts it back into the rx queue.
This means we never end up with a NULL pointer in that queue to
dereference and crash on.
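
The discard-and-requeue idea can be sketched roughly as follows. This is a simplified stand-in, not the lpcware patch or the driver code: demo_pbuf, demo_rx_ring and demo_rx_take() are hypothetical names, and the result of the replacement allocation is passed in as a parameter (NULL meaning the allocation failed) so the example needs no allocator.

```c
#include <stddef.h>
#include <assert.h>

#define DEMO_RX_SLOTS 4

typedef struct { int len; } demo_pbuf;   /* stand-in for lwIP's struct pbuf */

typedef struct {
    demo_pbuf *slot[DEMO_RX_SLOTS];      /* buffer attached to each RX descriptor */
    unsigned   dropped;                  /* packets discarded for lack of memory */
} demo_rx_ring;

/* Hands the received buffer in slot idx to the caller only when a
   replacement buffer is available. If the replacement allocation failed
   (replacement == NULL), the packet is dropped and the same buffer stays
   in the ring, so a slot never ends up holding a NULL pointer. */
demo_pbuf *demo_rx_take(demo_rx_ring *ring, unsigned idx, int len,
                        demo_pbuf *replacement)
{
    demo_pbuf *p = ring->slot[idx];

    if (replacement == NULL) {
        ring->dropped++;                 /* discard the packet contents */
        return NULL;                     /* slot[idx] still owns p */
    }

    p->len = len;                        /* safe: p is never NULL here */
    ring->slot[idx] = replacement;       /* refill the descriptor */
    return p;
}
```

The trade-off is that a failed allocation costs one packet instead of a crash, which is also why a persistent allocation failure shows up as every packet being silently discarded.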

With that bug fixed, the application would just appear to hang after
receiving and processing a few datagrams. I placed breakpoints in
the packet_rx() thread function and found that it was being signalled
by the ethernet ISR but it was always failing to allocate new PBUFs,
which is what led to our previous crash. This means that the new
crash prevention code was just discarding every packet that arrived.

Why are these allocations failing? In my opinion, this was the most
interesting bug to track down. Is there a memory leak somewhere in
the code which maybe only triggers in low memory situations? I
figured the easiest way to determine that would be to learn a bit
about the format of the lwIP heap from which the PBUF was failing to
be allocated. I started by just stepping into the failing lwIP memory
allocator, mem_malloc(). The loop which searches the free list starts
with this code:
for (ptr = (mem_size_t)((u8_t *)lfree - ram);
This loop didn't even go through one iteration and when I looked at the
initial ptr value it contained a really large value. It turns out that
lfree was actually lower than ram. At this point I figured that lfree
had probably been corrupted during a free operation after one of the
heap allocations had been underflowed/overflowed to cause the metadata
for an allocation to be corrupted. As I started thinking about how to
track that kind of bug down, I noticed that the ram variable might be
too large (0x20080a68). I restarted the debugger and looked at the
initial value. It was at a nice even address (0x2007c000) and
certainly nothing like what I saw when the allocations were failing.
This global variable shouldn't change at all during the execution of
the program. I placed a memory access watchpoint on this ram variable
and it fired very quickly inside of the rt_mbx_send() function. The
ram variable was being changed by this line in rt_mbx_send():
p_MCB->msg[p_MCB->first] = p_msg;
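
To see why a corrupted ram value stops the search loop before its first iteration, here is a small sketch of the unsigned arithmetic involved. It works on plain integers rather than real pointers. The 16-bit demo_mem_size_t, the loop bound in demo_loop_would_run(), and the lfree address used in the usage note are assumptions for illustration; only the two ram values come from the debugging session above.

```c
#include <stdint.h>
#include <assert.h>

/* Assumption: a 16-bit offset type, as lwIP may use for small heaps.
   A 32-bit type wraps the same way, just to a larger number. */
typedef uint16_t demo_mem_size_t;

/* Same shape as the loop initializer: offset of lfree from ram.
   When lfree is below ram the subtraction wraps to a large value. */
demo_mem_size_t demo_first_offset(uint32_t lfree_addr, uint32_t ram_addr)
{
    return (demo_mem_size_t)(lfree_addr - ram_addr);
}

/* Assumed loop condition: keep searching while the offset leaves room
   for the request inside the heap. */
int demo_loop_would_run(demo_mem_size_t ptr, demo_mem_size_t heap_size,
                        demo_mem_size_t request)
{
    return ptr < heap_size - request;
}
```

For example, with a hypothetical lfree of 0x2007c000 and the corrupted ram of 0x20080a68, demo_first_offset() yields 0xB598. For a hypothetical 16384-byte heap and a 600-byte request, demo_loop_would_run() then returns 0, so the allocator reports failure without inspecting a single block.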

What the what? Why does writing to the mailbox queue overwrite the
ram global variable? Let's start by looking at the data structure used
in the lwIP port to target RTX (defined in sys_arch.h):
// === MAIL BOX ===

typedef struct {
osMessageQId id;
osMessageQDef_t def;
uint32_t queue[MB_SIZE];
} sys_mbox_t;

Compare that to the utility macro that RTX defines to help set up one of
these mailboxes with its queue:
#define osMessageQDef(name, queue_sz, type) \
uint32_t os_messageQ_q_##name[4+(queue_sz)]; \
osMessageQDef_t os_messageQ_def_##name = \
{ (queue_sz), (os_messageQ_q_##name) }
Note the 4+(queue_sz) used in the definition of the message queue
array. What a hack! The RTX OS requires an extra 16 bytes to contain
its OS_MCB header and this is how it adds it in. Obviously the
sys_mbox_t structure used in the lwIP OS targeting code doesn't have
this. Without it, the RTX mailbox routines end up scribbling on
whatever follows the structure in memory. Adding 4 to the queue array
size in that structure fixes the memory allocation failure that I was
seeing, and now the network stack can handle between 7 and 10 datagrams
within a burst.
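
The size mismatch can be checked with a little arithmetic. The sketch below counts in uint32_t words and leaves out the id and def members because they need the RTX headers. DEMO_MB_SIZE is a hypothetical value, and demo_words_needed(), demo_overrun_words() and demo_fixed_mbox are illustrative names rather than anything from sys_arch.h.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

#define DEMO_MB_SIZE   8   /* hypothetical; the real MB_SIZE lives in sys_arch.h */
#define DEMO_MCB_WORDS 4   /* 16-byte OS_MCB header = 4 uint32_t words */

/* Words the kernel uses for a mailbox with `slots` message entries:
   the header first, then one word per message pointer. */
size_t demo_words_needed(size_t slots)
{
    return DEMO_MCB_WORDS + slots;
}

/* Words written past the end of a backing array of `array_words` words. */
size_t demo_overrun_words(size_t array_words, size_t slots)
{
    size_t needed = demo_words_needed(slots);
    return needed > array_words ? needed - array_words : 0;
}

/* Backing storage sized the way the osMessageQDef macro sizes it. */
typedef struct {
    uint32_t queue[DEMO_MCB_WORDS + DEMO_MB_SIZE];
} demo_fixed_mbox;
```

With an array of only DEMO_MB_SIZE words, demo_overrun_words() reports 4 words (16 bytes) written beyond the array, which is enough to reach a neighbouring global. With the demo_fixed_mbox sizing it reports 0.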

bogdanm · 2013-08-15T10:51:41Z

Thanks a lot for this. Excellent investigation and equally nice comment!

adamgreen · 2013-08-15T11:49:57Z

Thanks!

bogdanm added a commit that referenced this pull request Aug 15, 2013

Merge pull request #32 from adamgreen/netPerfRobustness

ff55aa3

Updates to network code to improve performance/robustness

bogdanm merged commit ff55aa3 into ARMmbed:master Aug 15, 2013

adamgreen deleted the netPerfRobustness branch August 15, 2013 11:50

bridadan pushed a commit that referenced this pull request Jun 21, 2016

Merge pull request #32 from 0xc0170/fix_build_dir

55e6838

Fix build dir for uvision and IAR

SeppoTakalo pushed a commit that referenced this pull request Nov 9, 2016

Merge pull request #32 from ARMmbed/coap-option-tidy

078bd0d

Coap option tidy

costanic mentioned this pull request Sep 11, 2017

SDBlockDevice init failure on mbed-os-5.5 #5068

Closed

pan- pushed a commit to pan-/mbed that referenced this pull request May 10, 2018

Merge pull request ARMmbed#32 from pan-/sm-privacy-nordic

ebb50a6

Sm privacy nordic

geky pushed a commit to geky/mbed that referenced this pull request Aug 25, 2018

Merge pull request ARMmbed#32 from ARMmbed/Readme_update

2d24758

Update Readme

yossi2le pushed a commit to yossi2le/mbed-os that referenced this pull request Jan 2, 2019

Update Documentation (ARMmbed#32)

73c9151

* Update update-interface.md * Update README.md

geky pushed a commit to geky/mbed that referenced this pull request Jan 17, 2019

Add C++ guards to public headers

577d777

Fixes ARMmbed#53 Fixes ARMmbed#32

linlingao pushed a commit to linlingao/mbed-os that referenced this pull request Jul 12, 2019

Merge pull request ARMmbed#32 from yennster/uart-debug

0f22162

Fixed IRQ handler again

yarbcy mentioned this pull request Oct 31, 2019

Cypress: PWM FPGA test wrong assert #11769

Closed

pan- pushed a commit to pan-/mbed that referenced this pull request May 29, 2020

Merge pull request ARMmbed#32 from andresag01/master

4bab972

Update Eddystone to call initRadioNotification

pan- added a commit to pan-/mbed that referenced this pull request May 29, 2020

Merge remote-tracking branch 'master' into oob to avoid conflicts wit…

360c676

…h PR ARMmbed#32.

pan- added a commit to pan-/mbed that referenced this pull request May 29, 2020

Merge pull request ARMmbed#32 from ARMmbed/oob

3cc0038

Sync with mbed OS 5.2 OOB repo
