-
Notifications
You must be signed in to change notification settings - Fork 3k
Updates to network code to improve performance/robustness #32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
I started out looking at some UDP receive code that was only able to handle 3 inbound 550 byte datagrams out of 16 when sent in quick succession. I stepped through the ethernet driver code and it seemed to work as expected but it just couldn't queue up more than 3 PBUFs for each burst. It was almost like it was being starved of CPU cycles. Based on that observation, I looked up the thread priorities for the receive ethernet thread and found the following close to the top of the lpc17_emac.c source file: #define RX_PRIORITY (osPriorityNormal) This got me to thinking, what is the priority of the tcp thead? It turns out that it gets its priority from the following line in lwipopts.h: #define TCPIP_THREAD_PRIO 1 Interesting! What priority is 1? It turns out that it corresponds to osPriorityAboveNormal. This means that while the tcp thread is handling one packet that has been posted to its mailbox from the ethernet receive thread, the receive thread is starved from processing any more inbound ethernet packets. What happens if we set TCP_IP_THREAD_PRIO to osPriorityNormal? Crash! The ethernet driver ends up crashing in lpc_low_level_input() when it tries to set p->len on a NULL p pointer. The p pointer ended up being NULL because an earlier call to pbuf_alloc() in lpc_rx_queue() failed its allocation (I will have more to say about this failed allocation later since that is caused by yet another bug). I pulled a fix from http://lpcware.com/content/bugtrackerissue/lpc17xx-mac-bugs to remedy this issue. When the pbuf allocation fails, it discards the inbound packet in the pbuf and just puts it back into the rx queue. This means we never end up with a NULL pointer in that queue to dereference and crash on. With that bug fixed, the application would just appear to hang after receiving and processing a few datagrams. I could place breakpoints in the packet_rx() thread function and found that it was being signalled by the ethernet ISR but it was always failing to allocate new PBUFs, which is what led to our previous crash. This means that the new crash prevention code was just discarding every packet that arrived. Why are these allocations failing? In my opinion, this was the most interesting bug to track down. Is there a memory leak somewhere in the code which maybe only triggers in low memory situations? I figured the easiest way to determine that would be to learn a bit about the format of the lwIP heap from which the PBUF was failing to be allocated. I started by just stepping into the failing lwIP memory allocator, mem_malloc(). The loop which search the free list starts with this code: for (ptr = (mem_size_t)((u8_t *)lfree - ram); This loop didn't even go through one iteration and when I looked at the initial ptr value it contained a really large value. It turns out that lfree was actually lower than ram. At this point I figured that lfree had probably been corrupted during a free operation after one of the heap allocations had been underflowed/overflowed to cause the metadata for an allocation to be corrupted. As I started thinking about how to track that kind of bug down, I noticed that the ram variable might be too large (0x20080a68). I restarted the debugger and looked at the initial value. It was at a nice even address (0x2007c000) and certainly nothing like what I saw when the allocations were failing. This global variable shouldn't change at all during the execution of the program. I placed a memory access watchpoint on this ram variable and it fired very quickly inside of the rt_mbx_send() function. The ram variable was being changed by this line in rt_mbx_send(): p_MCB->msg[p_MCB->first] = p_msg; What the what? Why does writing to the mailbox queue overwrite the ram global variable? Let's start by looking at the data structure used in the lwIP port to target RTX (defined in sys_arch.h): // === MAIL BOX === typedef struct { osMessageQId id; osMessageQDef_t def; uint32_t queue[MB_SIZE]; } sys_mbox_t; Compare that to the utility macro that RTX defines to help setup one of these mailboxes with queue: #define osMessageQDef(name, queue_sz, type) \ uint32_t os_messageQ_q_##name[4+(queue_sz)]; \ osMessageQDef_t os_messageQ_def_##name = \ { (queue_sz), (os_messageQ_q_##name) } Note the 4+(queue_sz) used in the definition of the message queue array. What a hack! The RTX OS requires an extra 16 bytes to contain its OS_MCB header and this is how it adds it in. Obviously the sys_mbox_t structure used in the lwIP OS targetting code doesn't have this. Without it, the RTX mailbox routines end up scribbling on memory following the structure in memory. Adding 4 in that structure fixes the memory allocation failure that I was seeing and now the network stack can handle between 7 and 10 datagrams within a burst.
Thanks a lot for this. Excellent investigation and equally nice comment! |
bogdanm
added a commit
that referenced
this pull request
Aug 15, 2013
Updates to network code to improve performance/robustness
Thanks! |
bridadan
pushed a commit
that referenced
this pull request
Jun 21, 2016
Fix build dir for uvision and IAR
yossi2le
pushed a commit
to yossi2le/mbed-os
that referenced
this pull request
Jan 2, 2019
* Update update-interface.md * Update README.md
geky
pushed a commit
to geky/mbed
that referenced
this pull request
Jan 17, 2019
Fixes ARMmbed#53 Fixes ARMmbed#32
linlingao
pushed a commit
to linlingao/mbed-os
that referenced
this pull request
Jul 12, 2019
Fixed IRQ handler again
pan-
pushed a commit
to pan-/mbed
that referenced
this pull request
May 29, 2020
Update Eddystone to call initRadioNotification
pan-
added a commit
to pan-/mbed
that referenced
this pull request
May 29, 2020
pan-
added a commit
to pan-/mbed
that referenced
this pull request
May 29, 2020
Sync with mbed OS 5.2 OOB repo
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
I started out looking at some UDP receive code that was only able to
handle 3 inbound 550 byte datagrams out of 16 when sent in quick
succession. I stepped through the ethernet driver code and it
seemed to work as expected but it just couldn't queue up more than
3 PBUFs for each burst. It was almost like it was being starved of
CPU cycles. Based on that observation, I looked up the thread
priorities for the receive ethernet thread and found the following
close to the top of the lpc17_emac.c source file:
#define RX_PRIORITY (osPriorityNormal)
This got me to thinking, what is the priority of the tcp thead? It
turns out that it gets its priority from the following line in
lwipopts.h:
#define TCPIP_THREAD_PRIO 1
Interesting! What priority is 1? It turns out that it corresponds
to osPriorityAboveNormal. This means that while the tcp thread is
handling one packet that has been posted to its mailbox from the
ethernet receive thread, the receive thread is starved from processing
any more inbound ethernet packets.
What happens if we set TCP_IP_THREAD_PRIO to osPriorityNormal? Crash!
The ethernet driver ends up crashing in lpc_low_level_input() when
it tries to set p->len on a NULL p pointer. The p pointer ended up
being NULL because an earlier call to pbuf_alloc() in lpc_rx_queue()
failed its allocation (I will have more to say about this failed
allocation later since that is caused by yet another bug). I pulled a
fix from http://lpcware.com/content/bugtrackerissue/lpc17xx-mac-bugs to
remedy this issue. When the pbuf allocation fails, it discards the
inbound packet in the pbuf and just puts it back into the rx queue.
This means we never end up with a NULL pointer in that queue to
dereference and crash on.
With that bug fixed, the application would just appear to hang after
receiving and processing a few datagrams. I could place breakpoints in
the packet_rx() thread function and found that it was being signalled
by the ethernet ISR but it was always failing to allocate new PBUFs,
which is what led to our previous crash. This means that the new
crash prevention code was just discarding every packet that arrived.
Why are these allocations failing? In my opinion, this was the most
interesting bug to track down. Is there a memory leak somewhere in
the code which maybe only triggers in low memory situations? I
figured the easiest way to determine that would be to learn a bit
about the format of the lwIP heap from which the PBUF was failing to
be allocated. I started by just stepping into the failing lwIP memory
allocator, mem_malloc(). The loop which search the free list starts
with this code:
for (ptr = (mem_size_t)((u8_t *)lfree - ram);
This loop didn't even go through one iteration and when I looked at the
initial ptr value it contained a really large value. It turns out that
lfree was actually lower than ram. At this point I figured that lfree
had probably been corrupted during a free operation after one of the
heap allocations had been underflowed/overflowed to cause the metadata
for an allocation to be corrupted. As I started thinking about how to
track that kind of bug down, I noticed that the ram variable might be
too large (0x20080a68). I restarted the debugger and looked at the
initial value. It was at a nice even address (0x2007c000) and
certainly nothing like what I saw when the allocations were failing.
This global variable shouldn't change at all during the execution of
the program. I placed a memory access watchpoint on this ram variable
and it fired very quickly inside of the rt_mbx_send() function. The
ram variable was being changed by this line in rt_mbx_send():
p_MCB->msg[p_MCB->first] = p_msg;
What the what? Why does writing to the mailbox queue overwrite the
ram global variable? Let's start by looking at the data structure used
in the lwIP port to target RTX (defined in sys_arch.h):
// === MAIL BOX ===
typedef struct {
osMessageQId id;
osMessageQDef_t def;
uint32_t queue[MB_SIZE];
} sys_mbox_t;
Compare that to the utility macro that RTX defines to help setup one of
these mailboxes with queue:
#define osMessageQDef(name, queue_sz, type)
uint32_t os_messageQ_q_##name[4+(queue_sz)];
osMessageQDef_t os_messageQ_def_##name =
{ (queue_sz), (os_messageQ_q_##name) }
Note the 4+(queue_sz) used in the definition of the message queue
array. What a hack! The RTX OS requires an extra 16 bytes to contain
its OS_MCB header and this is how it adds it in. Obviously the
sys_mbox_t structure used in the lwIP OS targetting code doesn't have
this. Without it, the RTX mailbox routines end up scribbling on
memory following the structure in memory. Adding 4 in that structure
fixes the memory allocation failure that I was seeing and now the network
stack can handle between 7 and 10 datagrams within a burst.