-
Notifications
You must be signed in to change notification settings - Fork 3k
gethostbyname test FAIL with NUCLEO_F746ZG and IAR #5622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
To be seen here #5597 (the last 2 ci test run) @kegilbert @studavekar I haven't seen it logged yet, was this known prior this report? |
@0xc0170 I had seen that error on the linked PR but thought it was due to the PR itself. I'll look into it on our end. |
I'm able to reproduce this locally as well on IAR. |
@kegilbert Thanks, I noticed that this is now real on master. @SeppoTakalo @mikaleppanen @kjbracey-arm Can you help with this one? |
Is this really specifically only the gethostbyname test failing? The failure appears to be an assert on return from the "connect" - getting a DHCP failure. So nothing to do with the gethostbyname bit. What appears to be distinct about gethostbyname compared to other tests is that it does its connect in the test setup stage, rather than in a test case. That's presumably a clue. Not familiar enough with the test frame work to interpret the clue though. |
Thanks Kevin. It's just that one test and only one toolchain, and seldomly appears in our test logs.
@jeromecoutant Can we look at this particular, if that would help? Has anyone been able to reproduce this? |
Test seems OK in morph test results #5576 |
The error seems to be fairly intermittent, I was unable to locally reproduce earlier this week but saw it crop up again in a CI run the next day. Trying to look more into it now. |
I was able to reproduce this locally, seems like test-case issue, I am preparing a fix for this. |
Made a PR: #5689 |
Okay, seems we've got an environment where we can get 100% failure with the default I guess the change in #5689 is that we are adding 1 pointer in RAM - the I think tomorrow's job is to JTAG debug the failing develop image to try to spot some corrupted memory stopping the connect from completing. I am reminded of #4686 - the best guess we currently have there is that someone is corrupting a static variable in the HAL's serial code. Different platform and compiler, but maybe could be caused by the same generic memory corruption bug? |
Now that #5576 is merged, maybe we can close this issue ? @0xc0170 @kegilbert , maybe check next morph test results before closing ? |
💯 Let us know if debugging this today leads to a fix @jeromecoutant I would keep this open until @kjbracey-arm confirms |
It appears it is a flaw in the STM32F Ethernet driver - I believe it's newly arising due to the Cortex-M7 STM chips. The driver writes to the Ethernet DMA descriptor, then writes to the Ethernet peripheral telling it to go. I assume the cache isn't active, but the store buffer is active. The write to the DMA descriptor hasn't cleared the store buffer by the time the Ethernet looks at it. So the Ethernet immediately goes back to sleep without transmitting. The Cortex-M7 doesn't automatically drain the store buffer when you access Device memory. Adding a DMB between the descriptor write and peripheral write solves the issue. So nothing to do with #4686, it seems. Not quite sure why it comes and goes between builds - my current guess is that it's down to alignment of the descriptors, which will in turn affect the order that the descriptor hits the RAM. |
@screamerbg Please forward this issue to ST. |
If needed, I have a .bin and .elf for reproducing the issue. |
If you can attach it, please do here. @ARMmbed/team-st-mcd |
Attached. |
Pending official update from STM, add memory barriers to the Ethernet HAL code for the STM32F7xx family. Cortex-M7 has a merging write buffer that is not automatically flushed by accesses to devices, so without these DMBs, we sometimes lose synch with the transmitter. The DMBs are architecturally needed in every version of this HAL, but adding just to the STM32F7 version for now to clear test, as the problem has only been observed on Cortex-M7-based devices. Fixes ARMmbed#5622.
Pending official update from STM, add memory barriers to the Ethernet HAL code for the STM32F7xx family. Cortex-M7 has a merging write buffer that is not automatically flushed by accesses to devices, so without these DMBs, we sometimes lose synch with the transmitter. The DMBs are architecturally needed in every version of this HAL, but adding just to the STM32F7 version for now to clear test, as the problem has only been observed on Cortex-M7-based devices. Fixes #5622.
Pending official update from STM, add memory barriers to the Ethernet HAL code for the STM32F7xx family. Cortex-M7 has a merging write buffer that is not automatically flushed by accesses to devices, so without these DMBs, we sometimes lose synch with the transmitter. The DMBs are architecturally needed in every version of this HAL, but adding just to the STM32F7 version for now to clear test, as the problem has only been observed on Cortex-M7-based devices. Fixes #5622.
Pending official update from STM, add memory barriers to the Ethernet HAL code for the STM32F7xx family. Cortex-M7 has a merging write buffer that is not automatically flushed by accesses to devices, so without these DMBs, we sometimes lose synch with the transmitter. The DMBs are architecturally needed in every version of this HAL, but adding just to the STM32F7 version for now to clear test, as the problem has only been observed on Cortex-M7-based devices. Fixes #5622.
Description
mbed test -m NUCLEO_F746ZG -t iar -vv -n tests-netsocket-gethostbyname
=> test is often FAIL
Target can't connect during DHCP procedure...
Test is OK with ARM and GCC
Target
NUCLEO_F746ZG
Toolchain
IAR
The text was updated successfully, but these errors were encountered: