net: QEMU Ethernet drivers are flaky (seemingly after "net_buf" refactor) #13943
Comments
@rlubos, @rveerama1, @aurel32 FYI |
Patch submitted: #13945 With it:
|
Tried with echo-server, UDP works just fine but TCP refuses to connect. |
Thanks for testing. Which BOARD was it? |
Just qemu_x86 |
Multiple flag bits were set, so the ACK flag was not checked properly, which meant that connection establishment was not successful. Fixes zephyrproject-rtos#13943 Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
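The kind of bug described in the commit message above can be illustrated with a minimal sketch. This is not the actual Zephyr code: the macro and function names below are hypothetical, and only the flag bit values are the standard TCP ones. It contrasts an equality check, which fails as soon as another flag accompanies ACK, with a bitmask check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard TCP header flag bits; the macro names are illustrative only. */
#define TCP_FLAG_FIN 0x01
#define TCP_FLAG_SYN 0x02
#define TCP_FLAG_RST 0x04
#define TCP_FLAG_PSH 0x08
#define TCP_FLAG_ACK 0x10

/* Fragile (hypothetical): matches only when ACK is the sole flag set. */
static bool is_ack_exact(uint8_t flags)
{
	return flags == TCP_FLAG_ACK;
}

/* Robust (hypothetical): tests the ACK bit and ignores all other bits. */
static bool has_ack(uint8_t flags)
{
	return (flags & TCP_FLAG_ACK) != 0;
}
```

For a segment carrying PSH and ACK together (flags 0x18), `is_ack_exact()` returns false while `has_ack()` returns true, which is the difference that matters when more than one flag bit is set.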
#13957 improved the situation but did not fix it. Reopening to track the situation over time. |
Master 5f2099f
I.e., in 2 tests, the runs abort due to ECONNRESET, which is worse than my test yesterday for #13957 (no implications, it's just visibly flaky and non-deterministic). But at least Zephyr didn't lock up.
After that, Zephyr is locked up. |
I finally did a new test sweep against revision 9722214. The results can be found in https://docs.google.com/spreadsheets/d/1_8CsACPEXqrMIbxBKxPAds091tNAwnwdWkMKr3994QY/edit#gid=0 as usual. Summary is:
|
Also lowering the priority, since only one driver is not working. |
Contacting eth_stellaris maintainer |
Going through the change history, this change bff65b6#diff-36181bf0c30748729a35e42e076ac604L72 seems to have introduced a regression. The 2-byte data_len written first should not include the header size. |
As part of the ll_reserve refactoring effort, the packet length now includes the header size as well. Before the refactor, the packet length written to the device did not include the header size, which is the value required by the LM3S6965 datasheet. The header size therefore has to be subtracted from the packet length before writing it to the device. Fixes zephyrproject-rtos#13943. Signed-off-by: Vijay Kumar B <vijaykumar@zilogic.com>
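The length adjustment described in the commit message above can be sketched as follows. This is not the eth_stellaris driver code: the helper name is hypothetical, and the 14-byte header size is an assumption (a plain Ethernet header of destination, source and EtherType, with no VLAN tag).

```c
#include <stdint.h>

/* Assumed Ethernet header size: 6 (dst) + 6 (src) + 2 (EtherType) bytes. */
#define ETH_HDR_LEN 14U

/*
 * Hypothetical helper: given the total frame length (header included, as
 * reported after the refactor), return the 2-byte length value to write to
 * the device, which must exclude the header. Returns 0 for frames shorter
 * than the header so the subtraction cannot underflow.
 */
static uint16_t tx_len_field(uint16_t frame_len)
{
	if (frame_len < ETH_HDR_LEN) {
		return 0;
	}
	return (uint16_t)(frame_len - ETH_HDR_LEN);
}
```

Under these assumptions, a 60-byte frame yields a length field of 46. Writing 60 instead would make the device expect 14 more bytes than the frame actually carries.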
Unfortunately, as of master 8e307a3, this issue isn't really fixed, running |
@pfalcon Is this witnessed only with the stellaris driver or with other drivers as well? |
@bravegnu: The current picture I have in my head re: 3 eth qemu drivers is:
But that's a somewhat dated picture; I'd encourage trying all 3 drivers with the current codebase to see. |
Retesting with 34b95fe (2.1.0-wip). All tests below use the dumb_http_server sample. (Results are posted in individual comments for easier review/reference.) |
qemu_x86 + overlay-e1000.conf
Pings don't work after this. On Zephyr console side:
I.e. there were 296 errors like:
During this run. |
mps2_an385 and overlay-smsc911x.conf
Pings work afterwards. Nothing special on Zephyr console side. |
qemu_cortex_m3 and overlay-qemu_cortex_m3_eth.conf
On Zephyr console:
Surprisingly, in this run pings work afterwards (usually it's completely deadlocked).
Look what a perfect multiple of a power of 2 the time is! Let's restart the sample and start with pings (output abridged):
I.e., it starts OK, then 1024ms delays creep in. Then 2048ms, then, as we saw, 3072ms, etc. None of this is new; we saw similar behavior with frdm_k64f, perhaps imx1050, etc. It's unclear (at least to me) where this power-of-2 delay may come from. For explicit delays, we usually use power-of-10 delays, like 500ms, 1000ms. |
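The delays quoted above are whole multiples of 1024ms (3072 = 3 × 1024) rather than of a decimal step such as 500ms or 1000ms. A small, purely illustrative check makes that explicit; the names are hypothetical and nothing here is taken from Zephyr.

```c
#include <stdbool.h>
#include <stdint.h>

/* Step size suggested by the observed ping delays, in milliseconds. */
#define DELAY_STEP_MS 1024U

/* True if a non-zero delay is an exact multiple of the 1024ms step. */
static bool is_step_multiple(uint32_t delay_ms)
{
	return delay_ms != 0 && (delay_ms % DELAY_STEP_MS) == 0;
}

/* Number of whole 1024ms steps contained in the delay. */
static uint32_t step_count(uint32_t delay_ms)
{
	return delay_ms / DELAY_STEP_MS;
}
```

With the values from the thread, 1024, 2048 and 3072 give step counts of 1, 2 and 3, while a typical explicit delay such as 1000ms is not a multiple of the step.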
I tried to bisect the tree to check the behavior of eth_stellaris before the net_buf refactor. But the driver was effectively merged in the midst of that refactor and was initially not fully operational, as it needed updates for already merged refactoring patches. In other words, there's no in-tree bisect point where the driver worked better than it does now, and in the early days it was not buildable at all. Trying to fix the build issues, I got runtime crashes instead. I also tried to compare the driver source with eth_smsc911x, which works the best so far (for me), and looked into the Stellaris datasheet, but I couldn't see anything obviously wrong. So, I put this aside for now again. (My idea was to get network tests to work on old QEMU as shipped by distro packages, but upgrading to Zephyr's QEMU seems like the better option given all the issues above.) |
Going through old issues. This is a good candidate for closing. Can be re-opened if really needed. |
Testing the 3 qemu ethernet drivers we added in the 1.14 timeframe with the idea of using them to test the IP stack better.
master: e731bdc
sample: samples/net/sockets/dumb_http_server
test command on host: ab -n1000 http://192.0.2.1:8080/
Not a single request succeeds:
On Zephyr console (after ~5 ab attempts):
Zephyr:
Build error: