Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drivers/w5100: Fixed netdev_driver_t::recv() API #10412

Merged
merged 2 commits into from
Feb 5, 2019
Merged

Conversation

maribu
Copy link
Member

@maribu maribu commented Nov 16, 2018

Contribution description

The current implementation of the netdev_driver_t::recv() API in the W5100 driver does not comply with documentation:

  • The drop feature of recv() is not implemented
  • In case of a too small buffer no error is returned and the frame is silently truncated. The API mandates returning -ENOBUFS

Testing procedure

It would be nice to add a test application for this, as many drivers are affected...

Issues/PRs references

#10410, #10413


Update 1: Added reference to #10413

@maribu
Copy link
Member Author

maribu commented Nov 16, 2018

Please note that I do not have the hardware to test. This PR is thus completely untested.

@maribu maribu requested a review from haukepetersen November 16, 2018 11:53
@maribu maribu added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: network Area: Networking Area: drivers Area: Device drivers labels Nov 16, 2018
@maribu
Copy link
Member Author

maribu commented Nov 21, 2018

I just updated the second commit to comply with #10413

@aabadie
Copy link
Contributor

aabadie commented Dec 20, 2018

This driver was initially written by @haukepetersen, so maybe he can help testing this PR or maybe @miri64 also has access to this hardware ?

drivers/w5100/w5100.c Show resolved Hide resolved
@jia200x jia200x added Reviewed: 1-fundamentals The fundamentals of the PR were reviewed according to the maintainer guidelines Reviewed: 2-code-design The code design of the PR was reviewed according to the maintainer guidelines labels Jan 10, 2019
@jia200x
Copy link
Member

jia200x commented Jan 10, 2019

In 079ea83 commit message:

the frame should not be droppend and
no data of the frame should be returned.

  1. s/droppend/dropped
  2. It should say "the frame should be dropped", as stated in the documentation

@jia200x jia200x added Reviewed: 2-code-design The code design of the PR was reviewed according to the maintainer guidelines Reviewed: 4-code-style The adherence to coding conventions by the PR were reviewed according to the maintainer guidelines Reviewed: 5-documentation The documentation details of the PR were reviewed according to the maintainer guidelines and removed Reviewed: 2-code-design The code design of the PR was reviewed according to the maintainer guidelines Reviewed: 5-documentation The documentation details of the PR were reviewed according to the maintainer guidelines labels Jan 10, 2019
@jia200x
Copy link
Member

jia200x commented Jan 10, 2019

I don't have hardware to test neither :/

@maribu
Copy link
Member Author

maribu commented Jan 10, 2019

@jia200x: Squashed and fixed the commit message as requested.

@jia200x
Copy link
Member

jia200x commented Jan 10, 2019

@miri64 or @jcarrano, could you give it a try with hardware at FU?

@maribu maribu added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Jan 18, 2019
Copy link
Member

@miri64 miri64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested with w5100 on arduino-due. On current master I was able to ping the link-local address of my laptop from the Arduino and to ping the link-local address of the Arduino from the laptop. However it doesn't work with this PR. When I ping the Arduino from my laptop, I see NS that try to do address resolution on the interface, but no NA in reply. When I ping the laptop from the Arduino I see NS and NAs, but no ICMPv6 echo requests (ping packets). So my guess is the device now drops all packets. I added a comment below with my suspicion why this is, but I am not 100% sure.

drivers/w5100/w5100.c Outdated Show resolved Hide resolved
@maribu maribu removed the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Jan 22, 2019
@maribu
Copy link
Member Author

maribu commented Jan 22, 2019

@miri64: Mind to check again? Can you confirm that one minute of ping6 -s 65535 <ADDR> does not lock up RIOT with this PR?

@miri64
Copy link
Member

miri64 commented Jan 22, 2019

I think something is broken still. While I sometimes am able to ping the laptop from the node I very quickly get:

2019-01-22 13:35:42,788 - INFO #  ping6 fe80::bbf2:ce3b:1a7:f6c8
2019-01-22 13:35:43,796 - INFO # ping timeout
2019-01-22 13:35:43,804 - INFO # dropping additional response packet (probably caused by duplicates)
2019-01-22 13:35:44,812 - INFO # gnrc_netif: possibly lost interrupt.
2019-01-22 13:35:44,816 - INFO # gnrc_netif: possibly lost interrupt.
2019-01-22 13:35:44,820 - INFO # gnrc_netif: possibly lost interrupt.
2019-01-22 13:35:44,820 - INFO # gnrc_netif: possibly lost interrupt.
2019-01-22 13:35:44,824 - INFO # gnrc_netif: possibly lost interrupt.
2019-01-22 13:35:44,829 - INFO # gnrc_netif: possibly lost interrupt.
2019-01-22 13:35:44,833 - INFO # gnrc_netif: possibly lost interrupt.
2019-01-22 13:35:44,836 - INFO # gnrc_netif: possibly lost interrupt.
2019-01-22 13:35:44,840 - INFO # gnrc_netif: possibly lost interrupt.
2019-01-22 13:35:44,841 - INFO # gnrc_netif: possibly lost interrupt.
2019-01-22 13:35:44,845 - INFO # gnrc_netif: possibly lost interrupt.
2019-01-22 13:35:44,849 - INFO # gnrc_netif: possibly lost interrupt.
2019-01-22 13:35:44,853 - INFO # gnrc_netif: possibly lost interrupt.
2019-01-22 13:35:44,857 - INFO # gnrc_netif: possibly lost interrupt.
2019-01-22 13:35:44,857 - INFO # gnrc_netif: possibly lost interrupt.
2019-01-22 13:35:44,861 - INFO # gnrc_netif: possibly lost interrupt.
2019-01-22 13:35:44,865 - INFO # gnrc_netif: possibly lost interrupt.
2019-01-22 13:35:44,869 - INFO # gnrc_netif: possibly lost interrupt.

And that doesn't stop. Mind you, that my setup seems to be a little flaky. Sometimes when I reset the board isn't able to establish the SPI connection and just crashes. :-/ With ping -s 65535 <ADDR> I unsurprisingly get a EMSGSIZE back from the ping application. sudo ping -f <addr> works until I get the possibly lost interrupt message.

@maribu
Copy link
Member Author

maribu commented Jan 22, 2019

OK, I was able to get an Arduino Ethernet Shield. I should be able to debug this PR

@maribu
Copy link
Member Author

maribu commented Jan 22, 2019

With the Arduino-Mega2560 this works for me as expected. (I was only able to use examples/gnrc_minimal due to the RAM limitations of the board.)

My test was:

$ ping6 -I eth0 -c 20 fe80::2223:23ff:fe23:2323; \
> ping6 -I eth0 -c 20 -s 65335 fe80::2223:23ff:fe23:2323; \
> ping6 -I eth0 -c 20 fe80::2223:23ff:fe23:2323

Output (without the 64 bytes from ... lines) with this PR:

PING fe80::2223:23ff:fe23:2323(fe80::2223:23ff:fe23:2323) from :: eth0: 56 data bytes
--- fe80::2223:23ff:fe23:2323 ping statistics ---
20 packets transmitted, 20 received, 0% packet loss, time 52ms
rtt min/avg/max/mdev = 23.804/25.178/43.095/4.113 ms

PING fe80::2223:23ff:fe23:2323(fe80::2223:23ff:fe23:2323) from :: eth0: 65335 data bytes
--- fe80::2223:23ff:fe23:2323 ping statistics ---
20 packets transmitted, 0 received, 100% packet loss, time 264ms

PING fe80::2223:23ff:fe23:2323(fe80::2223:23ff:fe23:2323) from :: eth0: 56 data bytes
--- fe80::2223:23ff:fe23:2323 ping statistics ---
20 packets transmitted, 20 received, 0% packet loss, time 56ms
rtt min/avg/max/mdev = 23.811/24.980/43.101/4.161 ms

Output with current master (again eliding 64 bytes from ...):

PING fe80::2223:23ff:fe23:2323(fe80::2223:23ff:fe23:2323) from :: eth0: 56 data bytes
--- fe80::2223:23ff:fe23:2323 ping statistics ---
20 packets transmitted, 20 received, 0% packet loss, time 50ms
rtt min/avg/max/mdev = 21.708/23.070/40.755/4.061 ms

PING fe80::2223:23ff:fe23:2323(fe80::2223:23ff:fe23:2323) from :: eth0: 65335 data bytes
--- fe80::2223:23ff:fe23:2323 ping statistics ---
20 packets transmitted, 0 received, 100% packet loss, time 276ms

PING fe80::2223:23ff:fe23:2323(fe80::2223:23ff:fe23:2323) from :: eth0: 56 data bytes
--- fe80::2223:23ff:fe23:2323 ping statistics ---
20 packets transmitted, 0 received, 100% packet loss, time 276ms

@miri64
Copy link
Member

miri64 commented Jan 23, 2019

@miri64: Mind to test it with a reduced SPI clock, e.g. 1 MHZ instead of 5?

Doesn't help :(. I retested this PR on master. There it is pretty stable.

BTW when I run into the "possibly lost interrupt" problem and then hit the reset button on the board, initialization fails as reported above (and only then). I only am able to get the device back into a usable state by disconnecting it from USB and replugging.

@maribu
Copy link
Member Author

maribu commented Jan 23, 2019

OK, previous code did update rp (without using the result) in the drop case in a very sneaky way (see fixup above). As the result of this update was never used, moving the code into static inline void drop() cannot have introduced a bug. But I still dropped the sneaky rp += psize as argument in a function call, as this will only confuse the reader.

@miri64
Copy link
Member

miri64 commented Jan 24, 2019

Still having the gnrc_netif: possibly lost interrupt which I don't observe on master (there and in #10844 the device just hangs silently at some point).

@maribu
Copy link
Member Author

maribu commented Jan 29, 2019

@miri64: I bought an Arduino Due and I'm able to send 100 pings from Linux to the Arudino while simultaneously sending 100 pings from the Arduino to the Linux machine and got 98 response in each direction with this PR. I did so both using my "self-cooked" toolchain and BUILD_IN_DOCKER=1.

So I guess it works for me as expected, or did you test for longer periods of time?

If I'm indeed unable to reproduce your issue, I think chances are high that your board or shield is flanky. Or maybe there is some timing issue in the driver that by luck is not triggered by the code generated by the toolchains I used. Would you mind testing again with binaries generated via the docker build to rule out that the issue is triggered by your toolchain?

Btw: The W5100 draws on average 100mA according to this site and the Arudino Due is drawing up to 130mA. Some Notebooks do not comply with the USB standard when it comes to the current they provide. Have you tried to connect the Arudino on a Desktop PC or using a powered USB hub?

@miri64
Copy link
Member

miri64 commented Jan 29, 2019

If I'm indeed unable to reproduce your issue, I think chances are high that your board or shield is flanky. Or maybe there is some timing issue in the driver that by luck is not triggered by the code generated by the toolchains I used. Would you mind testing again with binaries generated via the docker build to rule out that the issue is triggered by your toolchain?

Might be. I'll try later to test with Docker again.

Btw: The W5100 draws on average 100mA according to this site and the Arudino Due is drawing up to 130mA. Some Notebooks do not comply with the USB standard when it comes to the current they provide. Have you tried to connect the Arudino on a Desktop PC or using a powered USB hub?

Up until now all tests were done via a USB hub ;-).

@miri64
Copy link
Member

miri64 commented Jan 29, 2019

Might be. I'll try later to test with Docker again.

Very likely now. I don't get the device to initialize anymore. :( I guess then I just have to trust your testing. At least it was not worse than before (since we now at least have output during the endless loop). Please squash.

@miri64 miri64 added Reviewed: 3-testing The PR was tested according to the maintainer guidelines Reviewed: 5-documentation The documentation details of the PR were reviewed according to the maintainer guidelines labels Jan 29, 2019
The netdev_driver_t::recv implementation of the w5100 does not provide the
drop feature. This commit adds it.

Fixes: RIOT-OS#10410
If netdev_driver_t::recv() is called and the provided buffer is smaller than
the frame then `-ENOBUFS` should be returned, the frame should be dropped, and
no data of the frame should be returned.

Addresses: RIOT-OS#10413
@maribu maribu added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Jan 29, 2019
Copy link
Member

@miri64 miri64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK from my side

@maribu
Copy link
Member Author

maribu commented Jan 29, 2019

@jia200x: I think I addressed your remarks. Can you take a look and confirm?

@maribu maribu added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Feb 5, 2019
@maribu
Copy link
Member Author

maribu commented Feb 5, 2019

@jia200x: Your "changes requested" is currently blocking this PR. I think I addressed them. Can you have a look?

@jia200x
Copy link
Member

jia200x commented Feb 5, 2019

@jia200x: Your "changes requested" is currently blocking this PR. I think I addressed them. Can you have a look?

Oh sorry. Sure, I will

Copy link
Member

@jia200x jia200x left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK on my side

@maribu
Copy link
Member Author

maribu commented Feb 5, 2019

@miri64 or @jia200x: Murdock ran again 4 hours ago and now everything is green. Maybe now is a good time to hit merge?

@miri64
Copy link
Member

miri64 commented Feb 5, 2019

Yepp.

@miri64 miri64 merged commit 82e293a into RIOT-OS:master Feb 5, 2019
@aabadie
Copy link
Contributor

aabadie commented Feb 8, 2019

It's a bug fix, should be backport in 2019.01 ?

@miri64
Copy link
Member

miri64 commented Feb 8, 2019

Yepp. @maribu can you provide one please?

@miri64 miri64 added the Process: needs backport Integration Process: The PR is required to be backported to a release or feature branch label Feb 8, 2019
@maribu maribu deleted the w5100 branch February 8, 2019 12:45
@danpetry danpetry added this to the Release 2019.04 milestone Mar 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: drivers Area: Device drivers Area: network Area: Networking CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Process: needs backport Integration Process: The PR is required to be backported to a release or feature branch Reviewed: 1-fundamentals The fundamentals of the PR were reviewed according to the maintainer guidelines Reviewed: 2-code-design The code design of the PR was reviewed according to the maintainer guidelines Reviewed: 3-testing The PR was tested according to the maintainer guidelines Reviewed: 4-code-style The adherence to coding conventions by the PR were reviewed according to the maintainer guidelines Reviewed: 5-documentation The documentation details of the PR were reviewed according to the maintainer guidelines Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants