
gnrc_netif: add packet to queue when device is busy #11263

Merged
merged 3 commits into RIOT-OS:master on Sep 7, 2020

Conversation

@miri64 (Member) commented Mar 25, 2019

Contribution description

This fixes the issue showcased in #11256 (assuming the gnrc_netif_pktq module is compiled in and the network device supports returning -EBUSY) by providing a very simple MAC scheme: if the device is busy on send, queue the packet instead of sending it; when RX_COMPLETE or TX_COMPLETE is issued, try to send the first packet in the queue.
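
As a rough illustration, the control flow looks like this (a minimal sketch; dev_send(), pktq_put(), and pktq_get() are hypothetical stand-ins for the driver call and the gnrc_netif_pktq operations, not the actual API):

```c
#include <errno.h>
#include <stddef.h>

typedef struct pkt pkt_t;

extern int dev_send(pkt_t *pkt);   /* returns -EBUSY while a TX is ongoing */
extern int pktq_put(pkt_t *pkt);   /* enqueue packet; < 0 if the queue is full */
extern pkt_t *pktq_get(void);      /* dequeue head, or NULL if empty */

/* called from netif->ops->send() */
static int send_or_queue(pkt_t *pkt)
{
    int res = dev_send(pkt);

    if (res == -EBUSY) {
        /* device is still transmitting: park the packet instead of dropping it */
        return pktq_put(pkt);
    }
    return res;
}

/* called when the device signals RX_COMPLETE or TX_COMPLETE */
static void on_complete_event(void)
{
    pkt_t *head = pktq_get();

    if (head != NULL) {
        /* device just became free: retry the oldest queued packet first */
        send_or_queue(head);
    }
}
```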

Testing procedure

Run make tests-gnrc_netif_pktq test in tests/unittests. Testing the actual integration requires a device adaptation, which I will provide for at86rf2xx.

Since I edited gnrc_netif.c, pinging with gnrc_networking should still work.

Issues/PRs references

Provides a fix for the race condition showcased in #11256.

Basis for #11068

Route to 6Lo minimal fragment forwarding

@miri64 added the labels "Area: network" and "Type: new feature" on Mar 25, 2019
@bergzand (Member)

Initial gut feeling about this API change is positive. I think it simplifies a few device drivers, as those no longer have to block until the radio is done transmitting the previous frame. I know that at least the mrf24j40 has a clunky while(not_in_the_right_state) {} loop that might be refactored out with this change.
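
A sketch of what that refactor could look like on the driver side (radio_is_transmitting() is a hypothetical stand-in for the driver's actual state query, not the mrf24j40 code):

```c
#include <errno.h>
#include <stdbool.h>

extern bool radio_is_transmitting(void);  /* hypothetical state query */

static int _send_sketch(void)
{
    /* before: while (radio_is_transmitting()) { } -- busy-wait until idle */
    /* after: report busy and let the upper layer queue the frame instead */
    if (radio_is_transmitting()) {
        return -EBUSY;
    }
    /* ... load the frame into the transceiver and trigger TX as before ... */
    return 0;
}
```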

> MAC scheme

Maybe this should be called a Device Access Control Scheme? 😄

@miri64 (Member Author) commented Mar 25, 2019

See #11264 for my at86rf2xx adaptation.

Review thread on sys/net/gnrc/netif/gnrc_netif.c (outdated, resolved)
@kaspar030 (Contributor)

> Initial gut feeling about this API change is positive. I think it simplifies a few device drivers, as those no longer have to block until the radio is done transmitting the previous frame. I know that at least the mrf24j40 has a clunky while(not_in_the_right_state) {} loop that might be refactored out with this change.

My feelings exactly. Now we can finally make the drivers just return -EBUSY when they currently cannot send.

Conceptually this is a rather large change, so I suggest we don't rush it into the release.

@miri64 (Member Author) commented Mar 26, 2019

> Conceptually this is a rather large change, so I suggest we don't rush it into the release.

Yepp, fully agree with that. I just need it for my current work in #11068, but I believe #11068 itself isn't ready for this release either ;-).

@miri64 (Member Author) commented Mar 26, 2019

Also, I just noticed that it is not that easy: the packet passed to netif->ops->send() is currently released no matter what, so it is lost and can't be queued. This means we need to change all the link-layer glue code and adapt the documentation (i.e. an API change) to make this possible.

@miri64 (Member Author) commented Mar 27, 2019

Rebased onto current master to resolve a conflict.

@miri64 added the "State: WIP" label on Mar 27, 2019
@miri64 (Member Author) commented Mar 27, 2019

And set to WIP due to #11263 (comment)

@miri64 (Member Author) commented Apr 9, 2019

Made the queue larger, since it was way too small for larger fragmented packets.

@miri64 (Member Author) commented Jul 12, 2019

Reworked a little bit, so a rewrite of all the gnrc_netif_* modules is not required when gnrc_netif_pktq is included.

@miri64 (Member Author) commented Aug 30, 2019

Rebased onto current master to resolve a conflict caused by the merging of #11837.

@miri64 (Member Author) commented Sep 25, 2019

Squashed to simplify some integration into a separate branch.

miri64 added a commit to 5G-I3/RIOT-public that referenced this pull request Sep 25, 2019
(cherry picked from commit 5cc265f,
see RIOT-OS#11263)
miri64 added a commit to 5G-I3/RIOT-public that referenced this pull request Sep 25, 2019
... and also send on send error (i.e. when *medium* was busy)

(cherry picked from commit 825b0f5,
see RIOT-OS#11263)
miri64 added a commit to 5G-I3/RIOT-public that referenced this pull request Sep 25, 2019
miri64 added a commit to 5G-I3/RIOT-public that referenced this pull request Sep 25, 2019
@jia200x (Member) commented Sep 2, 2020

yes :)

@miri64 (Member Author) commented Sep 2, 2020

Should I squash then or are you still going through the changes?

@jia200x (Member) commented Sep 2, 2020

> Should I squash then or are you still going through the changes?

Sure, please squash

@miri64 (Member Author) commented Sep 2, 2020

I just saw: I also need to rebase ... again -.-

@miri64 (Member Author) commented Sep 2, 2020

Squashed

@miri64 (Member Author) commented Sep 2, 2020

And rebased and adapted to current master.

@jia200x (Member) commented Sep 3, 2020

I tested this one carefully using #14787 and noticed something strange: if a radio returns -EBUSY instead of blocking, the RTT increases.

For instance, using this feature with at86rf2xx:

ping6 fe80::204:2519:1801:bd0e -i 170 -s 1024 -c 100
...

2020-09-03 16:04:53,168 # round-trip min/avg/max = 123.286/132.150/142.522 ms

If I modify the driver to return -EBUSY:

diff --git a/drivers/at86rf2xx/at86rf2xx_netdev.c b/drivers/at86rf2xx/at86rf2xx_netdev.c
index 3c83a43947..f8ff12d0da 100644
--- a/drivers/at86rf2xx/at86rf2xx_netdev.c
+++ b/drivers/at86rf2xx/at86rf2xx_netdev.c
@@ -114,6 +114,10 @@ static int _send(netdev_t *netdev, const iolist_t *iolist)
     at86rf2xx_t *dev = (at86rf2xx_t *)netdev;
     size_t len = 0;
 
+    if (at86rf2xx_get_status(dev) == AT86RF2XX_STATE_BUSY_TX_ARET) {
+        return -EBUSY;
+    }
+

I get

ping6 fe80::204:2519:1801:bd0e -i 170 -s 1024 -c 100
...

2020-09-03 15:59:34,245 # round-trip min/avg/max = 130.729/140.632/155.966 ms

I consistently get an extra ~8-10 ms if -EBUSY is returned, for 11 fragments (1024-byte ping).

Any idea why this could happen?

@benpicco (Contributor) commented Sep 4, 2020

With at86rf215 and this patch:
--- a/drivers/at86rf215/at86rf215.c
+++ b/drivers/at86rf215/at86rf215.c
@@ -250,6 +250,10 @@ static void _block_while_busy(at86rf215_t *dev)
 
 static void at86rf215_block_while_busy(at86rf215_t *dev)
 {
+    if (IS_ACTIVE(GNRC_NETIF_PKTQ)) {
+        return;
+    }
+
     if (_tx_ongoing(dev)) {
         DEBUG("[at86rf215] Block while TXing\n");
         _block_while_busy(dev);
@@ -262,7 +266,12 @@ int at86rf215_tx_prepare(at86rf215_t *dev)
         return -EAGAIN;
     }
 
-    at86rf215_block_while_busy(dev);
+    if (IS_ACTIVE(GNRC_NETIF_PKTQ) && _tx_ongoing(dev)) {
+        return -EBUSY;
+    } else {
+        at86rf215_block_while_busy(dev);
+    }
+
     dev->tx_frame_len = IEEE802154_FCS_LEN;
 
     return 0;

I get better results:

master

2020-09-04 18:39:16,406 #  ping6 2001:db8::204:2519:1801:c8c5 -s 512 -c 10
2020-09-04 18:39:16,533 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=0 ttl=64 rssi=-54 dBm time=113.710 ms
2020-09-04 18:39:17,525 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=1 ttl=64 rssi=-57 dBm time=109.456 ms
2020-09-04 18:39:18,534 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=2 ttl=64 rssi=-56 dBm time=120.383 ms
2020-09-04 18:39:19,540 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=3 ttl=64 rssi=-62 dBm time=118.382 ms
2020-09-04 18:39:20,533 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=4 ttl=64 rssi=-62 dBm time=110.219 ms
2020-09-04 18:39:22,533 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=6 ttl=64 rssi=-65 dBm time=112.776 ms
2020-09-04 18:39:23,539 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=7 ttl=64 rssi=-61 dBm time=113.988 ms
2020-09-04 18:39:24,548 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=8 ttl=64 rssi=-61 dBm time=123.613 ms
2020-09-04 18:39:26,401 # 
2020-09-04 18:39:26,420 # --- 2001:db8::204:2519:1801:c8c5 PING statistics ---
2020-09-04 18:39:26,425 # 10 packets transmitted, 8 packets received, 20% packet loss
2020-09-04 18:39:26,428 # round-trip min/avg/max = 109.456/115.315/123.613 ms

gnrc_netif_pktq

2020-09-04 18:38:11,040 #  ping6 2001:db8::204:2519:1801:c8c5 -s 512 -c 10
2020-09-04 18:38:11,133 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=0 ttl=64 rssi=-57 dBm time=77.002 ms
2020-09-04 18:38:12,128 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=1 ttl=64 rssi=-55 dBm time=73.160 ms
2020-09-04 18:38:13,121 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=2 ttl=64 rssi=-56 dBm time=75.721 ms
2020-09-04 18:38:14,129 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=3 ttl=64 rssi=-56 dBm time=74.772 ms
2020-09-04 18:38:15,120 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=4 ttl=64 rssi=-55 dBm time=73.810 ms
2020-09-04 18:38:16,127 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=5 ttl=64 rssi=-56 dBm time=72.549 ms
2020-09-04 18:38:17,134 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=6 ttl=64 rssi=-56 dBm time=77.309 ms
2020-09-04 18:38:18,130 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=7 ttl=64 rssi=-57 dBm time=83.665 ms
2020-09-04 18:38:19,133 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=8 ttl=64 rssi=-58 dBm time=75.729 ms
2020-09-04 18:38:20,128 # 520 bytes from 2001:db8::204:2519:1801:c8c5: icmp_seq=9 ttl=64 rssi=-59 dBm time=76.030 ms
2020-09-04 18:38:20,129 # 
2020-09-04 18:38:20,131 # --- 2001:db8::204:2519:1801:c8c5 PING statistics ---
2020-09-04 18:38:20,142 # 10 packets transmitted, 10 packets received, 0% packet loss
2020-09-04 18:38:20,147 # round-trip min/avg/max = 72.549/75.974/83.665 ms

@benpicco added and then removed the "State: waiting for CI update" label on Sep 4, 2020
@benpicco (Contributor) left a review:

Code looks good and is working fine.
Let's finally get this in!

@aabadie (Contributor) commented Sep 4, 2020

@kaspar030 your concerns seem to be addressed. Can we dismiss your stale review? (Or you can just ACK :))

@miri64 (Member Author) commented Sep 7, 2020

> I tested this one carefully using #14787 and noticed something strange: if a radio returns -EBUSY instead of blocking, the RTT increases.

Is it strange that there might be some overhead when queuing is involved?

@miri64 (Member Author) commented Sep 7, 2020

> I consistently get an extra ~8-10 ms if -EBUSY is returned, for 11 fragments (1024-byte ping).

If you are worried that this is such a high overhead, remember to divide the round-trip time by 22 (11 fragments sent each way) to get a more accurate picture.
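
(Roughly: ~9 ms of extra RTT spread over 22 transmissions comes to about 0.4 ms of added latency per fragment.)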

@jia200x (Member) commented Sep 7, 2020

> Is it strange that there might be some overhead when queuing is involved?

I think so. I would expect a memory overhead, but not such an RTT overhead. Considering that the queue is there because the network stack is much faster than the network device (including queue operations), I wouldn't expect such a time difference.

> If you are worried that this is such a high overhead, remember to divide the round-trip time by 22 (11 fragments sent each way) to get a more accurate picture.

I'm not worried about the higher RTT, considering that this fixes several problems, but I'm just wondering why there's such a difference.

@jia200x (Member) commented Sep 7, 2020

(I'm not blocking this PR, I'm just trying to figure out where this time difference comes from.)

@benpicco benpicco dismissed kaspar030’s stale review September 7, 2020 09:03

Comments have been addressed over a year ago

@benpicco benpicco merged commit a336fdc into RIOT-OS:master Sep 7, 2020
@miri64 miri64 deleted the gnrc_netif/new/pktq branch September 8, 2020 10:43
}
/* hold in case the device was busy, to avoid having to rewrite *all* the
 * link layer implementations when `gnrc_netif_pktq` is included */
gnrc_pktbuf_hold(pkt, 1);
@benpicco (Contributor) commented Nov 13, 2024

What was meant by this comment? 😨

@miri64 (Member Author) commented Nov 13, 2024

When this was written, the netif implementations just threw away the packet when they were unable to send (this might not be the case with netdev_new, I don't know).

So, to prevent the packet from being lost in that case, we hold it (i.e. the next release does not remove it from the packet buffer) so that we can put it into the queue later in ll. 1431-1437. If the device was not busy, we just remove it again in l. 1447.
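
Schematically, the pattern reads like this (a sketch; dev_send() and pktq_put() are hypothetical stand-ins, while gnrc_pktbuf_hold()/gnrc_pktbuf_release() are the actual pktbuf calls):

```c
#include <errno.h>
#include "net/gnrc/pktbuf.h"

extern int dev_send(gnrc_pktsnip_t *pkt);  /* may release pkt unconditionally */
extern int pktq_put(gnrc_pktsnip_t *pkt);  /* hypothetical queue insert */

static int send_with_hold(gnrc_pktsnip_t *pkt)
{
    /* add one user, so the driver's unconditional release can't free pkt */
    gnrc_pktbuf_hold(pkt, 1);

    int res = dev_send(pkt);

    if (res == -EBUSY) {
        /* device busy: our extra reference keeps pkt alive for the queue */
        return pktq_put(pkt);
    }
    /* device accepted (or definitively failed) the send: drop the extra
     * reference again */
    gnrc_pktbuf_release(pkt);
    return res;
}
```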

A contributor replied:

Indeed, netdev_new does not release on send.

Labels
Area: network · CI: ready for build · Type: new feature

7 participants