gnrc/network_layer/ipv6/nib: fix packet leak with unreachable neighbors #20781
Looks very sensible to me, maybe @miri64 wants to give this a look, otherwise this is good to merge IMHO (sans small comments)
LGTM, but I only scanned the code and did not test anything. If you tested this @benpicco, feel free to ACK.
if (node->pktqueue_len == CONFIG_GNRC_IPV6_NIB_QUEUE_PKT_CAP) {
    gnrc_pktqueue_t *oldest = _nbr_pop_pkt(node);
    assert(oldest != NULL);
    gnrc_pktbuf_release_error(oldest->pkt, EHOSTUNREACH);
So far, this call has always been preceded in the code by:
gnrc_icmpv6_error_dst_unr_send(ICMPV6_ERROR_DST_UNR_ADDR, oldest->pkt);
indeed, fixed!
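For illustration, the capping behaviour discussed in this thread can be sketched as a bounded FIFO that evicts its oldest entry when a new one arrives at the limit. This is a minimal standalone sketch, not the NIB code: `QUEUE_PKT_CAP`, `pkt_entry_t`, `nbr_queue_t`, `queue_pop()` and `queue_push_capped()` are hypothetical names standing in for `CONFIG_GNRC_IPV6_NIB_QUEUE_PKT_CAP` and the GNRC packet queue types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_GNRC_IPV6_NIB_QUEUE_PKT_CAP */
#define QUEUE_PKT_CAP 4U

/* Hypothetical queue entry; `id` stands in for the packet handle */
typedef struct pkt_entry {
    struct pkt_entry *next;
    int id;
} pkt_entry_t;

/* Hypothetical per-neighbor queue: head is the oldest entry */
typedef struct {
    pkt_entry_t *head;
    pkt_entry_t *tail;
    unsigned len;
} nbr_queue_t;

/* Remove and return the oldest entry, or NULL if the queue is empty */
static pkt_entry_t *queue_pop(nbr_queue_t *q)
{
    pkt_entry_t *oldest = q->head;

    if (oldest != NULL) {
        q->head = oldest->next;
        if (q->head == NULL) {
            q->tail = NULL;
        }
        oldest->next = NULL;
        q->len--;
    }
    return oldest;
}

/* Append `entry`; if the queue is already at the cap, evict the oldest
 * entry first and hand it back so the caller can report and release it */
static pkt_entry_t *queue_push_capped(nbr_queue_t *q, pkt_entry_t *entry)
{
    pkt_entry_t *evicted = NULL;

    if (q->len == QUEUE_PKT_CAP) {
        evicted = queue_pop(q);
        assert(evicted != NULL);
    }
    entry->next = NULL;
    if (q->tail == NULL) {
        q->head = entry;
    }
    else {
        q->tail->next = entry;
    }
    q->tail = entry;
    q->len++;
    return evicted;
}
```

In the PR itself, the evicted packet is the one passed to gnrc_icmpv6_error_dst_unr_send() and then to gnrc_pktbuf_release_error(), as in the snippet under review; the sketch simply returns it to the caller.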
    _evtimer_add(nce, GNRC_IPV6_NIB_FLUSH_PCK_QUEUE,
                 &nce->flush_queue_timeout,
                 CONFIG_GNRC_IPV6_NIB_QUEUE_PKT_LINGER_MS);
}
Not sure, but consider this: if the neighbor suddenly becomes reachable again, should the timer that flushes the queue be canceled?
I guess,
void _nbr_flush_pktqueue(_nib_onl_entry_t *node)
{
    if (_is_reachable(node)) {
        return;
    }
    [...]
is supposed to catch that case, right?
Exactly. There is no common entry point for setting a neighbor reachable where we could cancel the timer. Also, _is_reachable() == true doesn't require the neighbor to be in the REACHABLE state, only to be in neither INCOMPLETE nor UNREACHABLE. We probably don't want to discard the packets in that case.
Regarding memory safety, the timer is canceled before the neighbor entry goes out of scope anyway.
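The guard discussed in this thread can be sketched as follows. This is a simplified model under stated assumptions, not the actual NIB implementation: `nud_state_t`, `nbr_t`, `nbr_is_reachable()` and `nbr_flush_queue()` are hypothetical names, and the packet queue is reduced to a counter. It only illustrates that "reachable" here means "neither INCOMPLETE nor UNREACHABLE", so a late flush event leaves the packets of a STALE, DELAY, PROBE or REACHABLE neighbor alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the NUD states named in the discussion */
typedef enum {
    NUD_INCOMPLETE,
    NUD_REACHABLE,
    NUD_STALE,
    NUD_DELAY,
    NUD_PROBE,
    NUD_UNREACHABLE,
} nud_state_t;

/* Hypothetical neighbor entry; `queued` stands in for the packet queue */
typedef struct {
    nud_state_t state;
    unsigned queued;
} nbr_t;

/* "Reachable" in the sense used above: any state except INCOMPLETE
 * and UNREACHABLE */
static bool nbr_is_reachable(const nbr_t *nbr)
{
    assert(nbr != NULL);
    return (nbr->state != NUD_INCOMPLETE) && (nbr->state != NUD_UNREACHABLE);
}

/* Flush handler: drops queued packets only if the neighbor is still
 * unresolved when the event fires; returns the number of dropped packets */
static unsigned nbr_flush_queue(nbr_t *nbr)
{
    if (nbr_is_reachable(nbr)) {
        return 0;   /* neighbor resolved in the meantime: keep the packets */
    }
    unsigned dropped = nbr->queued;
    nbr->queued = 0;
    return dropped;
}
```

Checking the state inside the handler, rather than canceling the timer at every place that changes the state, fits the observation above that there is no single entry point for marking a neighbor reachable.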
Informational: the UNREACHABLE state is an extension. Maybe it is worth reading it quickly (it is short) as it is related to the topic.
What is the packet linger (5ms)?
It's the timeout for packets waiting in the queue of an unreachable neighbor.
Yes, this document is specific to the UNREACHABLE state. But I am also dropping the packets in the INCOMPLETE state (i.e. I wonder whether this is a bug, and we need a way to get to UNREACHABLE from any state. Once UNREACHABLE is set, then flush all packets. We don't need the timer for the flushing event anymore in this case, as this implies a delay.
If we never do multicast address resolution in 6lo, couldn't
--- a/sys/net/gnrc/network_layer/ipv6/nib/nib.c
+++ b/sys/net/gnrc/network_layer/ipv6/nib/nib.c
@@ -1374,9 +1374,12 @@ static bool _resolve_addr(const ipv6_addr_t *dst, gnrc_netif_t *netif,
}
bool reset = false;
- DEBUG("nib: resolve address %s by probing neighbors\n",
- ipv6_addr_to_str(addr_str, dst, sizeof(addr_str)));
if (entry == NULL) {
+ /* don't do multicast address resolution on 6lo */
+ if (gnrc_netif_is_6ln(netif)) {
+ return false;
+ }
+
entry = _nib_nc_add(dst, netif ? netif->pid : 0,
GNRC_IPV6_NIB_NC_INFO_NUD_STATE_INCOMPLETE);
if (entry == NULL) {
@@ -1404,10 +1407,9 @@ static bool _resolve_addr(const ipv6_addr_t *dst, gnrc_netif_t *netif,
return false;
}
- /* don't do multicast address resolution on 6lo */
- if (!gnrc_netif_is_6ln(netif)) {
- _probe_nbr(entry, reset);
- }
+ DEBUG("nib: resolve address %s by probing neighbors\n",
+ ipv6_addr_to_str(addr_str, dst, sizeof(addr_str)));
+ _probe_nbr(entry, reset);
return false;
}
No, because you still want it resolved, i.e. if the neighbor is a router, it can be resolved with a router advertisement. See the
I am not convinced that the added complexity of the new timeout for releasing pending packets is necessary. I would have just moved your
RIOT/sys/net/gnrc/network_layer/ipv6/nib/_nib-arsm.c, lines 306 to 312 in 3735cc1
Each time the NUD state of a neighbor becomes UNREACHABLE as a result of probing, all pending packets are flushed. The exponential backoff is capped at 60s. So after 60s at the latest, all pending packets are deleted.
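To make the 60s bound concrete, here is a rough sketch of a capped exponential backoff. Only the 60s cap comes from the comment above; the function name `probe_backoff_ms()`, its parameters, and the example base interval and multiplier are assumptions for illustration, and any random jitter an implementation may apply is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Cap taken from the discussion: the backoff never exceeds 60 s */
#define MAX_BACKOFF_MS 60000U

/* Delay before probe number `attempt` (0-based): base_ms * multiple^attempt,
 * clamped to MAX_BACKOFF_MS. `base_ms` and `multiple` are hypothetical
 * parameters, not values read from the NIB configuration. */
static uint32_t probe_backoff_ms(uint32_t base_ms, uint32_t multiple,
                                 unsigned attempt)
{
    assert(multiple >= 1);

    /* 64-bit accumulator: a value below the cap times a 32-bit multiplier
     * cannot overflow it */
    uint64_t delay = base_ms;

    for (unsigned i = 0; (i < attempt) && (delay < MAX_BACKOFF_MS); i++) {
        delay *= multiple;
    }
    return (delay > MAX_BACKOFF_MS) ? MAX_BACKOFF_MS : (uint32_t)delay;
}
```

With an assumed base of 1000 ms and a multiplier of 3, the delays would run 1000, 3000, 9000, 27000 ms and then stay at 60000 ms, which is why a flush tied to the UNREACHABLE transition can lag by up to a minute.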
Which is exactly what I suggested above, because UNREACHABLE already implies a delay. Thing is, the UNREACHABLE state is never reached on 6lo (see above). Also, 60s is IMO too long; you can run out of free packets way before that.
But that's purely by accident, right?
As stated in https://www.rfc-editor.org/rfc/rfc6775.html#section-3.3 and by observation, we do. At some point the host sends a router solicitation. During resolution,
I thought the PR has no impact on 6LoWPAN, because packets are not queued there. 5.6 of RFC 6775
Interesting. They were queued the whole time; this is where this whole PR started from. Then I guess @benpicco's suggestion is once again valid: just drop packets. For any other links, we might just drop all the packets in the queue once the neighbor's status switches to UNREACHABLE, and that should happen fast (3s IIRC). The only thing that has to be made sure is that the UNREACHABLE status can always be reached. In 6lo that is not always the case (see above).
I did some more digging on non-6lo: if the STALE state is reached, the UNREACHABLE state will eventually be reached. DELAY and PROBE can be reached only through STALE. So the only remaining question is whether the transition INCOMPLETE -> STALE is guaranteed to always happen, and this is where I'm getting overwhelmed by the complexity of the code base paired with my lack of knowledge of IPv6. I see two solutions:
Solution 1
Assume INCOMPLETE -> STALE is guaranteed to happen, or maybe someone more experienced can confirm this is true. Then:
Solution 2
Assume nothing and stick with the current solution, but also don't enqueue packets on 6lo.
I'd rather go with solution 1, as it's simpler and cleaner.
I think when you ping an address that does not exist, the NCE will be added as INCOMPLETE, but it will time out and the neighbor will be deleted once the maximum number of probes is reached. A neighbor entry starts as INCOMPLETE when we first try to talk to it. If it talks to us first, it is added as STALE until it becomes REACHABLE by replying with a solicited NA, or UNREACHABLE if we want to send something to it but don't get any reachability confirmation. Maybe you already found this image on Google: https://njetwork.wordpress.com/wp-content/uploads/2014/01/msi_ipv6-nd-state-machine1.png
Yes, that happens in
Not sure if I had something similar in there, but back when I implemented the NIB, I drew some diagrams as well: https://github.com/miri64/riot-ndp-model/blob/master/index.md
(Also note that the 6Lo-ND state machine is slightly different. Since UNREACHABLE is an extension to ND (RFC 7048), and if I interpret RFC 6775, section 13 correctly, it might be more sensible to drop the UNREACHABLE state altogether for 6LNs, and just fall back to original RFC 4861 behavior, i.e., as per the state machine in RFC 4861, appendix
Do you really observe a bad/strange situation like this or is this hypothetical? When you
Something that RIOT does not do at the moment is searching for a new default router once the current default router becomes UNREACHABLE. At the moment, for a 6LN, if the default router is probed with unicast NS and no longer responds, it becomes UNREACHABLE and even multicast NS would be sent for the 6LN. In theory there is a list of default routers (RIOT supports 1 by default). This is out of scope for this PR. The focus here should just be to get rid of the leak.
Hypothetical, but I realize now that I missed the detail that in the case of a static route, there are two on-link entries: one bound to the off-link entry, and one in the nc. It is the former on which packets are enqueued. We are still left with the following two problems:
So then:
Solved with #20834.
Contribution description
This fix only applies when CONFIG_GNRC_IPV6_NIB_QUEUE_PKT == 1 (the default config).
Currently, an on-link neighbor's packet queue might never be emptied. Specifically: if a neighbor is unreachable, the packets queued to be sent later (i.e. once the neighbor is resolved) will never be dequeued if the neighbor is never resolved. On a typical link (e.g. Ethernet), this is less of a problem, as the NIB subsystem actively tries to resolve the host, and on failure the neighbor entry is removed and the packets in its queue are released. However, on e.g. 6LoWPANs, there is no active discovery going on, so the code path for removing the neighbor never gets executed.
This PR adds a new event that flushes the packets in a neighbor's queue if and only if the neighbor is unreachable.
The flushing event is enqueued only when one of the following events happens:
Additionally, a neighbor's packet queue is now capped at CONFIG_GNRC_IPV6_NIB_QUEUE_PKT_CAP. If this limit is reached before the flushing event is triggered, the oldest packets are discarded. This ensures that packet-flooding a neighbor won't cause a DoS on others, should the packet pool get depleted before the flushing event.
Testing procedure
Here is some example code that triggers the bug:
I briefly tested the changes with both NETDEV_SAM0_ETH and NETDEV_AT86RF215. However, please keep in mind that this is my first time digging into the RIOT network stack, so there might be some edge cases that I have missed.