gnrc/ipv6/nib: don't queue packets on 6lo neighbors and drop/flush if… #20834

Open
wants to merge 8 commits into
base: master

Conversation

derMihai
Contributor

@derMihai derMihai commented Aug 26, 2024

Contribution description

The following fix only applies when CONFIG_GNRC_IPV6_NIB_QUEUE_PKT == 1 (the default configuration).

This PR is an alternative solution for #20781:

Currently, an on-link neighbor's packet queue might never be emptied. Specifically, if a neighbor is unreachable, the packets queued to be sent later (i.e. once the neighbor is resolved) are never dequeued if the neighbor is never resolved. On a typical link (e.g. Ethernet), this is less of a problem: the NIB subsystem actively tries to resolve the host, and on failure the neighbor entry is removed and the packets in its queue are released. On 6LoWPANs, however, there is no active discovery, so the code path that removes the neighbor is never executed.

This PR makes the following changes:

  • packets are not enqueued on 6lo neighbors, as the specification does not require this: https://www.rfc-editor.org/rfc/rfc6775.html#section-5.6
  • once a neighbor's status switches to UNREACHABLE, its packet queue is flushed
  • for UNREACHABLE neighbors, packets are dropped instead of queued
  • the neighbor packet queue length is capped
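
The four rules listed above can be pictured with a small stand-alone model. This is only a sketch under stated assumptions: `nbr_t`, `nbr_enqueue()`, `nbr_set_state()`, `QUEUE_CAP` and the result enum are hypothetical names invented for illustration, not RIOT's types or functions, and the queue is reduced to a counter. What happens to a packet for a 6lo neighbor after it is not queued is left to the caller here, since the description does not spell it out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not RIOT's NIB types. */
typedef enum { NBR_INCOMPLETE, NBR_REACHABLE, NBR_UNREACHABLE } nbr_state_t;
typedef enum { PKT_QUEUED, PKT_DROPPED, PKT_NOT_QUEUED } pkt_result_t;

typedef struct {
    nbr_state_t state;
    bool is_6lo;        /* neighbor sits behind a 6LoWPAN interface */
    unsigned queued;    /* packets waiting for address resolution */
    unsigned dropped;   /* packets dropped or flushed so far */
} nbr_t;

#define QUEUE_CAP 4U    /* illustrative per-neighbor cap (assumption) */

/* Decide what to do with one packet destined for nbr. */
static pkt_result_t nbr_enqueue(nbr_t *nbr)
{
    if (nbr->is_6lo) {
        /* rule 1: never queue on 6lo neighbors; the caller decides what next */
        return PKT_NOT_QUEUED;
    }
    if (nbr->state == NBR_UNREACHABLE) {
        /* rule 3: drop instead of queueing */
        nbr->dropped++;
        return PKT_DROPPED;
    }
    if (nbr->queued == QUEUE_CAP) {
        /* rule 4: queue is capped, so the oldest packet makes room */
        nbr->queued--;
        nbr->dropped++;
    }
    nbr->queued++;
    return PKT_QUEUED;
}

/* Change the neighbor state and flush the queue on a switch to UNREACHABLE. */
static void nbr_set_state(nbr_t *nbr, nbr_state_t state)
{
    if ((state == NBR_UNREACHABLE) && (nbr->state != NBR_UNREACHABLE)) {
        /* rule 2: flush everything that was waiting */
        nbr->dropped += nbr->queued;
        nbr->queued = 0;
    }
    nbr->state = state;
}
```

In this model, six packets sent to an unresolved non-6lo neighbor leave four queued and two dropped; switching the neighbor to UNREACHABLE then releases the remaining four.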

Issues/PRs references

#20781

@github-actions github-actions bot added Area: network Area: Networking Area: sys Area: System labels Aug 26, 2024
gnrc_pktqueue_t *oldest = _nbr_pop_pkt(node);
assert(oldest != NULL);
gnrc_icmpv6_error_dst_unr_send(ICMPV6_ERROR_DST_UNR_ADDR, oldest->pkt);
gnrc_pktbuf_release_error(oldest->pkt, ENOBUFS);
Contributor Author

Not sure if we should drop with ENOBUFS or silently.

@derMihai derMihai force-pushed the mir/nib/drop_for_unreachable_rebase branch from e4aeb78 to f840f00 Compare August 26, 2024 07:52
@benpicco benpicco requested a review from fabian18 August 26, 2024 09:10
@benpicco benpicco added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Aug 26, 2024
@riot-ci

riot-ci commented Aug 26, 2024

Murdock results

FAILED

3733209 fixup! dropped per-neighbor packet queue

Success Failures Total Runtime
58 0 9195 01m:32s

Artifacts

#if CONFIG_GNRC_IPV6_NIB_QUEUE_PKT
/**
* @brief queue capacity for the packets waiting for address resolution,
* per neighbor. SHOULD always be smaller than @ref CONFIG_GNRC_IPV6_NIB_NUMOF
Contributor

SHOULD always be smaller than @ref CONFIG_GNRC_IPV6_NIB_NUMOF

Why should that be?

Contributor Author

Because if it's >= CONFIG_GNRC_IPV6_NIB_NUMOF, then even with a single neighbor you never get to drop old packets from its queue, because you can't allocate a new entry in the first place. With multiple neighbors this can happen anyway, which is why I went for a SHOULD instead of a MUST.

Contributor

Ah

#if IS_ACTIVE(CONFIG_GNRC_IPV6_NIB_QUEUE_PKT)
static gnrc_pktqueue_t _queue_pool[CONFIG_GNRC_IPV6_NIB_NUMOF];
#endif  /* CONFIG_GNRC_IPV6_NIB_QUEUE_PKT */

That is not just an array of queues, but rather every packet to be queued has to get a slot here.
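
To make the consequence of a single shared pool concrete, here is a minimal sketch, assuming a simplified entry type. `queue_entry_t`, `alloc_queue_entry()`, `entries_held_by()` and the small `NIB_NUMOF` value are hypothetical and only mirror the shape of the snippet above; they are not RIOT's implementation.

```c
#include <stdbool.h>
#include <stddef.h>

#define NIB_NUMOF 4U    /* stands in for CONFIG_GNRC_IPV6_NIB_NUMOF (assumption) */

/* Hypothetical, simplified queue entry: one slot per queued packet. */
typedef struct {
    bool used;
    unsigned owner;     /* index of the neighbor holding this slot */
} queue_entry_t;

/* One pool shared by all neighbors, like _queue_pool above. */
static queue_entry_t queue_pool[NIB_NUMOF];

/* Take the first free slot for the given neighbor, or NULL if none is left. */
static queue_entry_t *alloc_queue_entry(unsigned owner)
{
    for (size_t i = 0; i < NIB_NUMOF; i++) {
        if (!queue_pool[i].used) {
            queue_pool[i].used = true;
            queue_pool[i].owner = owner;
            return &queue_pool[i];
        }
    }
    return NULL;
}

/* Count the slots currently held by one neighbor. */
static unsigned entries_held_by(unsigned owner)
{
    unsigned count = 0;
    for (size_t i = 0; i < NIB_NUMOF; i++) {
        if (queue_pool[i].used && (queue_pool[i].owner == owner)) {
            count++;
        }
    }
    return count;
}
```

If neighbor 0 queues `NIB_NUMOF` packets in this model, the next allocation for neighbor 1 returns NULL although neighbor 1 holds no slot at all. The same reasoning shows why a per-neighbor cap that is not smaller than the pool size can never take effect: the pool runs dry before the cap is reached.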

Contributor

If CONFIG_GNRC_IPV6_NIB_QUEUE_PKT_CAP is the queue capacity per neighbor, I would define it as 1 by default and change the code snippet above to:

#if IS_ACTIVE(CONFIG_GNRC_IPV6_NIB_QUEUE_PKT)
static gnrc_pktqueue_t _queue_pool[CONFIG_GNRC_IPV6_NIB_NUMOF * CONFIG_GNRC_IPV6_NIB_QUEUE_PKT_CAP];
#endif  /* CONFIG_GNRC_IPV6_NIB_QUEUE_PKT */

Contributor

Hm, but it would be bad if packets could not be queued even though 15 slots are available, just because only one slot per neighbor is allowed. The current definition nevertheless looks a bit strange to me.

What about not allowing a neighbor to hold more than half of the slots?

Contributor Author

Sure, we still need space in pktbuf to actually send out the neighbor solicitation (and receive the response), but that doesn't depend on the number of queued packets but on their size - or do I misunderstand something here?

I think this is a different problem. What I'm trying to solve is the scenario where a neighbor gets flooded with packets and we run out of free queue entries. In that case, the neighbors' queues fill up with stale packets, because we can't even _alloc_queue_entry() in order to drop the oldest packets.

Then one neighbor could take all the slots and at some point _alloc_queue_entry() will always fail.

AFAIK we should always drop the oldest, but that can't be done if we can't allocate in the first place.

The current capping solution doesn't guarantee this scenario won't happen, just makes it less probable.

Contributor Author

Also note that the default number of free queue entries is CONFIG_GNRC_IPV6_NIB_NUMOF == 16, which is rather small.

Contributor

You could as well drop the oldest packet, as soon as allocation fails and try to allocate again.
It could be that 16 slots are held by one neighbor and another neighbor for which allocation fails does not have an oldest packet to drop, so allocation fails again because one neighbor holds all the slots.

Your per-neighbor capacity limit makes this case less likely, but it does not prevent it.
The idea is good, but at the same time you accept that a packet is not queued even though a slot could be allocated.

If I got that right I am not sure if this is really beneficial as long as we don't note an issue that packets cannot be queued because one neighbor is taking most of the queue slots.
Did you observe this case?

Contributor Author

@derMihai derMihai Aug 29, 2024

If I got that right I am not sure if this is really beneficial as long as we don't note an issue that packets cannot be queued because one neighbor is taking most of the queue slots.
Did you observe this case?

No, I only had issues with one host.

What bothers me most isn't failed allocations, but stale packets in a neighbor's queue. This goes against the first part of "be strict when sending and tolerant when receiving".

It could be that 16 slots are held by one neighbor and another neighbor for which allocation fails does not have an oldest packet to drop, so allocation fails again because one neighbor holds all the slots.

This is partially true. We have the static _nib_onl_entry_t _nodes[CONFIG_GNRC_IPV6_NIB_NUMOF]; array. We could iterate through it and drop either from the first neighbor (faster) or from the one with the largest queue (may be slower for large CONFIG_GNRC_IPV6_NIB_NUMOF). Anyway, I don't see iterating through that list as a performance problem. We already do so when searching for free packets, so adding one more run doesn't change much.

Contributor Author

@derMihai derMihai Aug 29, 2024

I removed the per-neighbor queue cap. Packet allocation now never fails; it just pops a packet from the neighbor with the most packets in its queue. I made the queue entry count one more than the neighbor count. That way, there must always be a neighbor with at least two packets in its queue, so we never leave it packet-less.
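
The guarantee described here follows from the pigeonhole principle: once all N + 1 entries are in use across at most N neighbors, at least one neighbor holds two or more. A minimal sketch of that scheme, with the queues reduced to counters, could look as follows. `NBR_NUMOF`, `queued[]`, `longest_queue()` and `enqueue()` are hypothetical names for illustration and do not reproduce the code in this PR.

```c
#include <assert.h>
#include <stddef.h>

#define NBR_NUMOF  3U                  /* stands in for CONFIG_GNRC_IPV6_NIB_NUMOF */
#define POOL_SIZE  (NBR_NUMOF + 1U)    /* one entry more than there are neighbors */

static unsigned queued[NBR_NUMOF];     /* per-neighbor queue length */
static unsigned free_entries = POOL_SIZE;

/* Index of the neighbor with the most queued packets (first one on a tie). */
static size_t longest_queue(void)
{
    size_t max = 0;
    for (size_t i = 1; i < NBR_NUMOF; i++) {
        if (queued[i] > queued[max]) {
            max = i;
        }
    }
    return max;
}

/* Queue one packet for nbr. Never fails: if the pool is exhausted, one
 * packet is popped from the longest queue first. Returns the index of the
 * neighbor that lost a packet, or -1 if a free entry was available. */
static int enqueue(size_t nbr)
{
    int victim = -1;

    if (free_entries == 0) {
        size_t v = longest_queue();
        /* pigeonhole: POOL_SIZE entries over NBR_NUMOF neighbors */
        assert(queued[v] >= 2);
        queued[v]--;        /* a real queue would pop its oldest packet here */
        free_entries++;
        victim = (int)v;
    }
    free_entries--;
    queued[nbr]++;
    return victim;
}
```

Because the victim always holds at least two packets before the pop, no neighbor that had something queued ends up with an empty queue as a result of an eviction.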

sys/net/gnrc/network_layer/ipv6/nib/nib.c (outdated, resolved)
* per neighbor. SHOULD always be smaller than @ref CONFIG_GNRC_IPV6_NIB_NUMOF
*/
#ifndef CONFIG_GNRC_IPV6_NIB_QUEUE_PKT_CAP
#define CONFIG_GNRC_IPV6_NIB_QUEUE_PKT_CAP (CONFIG_GNRC_IPV6_NIB_NUMOF > 16 ? 16 : 1)
Contributor

@benpicco benpicco Aug 26, 2024

That jump is a bit strange. Up to 16 entries, each entry can queue only a single packet, but once there are 17 entries, we can queue 16 per entry?

How about

Suggested change
- #define CONFIG_GNRC_IPV6_NIB_QUEUE_PKT_CAP (CONFIG_GNRC_IPV6_NIB_NUMOF > 16 ? 16 : 1)
+ #define CONFIG_GNRC_IPV6_NIB_QUEUE_PKT_CAP DIV_ROUND_UP(CONFIG_GNRC_IPV6_NIB_NUMOF, 2)
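
The difference between the two definitions can be checked with a few values. The sketch below is only an illustration: `cap_ternary()` and `cap_half()` are hypothetical helper names, and `DIV_ROUND_UP` is defined locally so that the snippet compiles on its own, on the assumption that it performs an integer division rounded up.

```c
/* Local definition for this sketch only (assumed semantics: ceil(a / b)). */
#ifndef DIV_ROUND_UP
#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))
#endif

/* Original default: 1 packet per neighbor up to 16 entries, then 16. */
static unsigned cap_ternary(unsigned nib_numof)
{
    return (nib_numof > 16) ? 16 : 1;
}

/* Suggested default: half of the entries, rounded up. */
static unsigned cap_half(unsigned nib_numof)
{
    return DIV_ROUND_UP(nib_numof, 2);
}
```

With these helpers, going from 16 to 17 entries moves `cap_ternary()` from 1 to 16, while `cap_half()` moves from 8 to 9, so the suggested definition grows gradually with the table size.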

Contributor Author

I guess it's fine. I don't know how likely it is that multiple neighbors are resolved at the same time.

Labels
Area: network Area: Networking Area: sys Area: System CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR

4 participants