evpn: fix evpn losing type-2 routes #2804
Closed
When fixing the EVPN MAC mobility complexity, the way destinations are indexed in the routing table changed from RD+ETAG+MAC+IP to only RD+MAC. This is incorrect per the BGP EVPN RFC. It works in most cases because, when an IP is present, virtually all EVPN implementations announce two paths: one with the IP and one without. This way, route announcements are balanced and pose no issues.
Issues arise when GoBGP is connected to multiple peers announcing the same things (read: route reflectors), at a high rate, with lots of routes (hundreds of thousands), and when multiple paths exist for the same MAC (e.g. with and without an overlay IP address). The issue does not appear if any of the four conditions above is false.
There, processing ends up racy and, over time, some routes go missing because of the concurrent updates. Such missing routes have been observed in a production setup with:
With this setup, we ended up with a handful of routes missing (usually 10 to 20) after a few days of runtime.
This commit reverts the custom `tableKey` implementation done previously, to use the plain `String` view of the prefix. Note that this is suboptimal performance-wise, but it is correct.
Fixes: c393f43 ("evpn: fix quadratic evpn mac-mobility handling")
Sorry for introducing this bug in the first place.