Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embiggen the goodness of maghemite #385

Merged
merged 15 commits into from
Oct 22, 2024
Merged

Embiggen the goodness of maghemite #385

merged 15 commits into from
Oct 22, 2024

Conversation

taspelund
Copy link
Contributor

@taspelund taspelund commented Oct 3, 2024

This addresses a few different items found in Maghemite.

  1. remove_prefix_path() did not delete the prefix key from rib_in after the last path was removed.
    Fixes: Static route key retained despite deleting all paths #369

  2. remove_prefix_path() was removing all paths for a prefix that happened to share a nexthop. This is bad because a single call to static_remove_v4_route() would delete all paths with that nexthop even when fields like rib_priority didn't match. This would likely also allow the deletion of a floating static route to inadvertently clobber a BGP route via the same next-hop.

  3. apply_update() and send_update() were incorrectly applying the relevant ImportExportPolicy to the list of withdrawn routes. Filtering route withdrawals causes stale routing information, as we can end up ignoring a legitimate withdrawal from a peer or we could forget to inform a peer that we're no longer originating a route (e.g. if we first set an export policy matching N routes in an announce-set, then removed a route from that announce-set). In either case, this is incorrect behavior that can lead to forwarding issues.
    Fixes: When Export Policy is configured, withdrawn NLRI can only ever contain routes allowed by Export Policy #330

@taspelund taspelund added Bug mgd Maghemite daemon labels Oct 3, 2024
@taspelund taspelund self-assigned this Oct 3, 2024
@taspelund
Copy link
Contributor Author

taspelund commented Oct 3, 2024

rib_in key behavior before:

➜  maghemite git:(f5dbb6a) ./target/debug/mgadm static get-v4-routes
{
    "1.1.1.1/32": [
        Path {
            bgp: None,
            local_pref: None,
            nexthop: 100.64.0.1,
            shutdown: false,
            vlan_id: None,
        },
    ],
}
➜  maghemite git:(f5dbb6a) ./target/debug/mgadm static remove-v4-routes 1.1.1.1/32 100.64.0.1
➜  maghemite git:(f5dbb6a) ./target/debug/mgadm static get-v4-routes
{
    "1.1.1.1/32": [],
}

rib_in key behavior after:

➜  maghemite git:(trey/cleanup_static_route_key) ./target/debug/mgadm static get-v4-routes
{
    "1.1.1.1/32": [
        Path {
            bgp: None,
            nexthop: 100.64.0.0,
            rib_priority: 2,
            shutdown: false,
            vlan_id: None,
        },
        Path {
            bgp: None,
            nexthop: 100.64.0.0,
            rib_priority: 10,
            shutdown: false,
            vlan_id: None,
        },
    ],
}
➜  maghemite git:(trey/cleanup_static_route_key) ✗ ./target/debug/mgadm static remove-v4-routes 1.1.1.1/32 100.64.0.0
➜  maghemite git:(trey/cleanup_static_route_key) ✗ ./target/debug/mgadm static get-v4-routes
{}

@taspelund taspelund marked this pull request as ready for review October 3, 2024 22:25
@taspelund taspelund marked this pull request as draft October 4, 2024 07:50
@taspelund taspelund changed the title Cleanup key in rib_in when all paths are removed Embiggen the goodness of maghemite Oct 4, 2024
@taspelund taspelund added bgp Border Gateway Protocol static Static Routing labels Oct 4, 2024
@taspelund taspelund marked this pull request as ready for review October 4, 2024 08:19
Copy link

@zeeshanlakhani zeeshanlakhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine to me, but can we add a test demonstrating the example you showed for removing the prefix from the rib?

Fixes: #369

Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
Import/Export filters are meant to modify which advertised prefixes
are allowed. For Import, this is simply an allow-list that accepts a
subset of the advertised nlri in a received update. For Export, this
is an allow-list that accepts a subset of the locally originated nlri.
In neither case do you want to apply these filters to the list of
withdrawn nlri, as this can result in stale routes if a legitimate
withdrawal is not sent or received.

Fixes: #330

Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
Removes an unused function.
Guards an illumos-specific import.
Consolidates (bfd) nexthop enabled functions into one that takes a bool.
Moves RIB locking and looping of prefixes and PrefixChangeNotifications
into RIB removal helper functions to improve pcn batching and
consolidate locations for future RIB batch work.
Removes re-processing of BGP path attributes.
Removes re-looping over routes/paths in a few places.
Eliminates some return types when no callers handled Result.
Adds some TODOs.

Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
@taspelund taspelund force-pushed the trey/cleanup_route_key branch 2 times, most recently from 334fbdb to eb07d9d Compare October 20, 2024 08:29
@taspelund
Copy link
Contributor Author

taspelund commented Oct 20, 2024

I added two more commits + rebased off main.

The second commit adds the requested tests, plus additional unit testing for rdb::Db methods.

The first commit is a refactor of various rdb::Db methods and their callers. Highlights below:

  • Rename of various add/remove methods + helper functions, ideally helping to keep naming more consistent (e.g. using "peer" instead of "id", adding "bgp"/"static" to method names)
  • In the add/remove codepaths, there is now a single PrefixChangeNotification used to batch all pcn's together instead of firing them off individually. I tried to keep the updated routes together in a collection up until the point of processing them, so it will be more clear in the future where to make changes to support bulk prefix updates.
  • Delete codepath now accepts a closure that determines what paths are removed
  • removed an unused method get_imported() and consolidated the enable_nexthop()/disable_nexthop() methods into a single method accepting a bool: set_nexthop_shutdown().
  • Changed return types from Result -> void, since callers were not actually checking the Result anyway
  • loc_rib() now returns a Rib, which aligns with full_rib(), static_rib(), and bgp_rib()
  • Moved BGP attribute -> Path parsing out of a for loop, since these are per-update not per-route
  • Guarded an illumos-specific debug import
  • Added some TODOs into the comments, primarily for bgp: router id insufficient discriminator for route-refresh stale marking #241 and to check that we cover all cases for nexthop parsing
  • Introduced get_selected_prefix_paths() to get an entry from rib_loc

Other refactoring we might consider later:

  • - Moving test helpers into a common spot (e.g. wait_for_eq macros in bgp
  • - Moving logging helpers into a common spot (e.g. all of bgp/src/log.rs)
  • - Cleaning up remaining instances of .lock().unwrap() by moving to lock!()
  • - More ergonomic conversions between messages::Prefix and rdb::Prefix types/subtypes (e.g. Prefix4)

edit: Addressed the logging helpers and updates to use lock!() in new commits

Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
mgd/src/bgp_admin.rs was re-defining lock! identically to the one in
mg-common.  Remove this dupe definition and just import from mg-common.

Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
Replace all old instances of .lock().unwrap() with lock!()

Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
Actually run the same cargo clippy command as CI, so I can see
errors locally :/

Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
Copy link
Collaborator

@rcgoodfellow rcgoodfellow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Trey! Some great improvements here. Just a few comments.

rdb/src/db.rs Show resolved Hide resolved
bgp/src/session.rs Show resolved Hide resolved
rdb/src/db.rs Show resolved Hide resolved
rdb/src/db.rs Outdated Show resolved Hide resolved
rdb/src/db.rs Show resolved Hide resolved
rdb/src/db.rs Show resolved Hide resolved
rdb/src/db.rs Show resolved Hide resolved
Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
Move over to using transaction semantics for static route db updates.
Also change remove_static_routes to take a reference, aligning with
add_static_routes.

Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
@taspelund taspelund merged commit 985a9fb into main Oct 22, 2024
12 checks passed
@taspelund taspelund deleted the trey/cleanup_route_key branch October 22, 2024 22:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bgp Border Gateway Protocol Bug mgd Maghemite daemon needs testing static Static Routing
Projects
None yet
3 participants