Add BGP cumulative link-bandwidth #1131

Open
wants to merge 9 commits into
base: master

Conversation

dplore
Member

@dplore dplore commented Jun 14, 2024

Change Scope

  • Extend configuration of BGP global and peer-groups to permit cumulative link bandwidth and transitive behavior as per draft-ietf-bess-ebgp-dmz.

  • Also add link-bandwidth parameters to BGP neighbor configuration and state. (Previously these were only defined at the global and peer-group levels)

  • This change only adds leafs and is backwards compatible.

  • Example flattened paths:

Prefix: /network-instances/network-instance/protocols/protocol/bgp/

bgp/global/afi-safis/afi-safi/link-bw/config/send-cumulative=true
bgp/global/afi-safis/afi-safi/link-bw/config/non-transitive-ebgp=true
bgp/global/afi-safis/afi-safi/link-bw/config/divide=false
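
For readers who want to see how these flattened paths map onto a config tree, here is a minimal Python sketch. The helper name `parse_flattened` is hypothetical and not part of any OpenConfig tooling; it simply splits each `path=value` line and nests the result. As in the paths above, the afi-safi list key is left out.

```python
from typing import Any, Dict, List


def parse_flattened(lines: List[str]) -> Dict[str, Any]:
    """Turn 'a/b/c=value' lines into a nested dict; 'true'/'false' become bools."""
    tree: Dict[str, Any] = {}
    for line in lines:
        path, _, raw = line.strip().partition("=")
        value: Any = {"true": True, "false": False}.get(raw, raw)
        *parents, leaf = path.split("/")
        node = tree
        for name in parents:
            # Create intermediate containers on demand.
            node = node.setdefault(name, {})
        node[leaf] = value
    return tree


paths = [
    "bgp/global/afi-safis/afi-safi/link-bw/config/send-cumulative=true",
    "bgp/global/afi-safis/afi-safi/link-bw/config/non-transitive-ebgp=true",
    "bgp/global/afi-safis/afi-safi/link-bw/config/divide=false",
]
tree = parse_flattened(paths)
link_bw_config = tree["bgp"]["global"]["afi-safis"]["afi-safi"]["link-bw"]["config"]
print(link_bw_config)
```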

Platform Implementations

On Cisco IOS XR, no special configuration is needed other than link-bandwidth-ext-community/config/enabled. The OS implicitly sums the link-bandwidth of ECMP BGP paths.

Arista EOS expects an 'aggregate' configuration option, on top of enabling link-bandwidth, to be added on a per-neighbor basis. It further supports 'divide' and 'equal' options.

JunOS expects configuration of a BGP policy statement to enable 'aggregate-bandwidth', with additional options for transitive/non-transitive and a 'divide-equal' option. There is also a configuration item, set per BGP neighbor or peer, to allow the link-bandwidth community to be sent via eBGP (i.e., to allow sending BGP link-bandwidth, which was defined as non-transitive in draft-ietf-idr-link-bandwidth but then updated by draft-ietf-bess-ebgp-dmz to be transitive).

Tree view

This common stanza of changes is added at each level of BGP configuration:
bgp/global
bgp/peer-group
bgp/neighbor
bgp/global/afi-safis
bgp/peer-group/afi-safis
bgp/neighbor/afi-safis

dloher$ diff -U 20 ~/master-tree.txt ~/bess-tree.txt
--- /Users/dloher/master-tree.txt       2024-08-16 18:03:37
+++ /Users/dloher/bess-tree.txt 2024-08-19 18:38:34
@@ -3733,40 +3733,49 @@
         |     |  |  |     |     +--rw link-bandwidth-ext-community
         |     |  |  |     |     |  +--rw config
         |     |  |  |     |     |  |  +--rw enabled?   boolean
         |     |  |  |     |     |  +--ro state
         |     |  |  |     |     |     +--ro enabled?   boolean
         |     |  |  |     |     +--rw config
         |     |  |  |     |     |  +--rw maximum-paths?   uint32
         |     |  |  |     |     +--ro state
         |     |  |  |     |        +--ro maximum-paths?   uint32
         |     |  |  |     +--rw add-paths
         |     |  |  |     |  +--rw config
         |     |  |  |     |  |  +--rw receive?                  boolean
         |     |  |  |     |  |  +--rw send?                     boolean
         |     |  |  |     |  |  +--rw send-max?                 uint8
         |     |  |  |     |  |  +--rw eligible-prefix-policy?   -> /oc-rpol:routing-policy/policy-definitions/policy-definition/name
         |     |  |  |     |  +--ro state
         |     |  |  |     |     +--ro receive?                  boolean
         |     |  |  |     |     +--ro send?                     boolean
         |     |  |  |     |     +--ro send-max?                 uint8
         |     |  |  |     |     +--ro eligible-prefix-policy?   -> /oc-rpol:routing-policy/policy-definitions/policy-definition/name
+        |     |  |  |     +--rw link-bw
+        |     |  |  |     |  +--rw config
+        |     |  |  |     |  |  +--rw send-cumulative?       boolean
+        |     |  |  |     |  |  +--rw non-transitive-ebgp?   boolean
+        |     |  |  |     |  |  +--rw divide?                boolean
+        |     |  |  |     |  +--ro state
+        |     |  |  |     |     +--ro send-cumulative?       boolean
+        |     |  |  |     |     +--ro non-transitive-ebgp?   boolean
+        |     |  |  |     |     +--ro divide?                boolean
         |     |  |  |     +--rw ipv4-unicast
         |     |  |  |     |  +--rw prefix-limit
         |     |  |  |     |  |  +--rw config
         |     |  |  |     |  |  |  +--rw max-prefixes?            uint32
         |     |  |  |     |  |  |  +--rw prevent-teardown?        boolean
         |     |  |  |     |  |  |  +--rw warning-threshold-pct?   oc-types:percentage

@OpenConfigBot

OpenConfigBot commented Jun 14, 2024

No major YANG version changes in commit 09dadb3

@rszarecki
Contributor

The draft-ietf-bess-ebgp-dmz does 2 things:

  1. It allows propagation of the link-bandwidth ext-community attribute from the BGP Local-RIB to an eBGP session even though the link-bandwidth ext-community is non-transitive. This is orthogonal to the BGP multipath configuration on a given system.
  2. It also allows different aggregation algorithms to be applied to link-bandwidth when multiple paths with the link-bandwidth ext-community exist in the Local-RIB (that is, when BGP multipath is enabled).
    Note: The original I-D, draft-ietf-idr-link-bandwidth-07 (page 3, last sentence), explicitly allows initialization and transmission of the link-bandwidth community on an eBGP session (similarly to other non-transitive BGP attributes such as MED). It just does not allow propagation of the link-bandwidth community from the Local-RIB to an eBGP session.
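
Of the two behaviors listed above, the aggregation one can be illustrated with the cumulative case this PR is named after. The following is a rough Python sketch, not any vendor's implementation: the `Path` type and `cumulative_link_bw` function are hypothetical names, and skipping paths that carry no link-bandwidth community is an assumption (implementations may treat that case differently).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Path:
    next_hop: str
    link_bw: Optional[float]  # bytes per second; None if the community is absent


def cumulative_link_bw(multipaths: List[Path]) -> Optional[float]:
    """Sum link-bandwidth over the multipath set; None if no path carries it.

    Assumption: paths without the community are ignored rather than
    disabling aggregation altogether.
    """
    values = [p.link_bw for p in multipaths if p.link_bw is not None]
    return sum(values) if values else None


multipaths = [
    Path("192.0.2.1", 1.25e9),  # 10 Gbit/s expressed in bytes per second
    Path("192.0.2.2", 2.5e9),   # 20 Gbit/s expressed in bytes per second
]
advertised = cumulative_link_bw(multipaths)
print(advertised)
```

With a single best path and no multipath, the same function simply returns that path's own value, which matches the point that propagation to eBGP does not depend on multipath being enabled.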

The global/afi-safis/afi-safi/use-multiple-paths/... hierarchy is the wrong place to control if and how link-bandwidth is propagated over eBGP because:

  • draft-bess does not require multipathing to be enabled (see 1 above)
  • global/afi-safis/afi-safi/use-multiple-paths/... is meant to provide constraints on the process of populating the Local-RIB and the FIB's NHG. The use-multiple-paths/ebgp/link-bandwidth-ext-community/config/enabled leaf controls whether FIB ECMP weights are derived from link-bandwidth values. If not, basic ECMP is programmed in the dataplane, consuming fewer TCAM/SRAM resources, but link-bandwidth remains in the Local-RIB and is propagated to iBGP peers (and to eBGP peers under draft-bess, if enabled).
  • If .../afi-safis/afi-safi/use-multiple-paths/config/enabled is set to "FALSE" and the new leaf non-transitive is set to "TRUE", what should the system behaviour be?
  • IMHO the knobs controlling draft-ietf-bess-ebgp-dmz should be direct attributes of {neighbor|peer-group|global}/afi-safis/afi-safi, similar to send-community-type (which is unfortunately an enum). My suggestion is:
    • define 2 new containers dedicated to the link-bandwidth community (it is a special case anyway) under bgp/global/afi-safis/afi-safi/config/: tx-link-bandwidth with leafs enabled, ebgp (default FALSE), cumulative, average/equal; AND rx-link-bandwidth with leaf enabled (default TRUE). If tx-link-bandwidth/enabled is not specified: TRUE on iBGP and FALSE on eBGP.
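
To make the distinction in the second bullet concrete, here is a small Python sketch of how next-hop-group weights might be chosen depending on whether the link-bandwidth-ext-community/config/enabled leaf is set. The function name `nhg_weights` and the normalized-fraction representation are illustrative assumptions; real FIBs program integer weights in platform-specific ways.

```python
from typing import Dict


def nhg_weights(link_bw: Dict[str, float], use_link_bw: bool) -> Dict[str, float]:
    """Return per-next-hop traffic shares that sum to 1.0.

    link_bw maps next hop -> link-bandwidth value kept in the Local-RIB.
    use_link_bw models the 'enabled' leaf: when False, plain equal-cost
    ECMP is used even though the bandwidth values are still known.
    """
    if not link_bw:
        return {}
    total = sum(link_bw.values())
    if not use_link_bw or total == 0:
        share = 1.0 / len(link_bw)
        return {nh: share for nh in link_bw}
    return {nh: bw / total for nh, bw in link_bw.items()}


rib = {"192.0.2.1": 10e9, "192.0.2.2": 30e9}
weighted = nhg_weights(rib, use_link_bw=True)
equal = nhg_weights(rib, use_link_bw=False)
print(weighted, equal)
```

In both cases `rib` is untouched, which mirrors the argument that the leaf only affects forwarding weights and not whether link-bandwidth stays in the Local-RIB or is advertised onward.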

release/models/bgp/openconfig-bgp-common.yang Outdated Show resolved Hide resolved
@jhaas-pfrc

JunOS expects configuration of a BGP policy statement to enable 'aggregate-bandwidth', with additional options for transitive/non-transitive and a 'divide-equal' option. There is also a configuration item, set per BGP neighbor or peer, to allow the link-bandwidth community to be sent via eBGP (i.e., to allow sending BGP link-bandwidth, which was defined as non-transitive in draft-ietf-idr-link-bandwidth but then updated by draft-ietf-bess-ebgp-dmz to be transitive).

The details here are slightly off. Juniper's implementation of link-bw uses a transitive extended community code point. (Embarrassing, because Juniper is the one that published the base spec that uses non-transitive in the document.) This causes interop issues between Juniper and implementations that don't understand the non-draft-compliant transitive format.

The link-bw draft is pending an upcoming update in IETF that will address both the transitive and non-transitive cases. It is intended to contain interop procedures. Indirectly, it also addresses some points covered in the DMZ draft.
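
The transitive versus non-transitive code point difference can be seen in the first octet of the 8-octet extended community. The Python sketch below is illustrative only and assumes the layout from draft-ietf-idr-link-bandwidth: type 0x40 (non-transitive two-octet AS specific), sub-type 0x04, a 2-octet AS number, and the bandwidth as a 4-octet IEEE float in bytes per second. Using 0x00 as the transitive type is an assumption based on the general extended-community type rules; the function names are hypothetical.

```python
import struct

NON_TRANSITIVE_TWO_OCTET_AS = 0x40  # per draft-ietf-idr-link-bandwidth
TRANSITIVE_TWO_OCTET_AS = 0x00      # assumption: same type with the T bit cleared
LINK_BW_SUBTYPE = 0x04


def encode_link_bw(asn: int, bytes_per_sec: float, transitive: bool = False) -> bytes:
    """Pack a link-bandwidth extended community into 8 octets (network order)."""
    type_high = TRANSITIVE_TWO_OCTET_AS if transitive else NON_TRANSITIVE_TWO_OCTET_AS
    return struct.pack("!BBHf", type_high, LINK_BW_SUBTYPE, asn, bytes_per_sec)


def is_transitive(community: bytes) -> bool:
    """The 0x40 bit of the first octet marks a non-transitive community."""
    return community[0] & 0x40 == 0


non_transitive = encode_link_bw(65001, 1.25e9)
transitive = encode_link_bw(65001, 1.25e9, transitive=True)
print(non_transitive.hex(), transitive.hex())
```

A receiver that only matches on the 0x40 type would not recognize the second encoding as link-bandwidth, which is one way to picture the interop issue described above.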

@rszarecki
Contributor

I will reiterate:

The 'global/afi-safis/afi-safi/use-multiple-paths/...' hierarchy is the wrong place to control if and how link-bandwidth is propagated over eBGP because:

  • draft-bess does not require multipathing to be enabled (see 1 above)
  • global/afi-safis/afi-safi/use-multiple-paths/... is meant to provide constraints on the process of populating the Local-RIB and the FIB's NHG. The use-multiple-paths/ebgp/link-bandwidth-ext-community/config/enabled leaf controls whether FIB ECMP weights are derived from link-bandwidth values. If not, basic ECMP is programmed in the dataplane, consuming fewer TCAM/SRAM resources, but link-bandwidth remains in the Local-RIB and is propagated to iBGP peers (and to eBGP peers under draft-bess, if enabled).
  • If .../afi-safis/afi-safi/use-multiple-paths/config/enabled is set to "FALSE" and the new leaf non-transitive is set to "TRUE", what should the system behaviour be?
  • IMHO the knobs controlling draft-ietf-bess-ebgp-dmz should be direct attributes of {neighbor|peer-group|global}/afi-safis/afi-safi, similar to send-community-type (which is unfortunately an enum). My suggestion is:
    • define 2 new containers dedicated to the link-bandwidth community (it is a special case anyway) under bgp/global/afi-safis/afi-safi/config/: tx-link-bandwidth with leafs enabled, ebgp (default FALSE), cumulative, average/equal; AND rx-link-bandwidth with leaf enabled (default TRUE). If tx-link-bandwidth/enabled is not specified: TRUE on iBGP and FALSE on eBGP.
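
The suggested default for tx-link-bandwidth/enabled depends on the session type, which is easy to misread in prose. A minimal Python sketch of that resolution rule follows; the function name `tx_link_bw_enabled` is hypothetical, and the rule is taken only from the suggestion above, not from any merged model.

```python
from typing import Optional


def tx_link_bw_enabled(configured: Optional[bool], peer_is_ebgp: bool) -> bool:
    """Resolve the effective value of the proposed tx-link-bandwidth/enabled leaf.

    An explicit setting always wins. When the leaf is not specified, the
    suggested default is TRUE on iBGP sessions and FALSE on eBGP sessions.
    """
    if configured is not None:
        return configured
    return not peer_is_ebgp


for configured in (None, True, False):
    for peer_is_ebgp in (False, True):
        print(configured, peer_is_ebgp, tx_link_bw_enabled(configured, peer_is_ebgp))
```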

@dplore
Member Author

dplore commented Aug 20, 2024

The 'global/afi-safis/afi-safi/use-multiple-paths/...' hierarchy is the wrong place to control if and how link-bandwidth is propagated over eBGP because:

Too fast on the comments! I agree and have moved it. Pushing commit here now. :)

Prefix: /network-instances/network-instance/protocols/protocol/bgp/

bgp/global/afi-safis/afi-safi/link-bw/config/send-cumulative=true
bgp/global/afi-safis/afi-safi/link-bw/config/non-transitive-ebgp=true
bgp/global/afi-safis/afi-safi/link-bw/config/divide=false

@dplore
Member Author

dplore commented Aug 20, 2024

Reviewed by OC operators on Aug 20, 2024 without objection.

@dplore
Member Author

dplore commented Aug 22, 2024

@rszarecki and @jhaas-pfrc, thank you for the earlier review; this is ready for your review again.

Projects
Status: In Progress
6 participants