-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add zebra -K argument and extend e2e testing #231
base: main
Are you sure you want to change the base?
Conversation
d60431a
to
2a4af86
Compare
925a212
to
35ba89e
Compare
@fedepaol this is ready for review but probably we need to break into different commits/PR
Thanks |
Then please break it :-) |
sure just confirming this is what you prefer |
3104576
to
ddabf59
Compare
ddabf59
to
187a073
Compare
@@ -49,7 +49,7 @@ data: | |||
# Check /etc/pam.d/frr if you intend to use "vtysh"! | |||
# | |||
vtysh_enable=yes | |||
zebra_options=" -A 127.0.0.1 -s 90000000" | |||
zebra_options=" -A 127.0.0.1 -K 120 -s 90000000" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the commit says also --graceful_restart argument
but I don't see it.
Also, what's special about 120 seconds? Can we document it (as a comment and in the commit description)?
Also also, would it make sense to have it as a parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -K
is the short version arg for the --graceful_restart
, I prefered the short, do you prefer to use the long or should improve the commit title?
About the 120seconds, the 120 matches the 120 seconds that graceful-timeout advertise to the other side (https://docs.frrouting.org/en/latest/bgp.html#clicmd-bgp-graceful-restart-restart-time-0-4095)
I do not think it should be parameter because we already hardcode the https://docs.frrouting.org/en/latest/bgp.html#clicmd-bgp-graceful-restart-restart-time-0-4095 and i do not see any benefit.
e2etests/go.mod
Outdated
@@ -6,21 +6,21 @@ toolchain go1.22.4 | |||
|
|||
replace ( | |||
github.com/metallb/frr-k8s => ../ | |||
go.universe.tf/e2etest => github.com/metallb/metallb/e2etest v0.0.0-20240715121012-af3c10d65f18 | |||
go.universe.tf/e2etest => github.com/karampok/metallb/e2etest v0.0.0-20240613071516-2d52178fbc89 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once this #239 is merged, I would refactor and this PR should not have any changes in the go.mod.
I am drafting the PR until the dependency is merged
e2etests/go.mod
Outdated
k8s.io/api v0.31.0 | ||
k8s.io/apimachinery v0.31.0 | ||
k8s.io/client-go v0.31.0 | ||
k8s.io/api v0.31.4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these dep updates relevant?
) | ||
}) | ||
}) | ||
|
||
// checkPrefixOnNodes checks that a prefix has at least one route on the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this belong to some pkg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imo this help function is tailored to the graceful restart scenario because it checks the routes on the host itself (all other test use the routes from the vtysh) and is relaxed on the next hop (because not all nexthops exist as decribed int he coment). So I would not transfer into a pkg.
No objection though if you want that (which pkg do you have in mind)?
e2etests/tests/graceful_restart.go
Outdated
}) | ||
|
||
ginkgo.Context("When restarting the frrk8s deamon pods", func() { | ||
ginkgo.DescribeTable("routes are maintained", func(ipFam ipfamily.Family, prefix, learnRoute string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we clarify that we are checking routes both sides? Would be a single test checking each side clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will extend the description.
Would be a single test checking each side clearer?
Not sure do you mean two tests? Given that test takes 2minutes I would not add another instance of it for time reasons. Also checking in two consistently
is not working because I must make sure that routes always exists.
187a073
to
5b9cfe8
Compare
5b9cfe8
to
1359275
Compare
https://docs.frrouting.org/en/latest/zebra.html#cmdoption-zebra-K With that argument `-K 120`, when zebra stops it does not clean the routes that has install on the host. When zebra starts up with, it will wait 120 seconds before removing (sweeping) any routes that it previously created but are no longer valid. The value 120 is used to match the default `bgp gracefule-restart restart-time` that we advertise to the peer which is also 120seconds. https://docs.frrouting.org/en/latest/bgp.html#clicmd-bgp-graceful-restart-restart-time-0-4095 This is needed to complete the BGP graceful restart feature when the local endpoints depend on local routes that are learned from the BGP peer. Downside is that if we simply remove FRR (delete deployment or change speaker label toleration) those routes will remain on the host until reboot. Signed-off-by: karampok <karampok@gmail.com>
Extend the BGP graceful restart test to verify that both advertised and learned routes are maintained during the restart of the frr-k8s pods. This test should passed as long as the zebra process in FRR has the -K argument. Signed-off-by: karampok <karampok@gmail.com>
1359275
to
d521b9e
Compare
/kind bug
What this PR does / why we need it:
If we use learning routes, when we restart the pod those routes are gone from the k8s node. There is an option for zebra to keep them. That is useful only if Graceful restart is being used.
Special notes for your reviewer:
Release note: