-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
policy: add EgressNetwork integration tests for Http Grpc routes #13342
Conversation
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
@@ -372,7 +372,7 @@ impl kubert::index::IndexNamespacedResource<linkerd_k8s_api::EgressNetwork> for | |||
service_tls_routes: Default::default(), | |||
service_tcp_routes: Default::default(), | |||
resource_port_routes: Default::default(), | |||
namespace: Arc::new(ns), | |||
namespace: Arc::new(ns.clone()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: create the Arc first and then clone the Arc instead of cloning the string.
@@ -651,9 +660,16 @@ impl Index { | |||
} | |||
} | |||
|
|||
fn reinitialize_egress_watches(&mut self) { | |||
fn reinitialize_egress_watches(&mut self, namespace: Option<Arc<String>>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change related to testing? or does it fix an issue where we reinitialize egress watches more than necessary? just wondering how this change relates to the rest of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an optimization I knew could be made. The reason why I included it in this PR is because it minimises the flakiness significantly. As these tests are namespace-local, having modifications to egress networks in one namespace force reinitialize the watches for other namespaces leads to flakiness in these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we think this optimization actually improves correctness in the real world, or just eliminates unnecessary work? Do we know what causes the test flakiness? Ideally, an optimization shouldn't affect whether tests pass or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change makes it so streams that have resolved to EgressNetwork
parents that live in namespace A are restarted when a modification to an EgressNetwork
in namespace B happens. This is an optimization that results in a more correct behavior. The test flakiness is caused as I have pointed out by observing the above mentioned scenario each test is scoped to a namespace bu modifications to egress networks in other namespace (other tests) cause stream restarts, etc which causes flakiness.
ns.reinitialize_egress_watches(); | ||
match namespace.as_ref() { | ||
Some(namespace) => { | ||
if ns.namespace == *namespace { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be simpler to compare namespace to global_egress_network_namespace here instead of splitting it into a separate function. i.e.
if namespace == self.global_egress_network_namespace || *namespace == ns.namespace {
ns.reinitialize_egress_watches()
}
|
||
pub fn ip(&self) -> String { | ||
match self { | ||
Self::EgressNetwork(_) => "1.1.1.1".to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this return an Option so that EgressNetwork can return a None instead of a dummy value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, this just needs to be any IP inside the EgressNework's cidr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is it. Quite hacky but it works for now. If we bump into problems we can always make it more sophisticated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to have a comment explaining this, I think.
}) | ||
.await; | ||
} | ||
|
||
#[tokio::test(flavor = "current_thread")] | ||
async fn fallback_switches_to_egress() { | ||
async fn egress_net_with_no_http_routes() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was a bit surprising to see egress_net tests here when there's a dedicated egress_networks test file. but I understand why it is so: we need to replicated these tests for every route type (gateway HTTP, linkerd HTTP, and GRPC). essentially we want to test every combination of the matrix of resource x route. i.e. (service, egress_network) x (gateway HTTP, linkerd HTTP, GRPC), which means that all of our tests are replicated 6 times.
Ideally we should spend some time figuring out how to test these combinations in a way that doesn't require duplicating each test 6 times. But that's perhaps outside the score of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think we can try and abstract over the resource route type, etc but yes this is probably for a follow-up cleanup. I also do not like the duplication.
@@ -383,8 +383,7 @@ impl kubert::index::IndexNamespacedResource<linkerd_k8s_api::EgressNetwork> for | |||
self.resource_info | |||
.insert(egress_net_ref, egress_network_info); | |||
|
|||
self.reindex_resources(); | |||
self.reinitialize_egress_watches(); | |||
self.reinitialize_egress_watches(self.egress_net_target_ns(Arc::new(ns))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intentional the reindex_resources got removed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this was a mistake. Nice catch.
@@ -651,9 +652,11 @@ impl Index { | |||
} | |||
} | |||
|
|||
fn reinitialize_egress_watches(&mut self) { | |||
fn reinitialize_egress_watches(&mut self, namespace: Arc<String>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can take a &str instead of an Arc here to avoid needing to clone at the callsite.
|
||
pub fn ip(&self) -> String { | ||
match self { | ||
Self::EgressNetwork(_) => "1.1.1.1".to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to have a comment explaining this, I think.
This PR builds on #13342 to add TCP and TLS route tests for the outbound API. Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
This PR alters the policy-test crate so we can exercise the same set of tests for both
EgressNetwork
andService
parents.Signed-off-by: Zahari Dichev zaharidichev@gmail.com