Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

policy: add EgressNetwork integration tests for Http Grpc routes #13342

Merged
merged 5 commits into from
Nov 20, 2024

Conversation

zaharidichev
Copy link
Member

This PR alters the policy-test crate so we can exercise the same set of tests for both EgressNetwork and Service parents.

Signed-off-by: Zahari Dichev zaharidichev@gmail.com

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
@@ -372,7 +372,7 @@ impl kubert::index::IndexNamespacedResource<linkerd_k8s_api::EgressNetwork> for
service_tls_routes: Default::default(),
service_tcp_routes: Default::default(),
resource_port_routes: Default::default(),
namespace: Arc::new(ns),
namespace: Arc::new(ns.clone()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: create the Arc first and then clone the Arc instead of cloning the string.

@@ -651,9 +660,16 @@ impl Index {
}
}

fn reinitialize_egress_watches(&mut self) {
fn reinitialize_egress_watches(&mut self, namespace: Option<Arc<String>>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change related to testing? or does it fix an issue where we reinitialize egress watches more than necessary? just wondering how this change relates to the rest of this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an optimization I knew could be made. The reason why I included it in this PR is because it minimises the flakiness significantly. As these tests are namespace-local, having modifications to egress networks in one namespace force reinitialize the watches for other namespaces leads to flakiness in these

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we think this optimization actually improves correctness in the real world, or just eliminates unnecessary work? Do we know what causes the test flakiness? Ideally, an optimization shouldn't affect whether tests pass or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes it so streams that have resolved to EgressNetwork parents that live in namespace A are restarted when a modification to an EgressNetwork in namespace B happens. This is an optimization that results in a more correct behavior. The test flakiness is caused as I have pointed out by observing the above mentioned scenario each test is scoped to a namespace bu modifications to egress networks in other namespace (other tests) cause stream restarts, etc which causes flakiness.

ns.reinitialize_egress_watches();
match namespace.as_ref() {
Some(namespace) => {
if ns.namespace == *namespace {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be simpler to compare namespace to global_egress_network_namespace here instead of splitting it into a separate function. i.e.

if namespace == self.global_egress_network_namespace || *namespace == ns.namespace {
  ns.reinitialize_egress_watches()
}


pub fn ip(&self) -> String {
match self {
Self::EgressNetwork(_) => "1.1.1.1".to_string(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this return an Option so that EgressNetwork can return a None instead of a dummy value?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, this just needs to be any IP inside the EgressNework's cidr?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is it. Quite hacky but it works for now. If we bump into problems we can always make it more sophisticated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to have a comment explaining this, I think.

})
.await;
}

#[tokio::test(flavor = "current_thread")]
async fn fallback_switches_to_egress() {
async fn egress_net_with_no_http_routes() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was a bit surprising to see egress_net tests here when there's a dedicated egress_networks test file. but I understand why it is so: we need to replicated these tests for every route type (gateway HTTP, linkerd HTTP, and GRPC). essentially we want to test every combination of the matrix of resource x route. i.e. (service, egress_network) x (gateway HTTP, linkerd HTTP, GRPC), which means that all of our tests are replicated 6 times.

Ideally we should spend some time figuring out how to test these combinations in a way that doesn't require duplicating each test 6 times. But that's perhaps outside the score of this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think we can try and abstract over the resource route type, etc but yes this is probably for a follow-up cleanup. I also do not like the duplication.

@@ -383,8 +383,7 @@ impl kubert::index::IndexNamespacedResource<linkerd_k8s_api::EgressNetwork> for
self.resource_info
.insert(egress_net_ref, egress_network_info);

self.reindex_resources();
self.reinitialize_egress_watches();
self.reinitialize_egress_watches(self.egress_net_target_ns(Arc::new(ns)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentional the reindex_resources got removed here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this was a mistake. Nice catch.

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
@@ -651,9 +652,11 @@ impl Index {
}
}

fn reinitialize_egress_watches(&mut self) {
fn reinitialize_egress_watches(&mut self, namespace: Arc<String>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can take a &str instead of an Arc here to avoid needing to clone at the callsite.


pub fn ip(&self) -> String {
match self {
Self::EgressNetwork(_) => "1.1.1.1".to_string(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to have a comment explaining this, I think.

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
@zaharidichev zaharidichev merged commit 4a424be into main Nov 20, 2024
44 checks passed
@zaharidichev zaharidichev deleted the zd/egress-network-api-tests branch November 20, 2024 17:14
zaharidichev added a commit that referenced this pull request Nov 21, 2024
This PR builds on #13342 to add TCP and TLS route tests for the outbound API.

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants