Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cleaning up identities #1021

Merged
merged 1 commit into from
May 6, 2024

Conversation

howardjohn
Copy link
Member

This fixes a bug where identities were being dropped wrongly. Basically we didnt track the identity properly, so ANY pod removal caused all ztunnels to forget that identity. This wasnt detected because ztunnel will on-demand fetch certificates its missing.

@howardjohn howardjohn requested a review from a team as a code owner May 3, 2024 23:48
@howardjohn howardjohn added the release-notes-none Indicates a PR that does not require release notes. label May 3, 2024
@istio-testing istio-testing added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 3, 2024
Copy link
Contributor

@ilrudie ilrudie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Over all this looks good. I understand there's some limitations to the testing setup but asserting a state we don't actually want seems odd to me.

Comment on lines +617 to +623
// Only track local nodes to avoid overhead
if track_identity {
self.by_identity
.entry(w.identity())
.or_default()
.insert(w.uid.clone());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment describes the intended use for track_identity rather than what this code does. Should we consider adding this comment to the insert method itself and/or changing track_identity to be is_node_local perhaps?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should_track_certificates_for_removal() probably belongs in workload and should be checked as part of insert, and not in cert_mgr - it's being called before workload.insert pretty much everywhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I did it this way is workload doesn't have access to the proxy mode, local node, etc at all, so it cannot make the call itself. We could plumb it down, but then its inconsistent with the should_prefetch_certificate and requires even more plumbing around

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I did it this way is workload doesn't have access to the proxy mode, local node, etc at all,

Figured.

We could plumb it down, but then its inconsistent with the should_prefetch_certificate and requires even more plumbing around

should_prefetch and should_track are almost the same check and it feels like both should probably go in workload but yeah that's getting OOS for this.

Comment on lines +780 to +785
// TODO: this should be vec![], but our testing setup doesn't exercise the real codepath
check(
vec![id1s.clone()],
"removing final workload should clear things out",
)
.await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something we're looking to address before merging? Is it something covered elsewhere so we're not that worried?

Kind of odd to assert a final state that isn't actually the state it should be

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to before merging though it will take a lot of work I may not have time to in the timelines this would be good to merge in (1.22.0). I think asserting the final state in the meantime is better than no assertion so if/when we fix it, it is remembered

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, odd but probably the least bad option I suppose

@bleggett
Copy link
Contributor

bleggett commented May 6, 2024

should_prefix and should_track are mostly the same check and both probably belong as workload properties, but will do for 1.22

@istio-testing istio-testing merged commit 3d7b6d0 into istio:master May 6, 2024
3 checks passed
@howardjohn
Copy link
Member Author

/cherrypick release-1.22

@istio-testing
Copy link
Contributor

@howardjohn: new pull request created: #1027

In response to this:

/cherrypick release-1.22

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes-none Indicates a PR that does not require release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants