-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix cleaning up identities #1021
Fix cleaning up identities #1021
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Over all this looks good. I understand there's some limitations to the testing setup but asserting a state we don't actually want seems odd to me.
// Only track local nodes to avoid overhead | ||
if track_identity { | ||
self.by_identity | ||
.entry(w.identity()) | ||
.or_default() | ||
.insert(w.uid.clone()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment describes the intended use for track_identity rather than what this code does. Should we consider adding this comment to the insert method itself and/or changing track_identity
to be is_node_local
perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should_track_certificates_for_removal()
probably belongs in workload
and should be checked as part of insert
, and not in cert_mgr
- it's being called before workload.insert
pretty much everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I did it this way is workload doesn't have access to the proxy mode, local node, etc at all, so it cannot make the call itself. We could plumb it down, but then its inconsistent with the should_prefetch_certificate
and requires even more plumbing around
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I did it this way is workload doesn't have access to the proxy mode, local node, etc at all,
Figured.
We could plumb it down, but then its inconsistent with the should_prefetch_certificate and requires even more plumbing around
should_prefetch
and should_track
are almost the same check and it feels like both should probably go in workload
but yeah that's getting OOS for this.
// TODO: this should be vec![], but our testing setup doesn't exercise the real codepath | ||
check( | ||
vec![id1s.clone()], | ||
"removing final workload should clear things out", | ||
) | ||
.await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something we're looking to address before merging? Is it something covered elsewhere so we're not that worried?
Kind of odd to assert a final state that isn't actually the state it should be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would love to before merging though it will take a lot of work I may not have time to in the timelines this would be good to merge in (1.22.0). I think asserting the final state in the meantime is better than no assertion so if/when we fix it, it is remembered
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, odd but probably the least bad option I suppose
|
/cherrypick release-1.22 |
@howardjohn: new pull request created: #1027 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
This fixes a bug where identities were being dropped wrongly. Basically we didnt track the identity properly, so ANY pod removal caused all ztunnels to forget that identity. This wasnt detected because ztunnel will on-demand fetch certificates its missing.