-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recover file policies after manager soft-restarts (from Nomad API errors) #733
Conversation
Fixes "failed to read policy in time" errors, at least some of which occur due to Handlers getting re-started, but then not receiving file policies, because they "already exist" from the point of view of the file policy Source. When the policy Manager creates and runs new Handlers, the new Handler needs to receive a policy on its h.ch, or h.handleTick() will have a nil currentPolicy, which results in "timeout: failed to read policy in time" So, each time Handler asks a Source to MonitorPolicy(), the Source needs to send a policy back to the Handler (via MonitorPolicyReq.ResultCh: h.ch), even if the Source already knows the policy from a previous run.
Co-authored-by: Erik Müller <erik.mueller@power.cloud>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
76fe71d
to
30ee9bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work getting to the bottom of this. Also reiterating the big thanks to @Seros for fixing it a while ago!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job with these test!
Fixes #519 "failed to read policy in time" errors, at least those which occur due to Handlers getting re-started, but then not receiving file policies, because they already exist from the point of view of the file policy Source.
When the policy Manager creates and runs new Handlers, the new Handler needs to receive a policy on its h.ch, or h.handleTick() will have a nil currentPolicy, which results in "timeout: failed to read policy in time"
So, each time Handler asks a Source to MonitorPolicy(), the Source needs to send a policy back to the Handler (via MonitorPolicyReq.ResultCh: h.ch), even if the Source already knows the policy from a previous run.
I want to acknowledge that, after I banged my head against it for awhile, I found that @Seros totally had a handle (heh) on it already in #625, but here I am anyway, having addressed the bug in a slightly different way.
The only real difference in behavior here is to retain the changed-or-not distinction when a reload signal is issued, which is probably trivial performance-wise, but I think nice to have in the logs.
Regardless, I've added @Seros as a co-author, since they also fixed it, months ago. 😋 Also feel free to verify this change, if you'd like, and let me know how it goes!