Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recover file policies after manager soft-restarts (from Nomad API errors) #733

Merged
merged 6 commits into from
Oct 11, 2023

Conversation

gulducat
Copy link
Member

@gulducat gulducat commented Oct 9, 2023

Fixes #519 "failed to read policy in time" errors, at least those which occur due to Handlers getting re-started, but then not receiving file policies, because they already exist from the point of view of the file policy Source.

When the policy Manager creates and runs new Handlers, the new Handler needs to receive a policy on its h.ch, or h.handleTick() will have a nil currentPolicy, which results in "timeout: failed to read policy in time"

So, each time Handler asks a Source to MonitorPolicy(), the Source needs to send a policy back to the Handler (via MonitorPolicyReq.ResultCh: h.ch), even if the Source already knows the policy from a previous run.


I want to acknowledge that, after I banged my head against it for awhile, I found that @Seros totally had a handle (heh) on it already in #625, but here I am anyway, having addressed the bug in a slightly different way.

The only real difference in behavior here is to retain the changed-or-not distinction when a reload signal is issued, which is probably trivial performance-wise, but I think nice to have in the logs.

Regardless, I've added @Seros as a co-author, since they also fixed it, months ago. 😋 Also feel free to verify this change, if you'd like, and let me know how it goes!

gulducat and others added 3 commits October 6, 2023 15:41
Fixes "failed to read policy in time" errors, at
least some of which occur due to Handlers getting
re-started, but then not receiving file policies,
because they "already exist" from the point of view
of the file policy Source.

When the policy Manager creates and runs new Handlers,
the new Handler needs to receive a policy on its h.ch,
or h.handleTick() will have a nil currentPolicy, which
results in "timeout: failed to read policy in time"

So, each time Handler asks a Source to MonitorPolicy(),
the Source needs to send a policy back to the Handler
(via MonitorPolicyReq.ResultCh: h.ch), even if the
Source already knows the policy from a previous run.
Co-authored-by: Erik Müller <erik.mueller@power.cloud>
policy/file/source.go Outdated Show resolved Hide resolved
Copy link
Member

@jrasell jrasell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

policy/file/source.go Outdated Show resolved Hide resolved
@gulducat gulducat force-pushed the gh-519-recover-file-policies branch from 76fe71d to 30ee9bb Compare October 10, 2023 16:49
Copy link
Contributor

@lgfa29 lgfa29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work getting to the bottom of this. Also reiterating the big thanks to @Seros for fixing it a while ago!

policy/file/source_test.go Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job with these test!

@gulducat gulducat merged commit f7c06e6 into main Oct 11, 2023
@gulducat gulducat deleted the gh-519-recover-file-policies branch October 11, 2023 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failed to get policy (EOF)
3 participants