Simplify start of the checking timer #85
Start the timer directly, checking only that there is one timer per worker. There's no attempt to avoid overlap between separate workers; the jitter should smooth out bursts.
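To make the idea concrete, here is a minimal sketch of a per-worker guard with a jittered first run. It is written in Python as a language-neutral simulation, not the library's Lua code. The `Worker` class, the `start_timer` name, and the choice of a uniform offset within one interval are all assumptions made for illustration; the PR does not specify how the jitter is computed.

```python
import random


class Worker:
    """Stand-in for one nginx worker process (hypothetical, for illustration)."""

    def __init__(self, interval, rng):
        self.interval = interval
        self.rng = rng
        self.timer_started = False
        self.first_run_at = None

    def start_timer(self, now=0.0):
        # Guard: allow at most one checking timer per worker.
        if self.timer_started:
            return False
        self.timer_started = True
        # Assumed jitter: a random offset within one interval, so that
        # workers started at the same moment do not all fire together.
        self.first_run_at = now + self.rng.uniform(0, self.interval)
        return True


rng = random.Random(42)
workers = [Worker(interval=1.0, rng=rng) for _ in range(4)]
started = [w.start_timer() for w in workers]    # first call starts a timer
restarted = [w.start_timer() for w in workers]  # second call is a no-op
first_runs = sorted(w.first_run_at for w in workers)
```

Each worker still runs its own checks in this variant; the jitter only spreads them over the interval.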
I think I understand what this PR is removing, so I am approving this PR.
I would feel more comfortable if someone with more experience with the healthcheckers (@locao ?) could review it before merging.
@@ -166,7 +166,6 @@ qq{
 },
 }
 })
-ngx.sleep(1) -- active healthchecks might take up to 1s to start
I love seeing these gone from tests.
One timer per worker, but before doing anything it tries to acquire an expiring lock. If that fails, it tries again later. If the "winning" worker ever fails to renew the lock, some other worker will acquire it.
Re-added the "expiring locks", but didn't add the nested timer; the single timer checks whether it can grab the lock. If it can't, it changes its period and tries again.
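The behaviour described here can be sketched as a small simulation. This is Python rather than the library's Lua, and everything named below (`SharedDict`, `tick`, `LOCK_KEY`, and the three interval constants) is hypothetical and chosen only for illustration. The toy `SharedDict` assumes a shared store whose `add` succeeds only when the key is absent or expired.

```python
class SharedDict:
    """Toy stand-in for memory shared by all workers (hypothetical)."""

    def __init__(self):
        self.data = {}  # key -> (value, expires_at)

    def _live(self, key, now):
        entry = self.data.get(key)
        return entry if entry and entry[1] > now else None

    def add(self, key, value, ttl, now):
        # Succeeds only if the key is absent or has expired.
        if self._live(key, now):
            return False
        self.data[key] = (value, now + ttl)
        return True

    def get(self, key, now):
        entry = self._live(key, now)
        return entry[0] if entry else None

    def set(self, key, value, ttl, now):
        self.data[key] = (value, now + ttl)


LOCK_KEY = "checker_lock"
CHECK_INTERVAL = 1.0  # assumed timer period while holding the lock
RETRY_INTERVAL = 2.0  # assumed timer period while waiting for the lock
LOCK_TTL = 3.0        # assumed lock lifetime; must exceed CHECK_INTERVAL


def tick(worker_id, shm, now):
    """One firing of a worker's single timer: (ran_checks, next_delay)."""
    owner = shm.get(LOCK_KEY, now)
    if owner == worker_id:
        # Holder renews the lock (get-then-set is fine in this
        # single-threaded simulation) and runs the checks.
        shm.set(LOCK_KEY, worker_id, LOCK_TTL, now)
        return True, CHECK_INTERVAL
    if owner is None and shm.add(LOCK_KEY, worker_id, LOCK_TTL, now):
        return True, CHECK_INTERVAL  # acquired a free or expired lock
    return False, RETRY_INTERVAL     # someone else holds it: retry later


shm = SharedDict()
r1 = tick("w1", shm, 0.0)  # w1 wins the lock
r2 = tick("w2", shm, 0.0)  # w2 loses and backs off
r3 = tick("w1", shm, 1.0)  # w1 renews; lock now expires at t=4.0
# Suppose w1 stops renewing after t=1.0.
r4 = tick("w2", shm, 2.0)  # lock still live, w2 keeps waiting
r5 = tick("w2", shm, 4.0)  # lock expired, w2 takes over
```

There is no nested timer: the same timer either runs the checks or reschedules itself with a different period.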
@locao, please give this another look if you can. It ... also looks ok to me, but I missed the previous one already ^__^
🚀
This is a squashed commit that realigns master branch with 3.0.0 release. In order to do so the master branch was reverted back to 1.3.0 release (commit: dc2a6b6) and then the 3.0.0 release branch was merged to it (up to commit: a2bec67). Below you can see all the details of the squashed commits. --------- * release 1.4.0 * fix(healthcheck) use single timer for all active checks (#62) * fix(healthcheck) use single timer for all active checks * tests(*) removed tests that are not needed * docs(*) docs for release 1.4.0 * chore(ci) use newer openresty and luarocks releases (#68) * fix(healthcheck) single worker actively checks the status (#67) * release 1.4.1 * fix(healthcheck) record `last_run` when healthcheck is scheduled (#72) Prevents a thundering herd issue whereby additional healthchecks are scheduled in the time in which it takes the healthcheck to complete. * tests(active-probes) interval is respected (#73) * fix(healthcheck) record `last_run` when healthcheck is scheduled Prevents a thundering herd issue whereby additional healthchecks are scheduled in the time in which it takes the healthcheck to complete. * tests(active-probes) interval is respected Co-authored-by: Brian Fox <brianhfox@gmail.com> * fix(healthcheck) remove event watcher when stopping hc (#74) Co-authored-by: Brian Fox <brianhfox@gmail.com> Co-authored-by: Brian Fox <brianhfox@gmail.com> * tests(*) avoid some flakiness (#75) * release 1.4.2 * chore(*) add GitHub Actions workflows (#82) * chore(*) add GitHub Actions workflows * fix(healthcheck) lint error * Simplify start of the checking timer (#85) * simplify start of the checking timer, ensuring only one worker actively sends healthchecks. one timer per worker, but before doing anything, tries to acquire an expiration lock. if fails, try again later. if the "winning" worker ever fails to renew it, some other worker would get it. 
* chore(rockspec) added rockspec for release 1.5.0-1 Also: - updated scm-1 rockspec - bumped openresty version in CI tests * feat(*) add header support for active checks * feat(active) support map headers * feat(healthcheck) delayed_clear function (#88) Added new function delayed_clear. This function marks all targets to be removed, but do not actually remove them. If before the delay parameter any of them is re-added, it is unmarked for removal. This function makes it possible to keep target state during config changes, where the targets might be removed and then re-added. * chore(readme) 1.5.0 release (#91) * chore(readme) 1.5.0 release * docs(*) release 1.5.0 Also added docs missing to delayed_clear() function. * fix(healthcheck) Use pair instead ipair for hcs weak table (#93) * release 1.5.1 (#95) * chore(readme) update badges (#98) * docs(readme) updated with 1.4.x changes * chore(workflows) updates for 1.6.0 release - added latest openresty to the CI matrix - added tests for when lua-resty-worker-events or lua-resty-events are used * feat(healthcheck) support setting the events module (#105) * feat(healthcheck) support setting the events module * fix(healthcheck) defaults to lua-resty-worker-events * tests(workflows) fixed manual deps install * fix(healthcheck) check empty opts * chore(workflows) use last luarocks * test(workflows) use pre-built deps, test with or 1.13-1.21 * chore(workflows) install lua-resty-events in ci * tests(workflows) debug * fixed tests and resty-events usage * init resty-events in init_worker * fix(tests) init events module (#107) * add init_worker in 03-get_target_status.t * fix 03-get_target_status.t * fix 03-get_target_status_with_sleeps.t * fix 04-report_success.t * fix 05/06 * fix 07/08 * fix 09 * change 10 * fix 11 * fix 12 * change 13 * fix 15 * partial fix 16 * change 17 * fix 18 * change 13 * fix 16 * style 05 * fix 01/02 * use string.buffer in OpenResty 1.21.4.1 (#109) * use string.buffer in OpenResty 1.21.4.1 * remove 
cjson require * fix(healthcheck) use the events module set in defaults * tests(with_resty-events) disabled tests that need more work * fix(healthcheck) avoid breaking when opts are nil * tests(with_resty-events) removed unnecessary test * tests(with_resty-events) increased sleeps Co-authored-by: Chrono <chrono_cpp@me.com> * release 1.6.0 (#110) * docs(readme) release 1.6.0 * fix(rockspec) typo * chore(rockspec) release 1.6.0 * docs(*) release 1.6.0 * chore(*) localize string.format (#111) * fix(healthcheck) support any lua-resty-events 0.1.x (#118) * chore(workflows) bump deps versions * chore(helathcheck) support any lua-resty-events 0.1.x * fix(healthchecker) port 2.x lock fixes to 1.5.x (#113) * fix(healthchecker) port 2.x lock fixes to 1.5.x * chore(healthcheck) remove unused vars * chore(healthcheck) fix indent level * fix(healthcheck) correct duplicate handling in add_target * fix(healthchecker) handle fetch_target_list failure in checker callback * chore(healthcheck) apply suggestions from #112 Co-authored-by: Vinicius Mignot <vinicius.mignot@gmail.com> * chore(healthcheck) increase verbosity for locked function failures (#114) * chore(healthcheck) increase verbosity for locked function failures * tests(healthcheck) add tests for run_locked() * fix(healthcheck) lower the cleanup check frequency the health-check timer also checks if targets must be removed. to safely remove targets, the targets list is locked. if this check runs on every health-check cycle and there are a large number of targets, a bazillion locks will be created. this change avoids that by lowering the frequency the cleanup list is checked. the side-effect is that targets marked for cleanup may exist for more time (2.5s) than expected, and some unexpected active checks could happen. * tests(clear) increase delay for delayed clear tests with less locks the wait for delayed clean is longer. 
* docs(readme) release 1.6.1 * chore(rockspecs) release 1.6.1 * release 1.6.1 * docs(readme) updated build badge * chore(ci) remove old openresty versions * feat(healthcheck) avoid duplication post in rebuild healthcheck scenario * release 1.6.2 * Added support for https_sni in healthcheck.lua (#49) * fix(mtls) use OpenResty's API for mtls (#99) * chore(ci): fix cache path (#136) ${{ env.* }} is not evaluated in `with` causing gha tries to cache `/`. * release 1.6.3 (#135) * release 3.0.0 (#142) * feat(ci/KAG-1800): add lint and sast workflows using shared actions * chore(ci): pin shared code quality actions * chore(*): backport - localize some functions A commit on master 80ee2e1 introduced localizing some functions. This commit backports that one. Backports: #92 * fix(healthcheck): fixed incorrect default http_statuses when new() was called multiple times (#83) * chore(lint): bump kong/public-shared-actions * docs(README): added 1.5.2 and 1.5.3 releases * chore(*) rename readme, add release instructions * chore(healthcheck): fix get_defaults function * fix(test): fix worker-events test * release 3.0.0 * chore(github): cancel in progress workflows when new pushed --------- Co-authored-by: saisatish karra <saisatish.karra@konghq.com> Co-authored-by: Shuoqing Ding <dsq704136@gmail.com> Co-authored-by: Vinicius Mignot <vinicius.mignot@gmail.com> Co-authored-by: Thijs Schreijer <thijs@thijsschreijer.nl> * chore(*): revert commits back to 1.3.0 This reverts the master branch backs to the commit of dc2a6b6 so that we can skip over 2.0.0 release. The 1.3.0 release is the first common commit between master branch and 1.6.x (also 3.0.x) branches. * chore(docs): fix semgrep https warnings * docs(readme): update shield badges Co-authored-by: Vinicius Mignot <vinicius.mignot@gmail.com> * chore(*): add 2.0.0 rockspecs and fix tests Release 2.0.x introduced some rockspecs with fixes. Reverting back to 1.3.0 and reapplying changes from 3.0.0 reversed those fixes. 
This commit reintroduces them. KAG-2704 --------- Co-authored-by: Vinicius Mignot <vinicius.mignot@gmail.com> Co-authored-by: Brian Fox <brianhfox@gmail.com> Co-authored-by: Murillo Paula <murillo@murillopaula.com> Co-authored-by: Javier <javier.guerra@konghq.com> Co-authored-by: Thijs Schreijer <thijs@thijsschreijer.nl> Co-authored-by: Mayo <i@shoujo.io> Co-authored-by: Tomasz Nowak <tomanowa@gmail.com> Co-authored-by: Chrono <chrono_cpp@me.com> Co-authored-by: Michael Martin <flrgh@protonmail.com> Co-authored-by: Jun Ouyang <ouyangjun1999@gmail.com> Co-authored-by: HansK-p <42314815+HansK-p@users.noreply.github.com> Co-authored-by: Qi <call_far@outlook.com> Co-authored-by: Wangchong Zhou <fffonion@gmail.com> Co-authored-by: Aapo Talvensaari <aapo.talvensaari@gmail.com> Co-authored-by: saisatish karra <saisatish.karra@konghq.com> Co-authored-by: Shuoqing Ding <dsq704136@gmail.com>
Start the timer directly, checking only that there is one timer per worker. There's no attempt to avoid overlap between separate workers; the jitter should smooth out bursts.
Fix: Kong/kong#7619
CT-23