-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Heartbeat] Fix scheduler job type limit algorithm #27559
Conversation
Pinging @elastic/uptime (Team:Uptime) |
heartbeat/scheduler/scheduler.go
Outdated
jobSem.Release(1) | ||
// There is always at least 1 task (the current one), if that's all, then we know | ||
// there are no other jobs active or pending, and we can release the jobLimitSem | ||
if sj.jobLimitSem != nil && sj.activeTasks.Load() == 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the core of the fix, being able to count the actual number of active tasks
💚 Build Succeeded
Expand to view the summary
Build stats
Test stats 🧪
Trends 🧪💚 Flaky test reportTests succeeded. Expand to view the summary
Test stats 🧪
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM and looks much cleaner than before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a test locally, turned out to be not working as expected. Its not getting limited even after setting limit to 1
@vigneshshanmugam I believe I've addressed all PR feedback now |
I take that back, I didn't see your final comment, will try a local test. |
@vigneshshanmugam I see it working with the following config. I've pushed up a commit with WARN logging showing the locking. You can see that with a limit of one only one monitor can have acquired the lock
sample config used is: heartbeat.config.monitors:
path: ${path.config}/monitors.d/*.yml
reload.enabled: false
reload.period: 5s
heartbeat.jobs.http.limit: 1
heartbeat.monitors:
- type: http
id: my-monitor
name: My Monitor
urls: ["http://www.google.com"]
mode: all
schedule: '@every 10s'
- type: http
id: alt-monitor
mode: all
name: Alt Monitor
urls: ["http://www.elastic.co"]
schedule: '@every 10s'
setup.template.settings:
index.number_of_shards: 1
index.codec: best_compression
setup.kibana:
output.console: ~
processors:
- add_observer_metadata: |
@andrewvc Sorry for that, my bad I had my acquire print statements in a wrong place before the acquire which made me think it was acquiring way too sooner, But I did test now properly and can see it working fine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Need to remove this file though - x-pack/heartbeat/out
@Mergifyio backport 7.x |
@Mergifyio backport 7.15 |
Previously heartbeat would break when running with mode: all since that would create multiple terminal jobs. These would all attempt to release from the limit semaphore, when only the last one should. This also refactors recursive job running into an OO type structure to make things more readable. (cherry picked from commit d561a55)
Command
|
Previously heartbeat would break when running with mode: all since that would create multiple terminal jobs. These would all attempt to release from the limit semaphore, when only the last one should. This also refactors recursive job running into an OO type structure to make things more readable. (cherry picked from commit d561a55)
Command
|
Previously heartbeat would break when running with mode: all since that would create multiple terminal jobs. These would all attempt to release from the limit semaphore, when only the last one should. This also refactors recursive job running into an OO type structure to make things more readable. (cherry picked from commit d561a55) Co-authored-by: Andrew Cholakian <andrew@andrewvc.com>
Previously heartbeat would break when running with mode: all since that would create multiple terminal jobs. These would all attempt to release from the limit semaphore, when only the last one should. This also refactors recursive job running into an OO type structure to make things more readable. (cherry picked from commit d561a55) Co-authored-by: Andrew Cholakian <andrew@andrewvc.com>
* master: Skip Flaky Tests (elastic#27590) Remove fargate from aws module config (elastic#27575) [Heartbeat] Fix scheduler job type limit algorithm (elastic#27559)
Previously heartbeat would break when running with mode: all since that would create multiple terminal jobs. These would all attempt to release from the limit semaphore, when only the last one should. This also refactors recursive job running into an OO type structure to make things more readable.
What does this PR do?
Previously heartbeat would break when running with
mode: all
since that would create multiple terminal jobs. These would all attempt to release from the limit semaphore, when only the last one should.This also refactors recursive job running into an OO type structure to make things more readable.
Why is it important?
Causes a panic:
With this config
Checklist
- [ ] I have made corresponding changes to the documentation- [ ] I have made corresponding change to the default configuration files- [ ] I have added an entry inCHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.How to test this PR locally
Use the config above