Add job tracking #86
If you're feeling brave, feel free to test this out. This branch should, in theory, properly handle pools with zero min idle runners and a small max runner value, without losing jobs. Once slots become available, new runners will be added to handle the remaining queued jobs.
sounds good .. we'll check it out, when we find some time .. (maybe not brave enough :))
There is no rush at all. Whenever you get a chance. No issues popped up during testing for me, but your use case is more varied than mine. When you do get a chance to test this, ping back.
first of all: thanks one more time for the PR! some thoughts:
Yup. Should be doable. I need to write proper tests for the whole pool manager.
Sadly, not well. There is currently no global lock. There is a plan to try out etcd as a store, which gives us some nice primitives to work with like watchers and global locks. But I have no ETA on that.
They will linger on GitHub until they expire. Sadly, that can't be helped. See this unnecessarily long comment for more details. Coincidentally, GitHub itself does the same thing. When there is an issue with the GitHub runners, jobs started while the issue manifested will never be picked up. My guess is that internally they use the same hooks to queue the jobs for their own runners.
11d4c01 to 1c7a7c8
1c7a7c8 to 8afb401
thx .. will try this out today :)
Curious how it works out compared to the old workflow.
175319f to e6fd812
a4bd85d to 5d7cf5b
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Removes completed jobs from the db
* Skip ensure min idle runners for pools with min idle runners set to 0

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Break the lock on a job if it's still queued and the runner that it triggered was assigned to another job. This may cause leftover runners to be created, but we scale those down in ~3 minutes. Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* enable foreign key constraints on sqlite
* on delete cascade for addresses and status messages
* add debug server config option
* fix rr allocation

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
For now, the additional labels would only contain the job ID that triggered the creation of the runner. It does not make sense to add this label to the actual runner that registers against GitHub. We can simply use it internally by fetching it from the DB.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
b1f188a to 6c06afb
This branch changes the way garm decides when to spawn a new runner, by implementing some rudimentary job tracking. The bullet points of what is happening are:

* `ensureIdleRunnersForOnePool()` is called for the pool to which the runner that picked up the job belongs. Doing so will replace the idle runner if the minimum idle runner value is set to something other than zero, while obeying max runners.
* `consumeQueuedJobs()` was added. This function will attempt to create a new runner in response to jobs that are in `queued` state. It will run before the `consolidate()` function. Its purpose is to attempt to consume all `queued` jobs by creating any needed runners until the queue is gone. There is a back-off period of 15 seconds before it will attempt to create a new runner. The consolidate loop runs every 5 seconds, so there will be at least ~10 seconds in which an idle runner has time to pick up the job. After that, it will add a new runner.

Some notes:
We cannot control which runners pick up which jobs. We may have pools of runners defined on multiple hierarchy levels that are suitable to run a particular job. For example, if the job requests a runner with `self-hosted`, all pools on all levels (repo, org and enterprise) will be able to handle that job.

We may create a runner in pool `A` as a response to a `queued` job, but a runner from pool `B` may pick it up. The runner we spawned may pick up a completely different job than the one that prompted us to create the runner. This does not present an issue as long as all runners were spawned to handle jobs with the same label set. Even if we do end up with some dangling runners, the scale-down logic we have will clean those up after a short period of being idle. While there is no way to guarantee there will be no runner churn, it should be minimal.

For best results, make sure to use unique sets of labels to target runners in a particular pool. Something like this should work:
Or if you want to keep it brief, simply use:
If you want to target a particular pool, add something that narrows the possibilities down to just that pool:
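To see why label sets matter for the notes above, here is a minimal Go sketch of label matching. It is not garm's actual code: the `Pool` type, the `canHandle` and `candidatePools` helpers, and the `pool-a` / `pool-b` labels are all hypothetical, and it assumes a pool can take a job when it advertises every label the job requests (exact string comparison, for simplicity).

```go
package main

import "fmt"

// Pool is a hypothetical, simplified stand-in for a runner pool.
type Pool struct {
	Name   string
	Level  string // "repo", "org" or "enterprise"
	Labels []string
}

// canHandle reports whether the pool advertises every label the job requests.
// Assumption: plain subset matching with exact string comparison.
func canHandle(p Pool, jobLabels []string) bool {
	have := make(map[string]struct{}, len(p.Labels))
	for _, l := range p.Labels {
		have[l] = struct{}{}
	}
	for _, want := range jobLabels {
		if _, ok := have[want]; !ok {
			return false
		}
	}
	return true
}

// candidatePools returns the names of all pools that could run the job.
func candidatePools(pools []Pool, jobLabels []string) []string {
	var names []string
	for _, p := range pools {
		if canHandle(p, jobLabels) {
			names = append(names, p.Name)
		}
	}
	return names
}

func main() {
	pools := []Pool{
		{Name: "A", Level: "repo", Labels: []string{"self-hosted", "linux", "pool-a"}},
		{Name: "B", Level: "org", Labels: []string{"self-hosted", "linux", "pool-b"}},
	}
	// A generic label set matches pools on every level.
	fmt.Println(candidatePools(pools, []string{"self-hosted"}))
	// One extra, pool-specific label narrows the match to a single pool.
	fmt.Println(candidatePools(pools, []string{"self-hosted", "pool-a"}))
}
```

With only `self-hosted` requested, both pools qualify, so a runner spawned in one pool may lose the job to the other. Adding a label that only one pool carries removes that ambiguity.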
Fixes: #47
Related: #78 #73