
When concurrency is set to forbid and dkron exits before the job is done, the job will never run again #349

Closed
zacharya19 opened this issue Apr 20, 2018 · 5 comments

zacharya19 commented Apr 20, 2018

Quick reproduction:

  1. docker-compose up -d.
  2. Create a job with this JSON:

{
    "name": "test_job_1",
    "command": "sleep 50",
    "schedule": "@every 10s",
    "concurrency": "forbid"
}

  3. docker-compose restart dkron.

Now dkron thinks the job is running, but it isn't, and we will forever see scheduler: Skipping execution.

EDIT: this also happens when restarting the dkron service (not only on SIGKILL).

@vcastellm vcastellm added this to the v0.10.0 milestone Apr 25, 2018
@vcastellm vcastellm added bug and removed bug labels Apr 25, 2018
@vcastellm vcastellm removed this from the v0.10.0 milestone Apr 25, 2018

koolay commented Apr 27, 2018

I have the same problem.

This code has a concurrency bug in this scenario; the check should be protected by a distributed lock.

func (j *Job) isRunnable() bool {
	status := j.Status()

	if status == Running {
		if j.Concurrency == ConcurrencyAllow {
			return true
		} else if j.Concurrency == ConcurrencyForbid {
			log.WithFields(logrus.Fields{
				"job":         j.Name,
				"concurrency": j.Concurrency,
				"job_status":  status,
			}).Debug("scheduler: Skipping execution")
			return false
		}
	}

	return true
}
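
To illustrate koolay's distributed-lock suggestion (this is not dkron's actual implementation), here is a minimal Go sketch using Consul's session-based lock API. The key prefix dkron/locks/ and the runWithLock helper are made up for illustration; the relevant property is that the lock is tied to a Consul session with a TTL, so a node killed mid-execution loses the lock automatically and the job is not stuck as "running" forever.

// Sketch only: serialize executions of one job with a Consul session lock.
package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func runWithLock(jobName string, run func()) error {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		return err
	}

	// LockTryOnce gives "forbid" semantics: if another node holds the lock,
	// skip this run instead of queueing behind it.
	lock, err := client.LockOpts(&api.LockOptions{
		Key:         "dkron/locks/" + jobName, // illustrative key, not dkron's schema
		LockTryOnce: true,
	})
	if err != nil {
		return err
	}

	held, err := lock.Lock(nil)
	if err != nil {
		return err
	}
	if held == nil {
		log.Printf("scheduler: %s already running elsewhere, skipping", jobName)
		return nil
	}
	// The lock is bound to a session with a TTL, so if this process is killed
	// the session expires and the lock is released automatically.
	defer lock.Unlock()

	run()
	return nil
}

func main() {
	_ = runWithLock("test_job_1", func() { log.Println("sleep 50 placeholder") })
}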

zacharya19 (Author) commented

I believe each execution should save the process id so dkron would be able to check the status.
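
As a small Unix-only sketch of that suggestion (not dkron code): if each execution persisted its child PID, the agent could test it on startup with signal 0 and clear a stale "running" status when the process is gone. isProcessAlive is a made-up helper name.

// Sketch: check whether a previously recorded execution PID is still alive
// (Unix only; signal 0 performs error checking without delivering a signal).
package main

import (
	"fmt"
	"os"
	"syscall"
)

func isProcessAlive(pid int) bool {
	proc, err := os.FindProcess(pid) // on Unix this always succeeds
	if err != nil {
		return false
	}
	return proc.Signal(syscall.Signal(0)) == nil
}

func main() {
	// On agent start: if the stored execution PID is dead, the job's
	// "running" status could be reset so the scheduler stops skipping it.
	fmt.Println(isProcessAlive(os.Getpid())) // true for our own PID
}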

@vcastellm vcastellm added the bug label Apr 30, 2018
@vcastellm vcastellm added this to the v0.10.0 milestone Apr 30, 2018
vcastellm pushed a commit that referenced this issue May 3, 2018
As #349 describes, a job with forbidden concurrency doesn't execute again if the target node is restarted.

This PR tries to solve it by implementing a mechanism that asks the running nodes about the job status before checking whether the job finished and before running the job.
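
The actual change is in the PR itself; the sketch below only illustrates the idea on top of the isRunnable snippet quoted earlier. queryRunningNodes is a hypothetical stand-in for whatever query the nodes answer; the point is that a "running" status is re-verified before the scheduler decides to skip.

// Illustrative only: re-verify a "running" status with the cluster before
// honouring ConcurrencyForbid. queryRunningNodes is a hypothetical helper.
func (j *Job) isRunnable() bool {
	if j.Status() != Running {
		return true
	}
	if j.Concurrency == ConcurrencyAllow {
		return true
	}

	// Concurrency is "forbid": ask the nodes whether the job is really running.
	stillRunning, err := queryRunningNodes(j.Name)
	if err != nil || stillRunning {
		// Could not verify, or it is genuinely running: keep skipping.
		return false
	}

	// No node reports the job as running, so the stored status is stale
	// (e.g. the node was restarted mid-execution): allow this run.
	return true
}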
vcastellm (Member) commented

Fixed in #359

zacharya19 (Author) commented

@Victorcoder I think the issue still exists.
I added GetStatus to the log and I see this:
concurrency=forbid get_status_function=failed job=test_job_1 job_status=running node=39d4b2198b72

I think it's because of this check:
if j.Status == StatusNotSet {

vcastellm (Member) commented

Should be fixed in #383
