@DisallowConcurrentExecution jobs with multiple triggers running concurrently #674

Closed
shaundsmith opened this issue Mar 23, 2021 · 7 comments
Labels
stale Inactive items that will be automatically closed if not resurrected

Comments


shaundsmith commented Mar 23, 2021

Multiple triggers for a single job detail execute concurrently for a job annotated with @DisallowConcurrentExecution when the triggers are scheduled for the same time and the scheduler is in a cluster.

Expected behaviour
The job will not execute concurrently across the cluster

Actual behaviour
The job will execute concurrently across the cluster

Scenario
We have a single Quartz job that has multiple triggers associated with it in a clustered environment. The job is non-concurrent and annotated with @DisallowConcurrentExecution. Additionally, there is only a single job detail entry associated with this job.
We have multiple Quartz triggers referencing this job detail entry in order to provide different parameters to the job based on user-defined values via the Trigger parameters.
Some of these triggers are scheduled for exactly the same time. In this scenario I'd expect that only a single trigger for the job would run at any given time across the cluster, but we're seeing each member of the cluster pick up a different trigger for the same non-concurrent job detail entry.

If there is a ~1 minute difference between the start times, everything works as expected - the cluster waits for the previous job to finish before attempting to start the second.

Analysis
I've investigated the issue, and it appears that the cluster ignores the BLOCKED state of the trigger. I've added some log statements using a debugger's "Evaluation on Breakpoint" feature (StdJDBCDelegate line 1522):

Cluster Host 1

Triggering: [CustomerJob.729de3c4-43ee-4e5d-bde2-53b072813ff1]
2021-03-22T16:20:05.014055 - Updating triggers for job CustomerJob. new status=BLOCKED, old status=WAITING
Updated - 60 triggers
2021-03-22T16:20:05.033595 - Updating triggers for job CustomerJob. new status=BLOCKED, old status=ACQUIRED
Updated - 2 triggers
2021-03-22T16:20:05.047688 - Updating triggers for job CustomerJob. new status=PAUSED_BLOCKED, old status=PAUSED
Updated - 0 triggers
...(Job Running)
2021-03-22T16:20:05.557811 - Updating triggers for job CustomerJob. new status=WAITING, old status=BLOCKED
Updated - 62 triggers

Cluster Host 2

Triggering: [CustomerJob.fa1a8228-d93b-4204-816e-cf85ce555fa5]
2021-03-22T16:20:05.219289 - Updating triggers for job CustomerJob. new status=BLOCKED, old status=WAITING
Updated - 0 triggers
2021-03-22T16:20:05.233818 - Updating triggers for job CustomerJob. new status=BLOCKED, old status=ACQUIRED
Updated - 1 triggers
2021-03-22T16:20:05.245169 - Updating triggers for job CustomerJob. new status=PAUSED_BLOCKED, old status=PAUSED
Updated - 0 triggers
...(Job Running)
2021-03-22T16:20:05.781245 - Updating triggers for job CustomerJob. new status=WAITING, old status=BLOCKED
Updated - 0 triggers
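
The "Updated - N triggers" counts above can be read as conditional state updates: each call moves only the triggers that are currently in the old status. The following is a minimal in-memory sketch of that reading, not Quartz code. The class name `TriggerStateModel` and the method `updateFromOtherState` are hypothetical, and the starting counts (60 WAITING, 2 ACQUIRED) are taken from the Host 1 log. The model does not cover trigger acquisition, so it does not reproduce the single ACQUIRED trigger that Host 2 updated.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory stand-in for the trigger rows of one job (not Quartz code).
class TriggerStateModel {
    enum State { WAITING, ACQUIRED, BLOCKED, PAUSED, PAUSED_BLOCKED }

    private final List<State> triggers = new ArrayList<>();

    TriggerStateModel(int waiting, int acquired) {
        triggers.addAll(Collections.nCopies(waiting, State.WAITING));
        triggers.addAll(Collections.nCopies(acquired, State.ACQUIRED));
    }

    // Moves every trigger currently in oldState to newState and returns how many moved,
    // like an UPDATE statement that filters on the current state.
    int updateFromOtherState(State newState, State oldState) {
        int updated = 0;
        for (int i = 0; i < triggers.size(); i++) {
            if (triggers.get(i) == oldState) {
                triggers.set(i, newState);
                updated++;
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        TriggerStateModel job = new TriggerStateModel(60, 2);

        // Host 1 fires its trigger and blocks the job's other triggers.
        System.out.println("host1 WAITING->BLOCKED: " + job.updateFromOtherState(State.BLOCKED, State.WAITING));
        System.out.println("host1 ACQUIRED->BLOCKED: " + job.updateFromOtherState(State.BLOCKED, State.ACQUIRED));
        System.out.println("host1 PAUSED->PAUSED_BLOCKED: " + job.updateFromOtherState(State.PAUSED_BLOCKED, State.PAUSED));

        // Host 2 fires while Host 1 is still running: nothing is left in WAITING.
        System.out.println("host2 WAITING->BLOCKED: " + job.updateFromOtherState(State.BLOCKED, State.WAITING));

        // Host 1 finishes first and releases everything; Host 2 then finds nothing to release.
        System.out.println("host1 BLOCKED->WAITING: " + job.updateFromOtherState(State.WAITING, State.BLOCKED));
        System.out.println("host2 BLOCKED->WAITING: " + job.updateFromOtherState(State.WAITING, State.BLOCKED));
    }
}
```

Under these assumptions the counts come out as 60, 2, 0 for Host 1's blocking updates, 0 for Host 2's WAITING update, then 62 and 0 for the two releases, which matches the logged values. The zero counts on Host 2 are therefore what you would expect when it fires a trigger that another node has already marked BLOCKED, which suggests the firing path did not honour that state.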

Details
Quartz Properties:

org.quartz.scheduler.instanceId=AUTO
org.quartz.scheduler.interruptJobsOnShutdownWithWait=true

org.quartz.jobStore.isClustered=true
# Tried both true and false
org.quartz.jobStore.acquireTriggersWithinLock=true
# Tried both true and false
org.quartz.jobStore.txIsolationLevelSerializable=true

QRTZ_JOB_DETAILS table:

| SCHED_NAME      | JOB_NAME    | JOB_GROUP   | JOB_CLASS_NAME       | IS_DURABLE | IS_CONCURRENT | IS_UPDATE_DATA | REQUESTS_RECOVERY |
| quartzScheduler | CustomerJob | CustomerJob | *customer-job-class* | 1          | 1             | 0              | 0                 |

QRTZ_CRON_TRIGGERS table:

| SCHED_NAME      | TRIGGER_NAME | TRIGGER_GROUP | CRON_EXPRESSION | TIME_ZONE_ID     |
| quartzScheduler | *UUID*       | CustomerJob   | 0 55 11 * * ?   | America/New_York |
| quartzScheduler | *UUID*       | CustomerJob   | 0 55 11 * * ?   | America/New_York |

stale bot commented Aug 2, 2021

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale Inactive items that will be automatically closed if not resurrected label Aug 2, 2021

Azer249 commented Aug 5, 2021

We've been experiencing the same issue.

Is there any update on this issue now, given the project seems to be active again?

@stale stale bot removed the stale Inactive items that will be automatically closed if not resurrected label Aug 5, 2021
@asdfgh19

This still occurs. The setup is a single job detail annotated with @DisallowConcurrentExecution, with three or more triggers referencing that job detail entry, and the triggerJob method called at the same time. Sometimes the job runs only once or twice and the remaining triggers stay BLOCKED forever in the qrtz_triggers table; nothing sets their state back to WAITING or any other state. It looks like something is wrong with the state machine. Quartz version 2.3.2.


stale bot commented Nov 17, 2021

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale Inactive items that will be automatically closed if not resurrected label Nov 17, 2021
@stale stale bot closed this as completed Nov 24, 2021
shaundsmith commented Nov 24, 2021

This is still a problem


diban commented Dec 16, 2021

Indeed, I can confirm this is still an issue. And from my investigations it's caused by this: 3f65b28.


GaoForGot commented Feb 14, 2022

I've encountered a related problem. Here is my scenario:
I set up one job (one job detail) annotated with @DisallowConcurrentExecution, running on a cluster with two nodes. The job is triggered by a CronTrigger with the cron expression (30/30 * * * * ? *). The cluster uses the JDBC job store, and acquireTriggersWithinLock is set to true.
I expected the two nodes to trigger, run, and finish the job without any concurrent overlap. However, the executions occasionally overlap like this:
node_1 start: 2022-01-25 14:09:25 node_1 end: 2022-01-25 14:19:25
node_2 start: 2022-01-25 14:10:00 node_2 end: 2022-01-25 14:10:18
node_1's firing was actually misfire handling (the default "fire once now" policy). It seems that node_2 acquired and executed the trigger without checking its state.
The concurrent runs of this job have caused a lot of unexpected problems, and this has been troubling me for a month.
Any help would be appreciated. Thanks in advance!
