@DisallowConcurrentExecution jobs are running at same time on different nodes when cluster de-synced #176

Closed
pavelkokush opened this issue Aug 1, 2017 · 2 comments

Comments

pavelkokush commented Aug 1, 2017

Use case: a Quartz cluster with nodes A and B. All jobs are marked with @DisallowConcurrentExecution. Node A detects that node B has "failed" because node B did not check in for some period of time. But node B is actually fine (it just failed to check in for some reason) and still has one job running.

Actual: node A changes the trigger state from STATE_BLOCKED to STATE_WAITING for all jobs that were started on node B. So those jobs can start running on node A even though the same jobs are still running on node B. This is NOT expected, because the jobs are marked with @DisallowConcurrentExecution!

Expected: probably add a config option like "org.quartz.jobStore.pauseOnNodeFail: true" (sketched below). When it is enabled, jobs whose triggers are in STATE_BLOCKED would be paused instead of released, so a user monitoring the cluster can verify the cluster state manually, decide whether it is safe to resume, and un-pause the jobs by hand.
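For illustration, the proposed option could be set in quartz.properties like this. Note that the property name is hypothetical: it is the suggestion from this issue and does not exist in Quartz today.

```properties
# Hypothetical setting proposed in this issue; NOT an existing Quartz property.
# When a node misses its check-in, triggers of @DisallowConcurrentExecution jobs
# that are in STATE_BLOCKED would be paused instead of moved back to STATE_WAITING.
org.quartz.jobStore.pauseOnNodeFail = true
```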

pavelkokush (Author) commented:
The relevant code is in JobStoreSupport.clusterRecover():

// free up stateful job's triggers
if (ftRec.isJobDisallowsConcurrentExecution()) {
    getDelegate()
            .updateTriggerStatesForJobFromOtherState(
                    conn, jKey,
                    STATE_WAITING, STATE_BLOCKED);
    getDelegate()
            .updateTriggerStatesForJobFromOtherState(
                    conn, jKey,
                    STATE_PAUSED, STATE_PAUSED_BLOCKED);
}
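A hedged sketch of how the proposal could change this block (not the actual patch from the commits referenced below): gate the BLOCKED-to-WAITING transition behind the hypothetical pauseOnNodeFail flag and move the triggers to STATE_PAUSED instead, so they stay parked until an operator resumes them.

```java
// Sketch only: isPauseOnNodeFail() is a hypothetical accessor for the proposed option.
// Current Quartz moves BLOCKED -> WAITING here, which is what allows the job to be
// re-acquired on node A while it may still be running on node B.
if (ftRec.isJobDisallowsConcurrentExecution()) {
    String recoveredState = isPauseOnNodeFail() ? STATE_PAUSED : STATE_WAITING;
    getDelegate().updateTriggerStatesForJobFromOtherState(
            conn, jKey, recoveredState, STATE_BLOCKED);
    getDelegate().updateTriggerStatesForJobFromOtherState(
            conn, jKey, STATE_PAUSED, STATE_PAUSED_BLOCKED);
}
```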

pavelkokush pushed a commit to pavelkokush/quartz that referenced this issue Aug 7, 2017
pavelkokush pushed a commit to pavelkokush/quartz that referenced this issue Aug 8, 2017
pavelkokush pushed a commit to pavelkokush/quartz that referenced this issue Aug 8, 2017
zemian (Contributor) commented Feb 13, 2019

Hi, sorry for the long delay in responding to this issue. Your issue seems to be a duplicate of #107! Please try the solution provided over there. Check your cluster settings in the scheduler config and your DB isolation level carefully. If the problem still exists, try the acquireTriggersWithinLock=true config as discussed over there.
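For reference, acquireTriggersWithinLock is an existing JDBC job store setting. A minimal quartz.properties excerpt showing how it might be enabled is below; the surrounding cluster values are illustrative and the data source configuration is omitted.

```properties
# Illustrative excerpt; the data source settings required by JobStoreTX are omitted.
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.isClustered = true
org.quartz.jobStore.clusterCheckinInterval = 20000
# Acquire triggers only while holding the DB lock, as suggested in #107.
org.quartz.jobStore.acquireTriggersWithinLock = true
```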
