Batch Callbacks not triggering if job timeout while in transaction #48392

Closed
NiroDeveloper opened this issue Sep 14, 2023 · 3 comments · Fixed by #48961

Comments

@NiroDeveloper

Laravel Version

10.23.1

PHP Version

8.1

Database Driver & Version

MariaDB 10.6.14

Description

We found a bug on our production servers that caused our job pipeline to stop.
We dispatch a batch with many jobs, each of which runs a database transaction.
If a job fails because of a timeout while it is stuck inside a transaction, the batch's failed_jobs counter is not incremented.

Pre-analysis

To detect a timeout, the worker sets a pcntl alarm (SIGALRM) and kills the process when it fires.

pcntl_signal(SIGALRM, function () use ($job, $options) {

The signal handler interrupts the running job in the same process (it is not a separate thread), marks the job as failed, and increments the batch's failed jobs counter.
The SQL query on the batch table also runs in a transaction:
return $this->connection->transaction(function () use ($batchId, $callback) {

For reasons I could not determine, these two transactions interfere with each other, and the batch transaction is rolled back.
(The MySQL exporter shows a transaction savepoint at that moment.)

Steps To Reproduce

  1. Create a job that times out inside a transaction
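<?php

use Illuminate\Bus\Batchable;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\DB;

class TimeoutJob implements ShouldQueue
{
    use Batchable;
    use Dispatchable;
    use InteractsWithQueue;
    use Queueable;

    public int $tries = 1;
    public int $timeout = 5;

    public function handle(): void
    {
        // Sleep far past $timeout so SIGALRM fires while the transaction is still open.
        DB::transaction(fn () => sleep(999));
    }
}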
  2. Dispatch this job in a batch
use Illuminate\Support\Facades\Bus;

Bus::batch([new TimeoutJob(), new TimeoutJob(), new TimeoutJob()])
  ->allowFailures()
  ->dispatch();
  3. See results
    [screenshots]
@cosmastech
Contributor

Is this related to laravel/horizon#1310?

@github-actions

Thank you for reporting this issue!

As Laravel is an open source project, we rely on the community to help us diagnose and fix issues as it is not possible to research and fix every issue reported to us via GitHub.

If possible, please make a pull request fixing the issue you have described, along with corresponding tests. All pull requests are promptly reviewed by the Laravel team.

Thank you!

@sebapastore

sebapastore commented Oct 16, 2023

I have been looking into this issue and found that the problem is that the job's transaction (tx1) is never committed or rolled back. So when the transaction that updates the job_batches table (tx2) is started, it will only be committed if tx1 is committed too.
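The interaction described above can be reproduced outside Laravel. This is a minimal illustration, not Laravel's code: it assumes the timeout handler's batch update runs on the same connection as the job's unfinished transaction, so it can only be nested as a savepoint (consistent with the savepoint seen in the exporter). It uses PDO with an in-memory SQLite database in place of MariaDB and a simplified, hypothetical `job_batches` table with only `id` and `failed_jobs`; the final `rollBack()` stands in for the server discarding the open transaction when the worker process is killed.

```php
<?php
// Sketch: an update made while another transaction is open on the same
// connection is lost when that outer transaction never commits.
// SQLite stands in for MariaDB; the table layout is hypothetical.

function failedJobs(PDO $pdo): int
{
    return (int) $pdo->query("SELECT failed_jobs FROM job_batches WHERE id = 'batch-1'")->fetchColumn();
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE job_batches (id TEXT PRIMARY KEY, failed_jobs INTEGER NOT NULL)');
$pdo->exec("INSERT INTO job_batches (id, failed_jobs) VALUES ('batch-1', 0)");

// tx1: the job's transaction begins and never finishes (sleep(999)).
$pdo->beginTransaction();

// SIGALRM fires; the handler runs in the same process, on the same connection.
// A transaction is already open, so the batch update (tx2) is only a savepoint.
$pdo->exec('SAVEPOINT trans2');
$pdo->exec("UPDATE job_batches SET failed_jobs = failed_jobs + 1 WHERE id = 'batch-1'");
// Releasing the savepoint does not make the change durable; it still belongs to tx1.
$pdo->exec('RELEASE SAVEPOINT trans2');

$seenInsideTx = failedJobs($pdo); // the increment is visible, but only inside tx1

// The worker kills the process: the still-open tx1 is discarded, tx2 with it.
$pdo->rollBack();

$afterExit = failedJobs($pdo);

echo "inside tx1: {$seenInsideTx}, after exit: {$afterExit}\n";
```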

The solution I found is to roll back any pending database transactions when the worker reaches the timeout, before any other updates.

This code works, but I think there must be a "proper" way of getting the database connection in the Worker class without using the DB facade.
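The approach described in this comment (roll back first, then record the failure) can be sketched in the same way. This is neither the commenter's code nor the change that eventually fixed the issue; it reuses a hypothetical SQLite `job_batches` table, and `recordFailedJob()` is a hypothetical stand-in for the worker's failure handling. Inside Laravel, the connection's `transactionLevel()` and `rollBack()` methods would presumably play the role that PDO's `inTransaction()` and `rollBack()` play here.

```php
<?php
// Sketch: discard the job's open transaction before recording the failure,
// so the counter update runs as its own top-level transaction and commits.
// SQLite stands in for MariaDB; table and helper names are hypothetical.

function failedJobs(PDO $pdo): int
{
    return (int) $pdo->query("SELECT failed_jobs FROM job_batches WHERE id = 'batch-1'")->fetchColumn();
}

// Hypothetical stand-in for the timeout handler's "mark job as failed" step.
function recordFailedJob(PDO $pdo): void
{
    $pdo->beginTransaction();
    $pdo->exec("UPDATE job_batches SET failed_jobs = failed_jobs + 1 WHERE id = 'batch-1'");
    $pdo->commit();
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE job_batches (id TEXT PRIMARY KEY, failed_jobs INTEGER NOT NULL)');
$pdo->exec("INSERT INTO job_batches (id, failed_jobs) VALUES ('batch-1', 0)");

// tx1: the job's transaction is still open when the alarm fires.
$pdo->beginTransaction();

// Timeout handler, step 1: roll back whatever the job left open.
if ($pdo->inTransaction()) {
    $pdo->rollBack();
}

// Step 2: the failure is recorded in a fresh transaction and committed.
recordFailedJob($pdo);

// Process exit: anything still open would be discarded (nothing is open now).
if ($pdo->inTransaction()) {
    $pdo->rollBack();
}

$afterExit = failedJobs($pdo);
echo "failed_jobs after exit: {$afterExit}\n";
```

Rolling back first also discards the timed-out job's partial work, which is usually acceptable for a job that is being marked as failed; the key point is that the counter update no longer depends on a transaction that will never commit.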

4 participants