-
-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor dequeue transaction handling #55
Conversation
This refactor encapsulates pending task dequeue operations within their own transactions, updating the task row state to prevent duplicate processing by other callers. By doing so, we ensure that state changes are immediately visible, accurately reflecting task ownership throughout its processing lifecycle. Additionally, tasks now include a configurable heartbeat interval and a record of the last heartbeat. Workers periodically update the task row to indicate ongoing liveness. Should a task’s heartbeat become stale, the dequeue method can select it for reassignment. It's important to note that a missed heartbeat alone does not definitively indicate task abandonment, as a worker might resume processing after a temporary delay. To guard against this type of partial failure, workers also acquire a transaction-level advisory lock on the task ID. As long as a worker's transaction remains active, this lock prevents other workers from processing the same task, ensuring exclusive ownership and consistent processing even across intermittent failures. A notable benefit of these changes is that task progress states are fully utilized and in-progress tasks are visible globally. Furthermore, transaction overhead is reduced as a dequeue's transaction is only held for the duration of obtaining an available task. That said, a second transaction is still maintained for the duration of execution so long-running tasks still benefit from decomposition into e.g. multiple job steps.
@kirillsalykin this addresses the fact that "in-progress" hasn't be used and increases visibility overall. I think this is also somewhat closer to how e.g. pg-boss approaches things. |
In cases where a lock can't be acquired, workers will return early and try again later. This is important because only one worker can execute a task and failure to acquire the lock invalidates the assumption that the task is still in an available state.
21fc7ef
to
01cb957
Compare
c10bb14
to
0426f70
Compare
I might be wrong (please correct me if so), but this change reduces consistency. UPDATED: sorry for the noice, clear now! PS I would read description before making comments |
update underway.task | ||
set updated_at = now(), | ||
last_heartbeat_at = now() | ||
where id = $1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this still confuses me, I'd expect id
to be the "real" id, eg it is unique
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unique per queue name.
In part this is a semantical quality that's "more correct" but there's also some practical reasons to do it this way. For example, this means the same task (i.e. per ID) can be in multiple queues but that for each queue only one unique copy can exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this means the same task (i.e. per ID) can be in multiple queues
This confuses me even more, I would expect id = identity = lifecycle, so it makes it tricky to figure out which queue and how updates task
status...
but lets take this discussion out of the PR, look good to me, btw :)
this makes code (and approach) slightly more complicated, but I think not keeping tx opened for the duration of task is a good thing... |
Agree on all points--I wish it weren't more complex of course but I think it's worth it for the observability gained. I also trust that projects like pg-boss have had longer to mature and so there's probably good reasons for some of their more significant design decisions which we can also benefit from. |
This refactor encapsulates pending task dequeue operations within their own transactions, updating the task row state to prevent duplicate processing by other callers. By doing so, we ensure that state changes are immediately visible, accurately reflecting task ownership throughout its processing lifecycle.
Additionally, tasks now include a configurable heartbeat interval and a record of the last heartbeat. Workers periodically update the task row to indicate ongoing liveness. Should a task’s heartbeat become stale, the dequeue method can select it for reassignment.
It's important to note that a missed heartbeat alone does not definitively indicate task abandonment, as a worker might resume processing after a temporary delay. To guard against this type of partial failure, workers also acquire a transaction-level advisory lock on the task ID. As long as a worker's transaction remains active, this lock prevents other workers from processing the same task, ensuring exclusive ownership and consistent processing even across intermittent failures.
A notable benefit of these changes is that task progress states are fully utilized and in-progress tasks are visible globally. Furthermore, transaction overhead is reduced as a dequeue's transaction is only held for the duration of obtaining an available task. That said, a second transaction is still maintained for the duration of execution so long-running tasks still benefit from decomposition into e.g. multiple job steps.