node: fix task caller lifecycle in Triggers schedule #2605
- return concurrency::spawn_task(ioc_, task_caller());
+ co_return co_await concurrency::spawn_task(ioc_, task_caller());
This is an interesting side effect. I wonder why this change makes any difference?
The call to schedule is already awaited at the call site.
An additional await here delays the destruction of some locals, right? Which ones? task_caller is not relevant after calling it. trigger is moved into the task, so it is not destroyed unless the spawned task is destroyed. The spawned task is moved inside co_spawn. I hope that co_spawn is not destroying the passed task prematurely, but maybe it does? Is it possible to verify this with a unit test?
If it really does, maybe we should modify spawn_task to await co_spawn internally, so that this mistake is not possible in other places (we use spawn_task without awaiting in a bunch of places) or in future code?
I'd like to fill the gap in my understanding here. I know it is not a high priority, so maybe let's create an issue about researching and generalizing this and leave it as an exercise for the future reader?
This could potentially be the cause of some of the sentry discovery crashes, because SerialNodeDb follows this pattern.
Let's try to move the callback from the capture list to the lambda parameter list and see if it works without co_await (see links below).
Yep, we must move both this and the callback because the whole capture list is affected (indeed, the issue was happening on this->current_tx_).
Moreover, it seems to be systematically reproducible in unit tests, so I'm going to add a test and fix the issue by moving everything from the capture list to the lambda parameters.
Fixes #2604
The problem was hidden in the lifetime of the trigger task caller: its coroutine frame could be destroyed before being spawned.