
node: fix task caller lifecycle in Triggers schedule #2605

Merged
merged 3 commits into master from fix/triggers_stage_invalid_task_lifecycle on Dec 21, 2024

Conversation

@canepat (Member) commented on Dec 16, 2024

Fixes #2604

The problem was hidden behind the lifecycle of the trigger task caller: its coroutine frame could be destroyed before being spawned.
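
For context, a hedged sketch of the shape that triggers this; names such as TriggerCallback and the callback's call signature are assumptions, simplified from the actual Triggers stage code:

```cpp
// Hypothetical, simplified sketch of the problematic pattern (not the actual silkworm code).
Task<void> Triggers::schedule(TriggerCallback callback) {  // TriggerCallback: assumed callable type
    // task_caller is a local lambda; its captures (`this`, `callback`) live in the closure object.
    auto task_caller = [this, callback = std::move(callback)]() -> Task<void> {
        co_await callback(*current_tx_);  // assumed call shape
    };
    // task_caller() creates a coroutine whose frame only *references* the closure.
    // With a plain `return`, schedule() finishes immediately and destroys task_caller,
    // so the spawned coroutine may later read dangling captures.
    return concurrency::spawn_task(ioc_, task_caller());
}
```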

@canepat added the maintenance label (Some maintenance work: fix, refactor, rename, test...) on Dec 16, 2024
@canepat requested a review from battlmonstr on December 16, 2024 00:43
@canepat marked this pull request as ready for review on December 16, 2024 00:43
- return concurrency::spawn_task(ioc_, task_caller());
+ co_return co_await concurrency::spawn_task(ioc_, task_caller());
Contributor commented:

This is an interesting side effect. I wonder why this change makes any difference?

The call to schedule is already awaited at the call site.

An additional await here delays the destruction of some locals, right? Which ones? task_caller is not relevant after calling it. trigger is moved into the task, so it is not destroyed unless the spawned task is destroyed. The spawned task is moved inside co_spawn. I hope that co_spawn does not destroy the passed task prematurely, but maybe it does? Is it possible to verify this with a unit test?
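
(For reference, the local that likely matters here is the task_caller closure itself: a lambda coroutine copies its parameters into the coroutine frame, but the captures stay in the closure object, which the frame only references. A standalone sketch of that rule, independent of asio and of this codebase:)

```cpp
#include <coroutine>
#include <cstdio>
#include <memory>
#include <utility>

// Minimal lazy task type: the coroutine suspends immediately and runs only when resumed.
struct LazyTask {
    struct promise_type {
        LazyTask get_return_object() {
            return LazyTask{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() {}
    };

    explicit LazyTask(std::coroutine_handle<promise_type> h) : handle{h} {}
    LazyTask(LazyTask&& other) noexcept : handle{std::exchange(other.handle, nullptr)} {}
    LazyTask(const LazyTask&) = delete;
    ~LazyTask() {
        if (handle) handle.destroy();
    }

    std::coroutine_handle<promise_type> handle;
};

LazyTask make_task() {
    auto value = std::make_shared<int>(42);
    // `value` is stored in the closure; the coroutine frame keeps only a reference to the closure.
    auto lambda = [value]() -> LazyTask { std::printf("%d\n", *value); co_return; };
    return lambda();  // the closure `lambda` is destroyed when make_task returns
}

int main() {
    LazyTask task = make_task();
    task.handle.resume();  // undefined behavior: the closure (and its capture) no longer exists
}
```

That would explain why the extra co_await helps: it keeps schedule()'s own frame, and with it the task_caller closure, alive until the spawned coroutine has completed.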

If it really does, maybe we should modify spawn_task to await co_spawn inside spawn_task itself, so that this mistake is not possible in other places (we use spawn_task without awaiting in a bunch of places) or in future code.
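
(A sketch of what that variant could look like, assuming Task<T> is roughly boost::asio::awaitable<T> and that spawn_task forwards to boost::asio::co_spawn; the real silkworm types and the eager/lazy start semantics of spawn_task may differ, so treat this only as an illustration of the idea:)

```cpp
#include <boost/asio/awaitable.hpp>
#include <boost/asio/co_spawn.hpp>
#include <boost/asio/io_context.hpp>
#include <boost/asio/use_awaitable.hpp>
#include <utility>

template <typename T>
using Task = boost::asio::awaitable<T>;  // assumption: may not match silkworm's actual Task alias

// Hypothetical "awaiting" spawn_task: the passed task is moved into co_spawn and its
// completion is co_awaited here, so the wrapper's frame (and anything feeding the task)
// stays alive until the spawned work has actually finished.
template <typename T>
Task<T> spawn_task_awaiting(boost::asio::io_context& ioc, Task<T> task) {
    co_return co_await boost::asio::co_spawn(ioc, std::move(task), boost::asio::use_awaitable);
}
```

Note that this only removes the foot-gun if callers still co_await the returned task; fire-and-forget call sites would need a different completion token (e.g. boost::asio::detached).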

I'd like to fill the gap in my understanding here. I know this is not a high priority, so maybe let's create an issue about researching and generalizing this and leave it as an exercise for a future reader?

Contributor commented:

This could potentially be the reason for some sentry discovery crashes, because SerialNodeDb follows the same pattern.

Contributor commented:

Let's try to move the callback from the capture list to the lambda parameter list and see if it works without co_await (see the links below).
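
(A hedged sketch of the suggested change inside Triggers::schedule, with assumed names and call shape; the key point is that a coroutine's parameters are copied into its frame, while captures only live in the closure object:)

```cpp
// Before: `callback` is a capture, so it lives in the closure, a local of schedule().
auto task_caller = [this, callback = std::move(callback)]() -> Task<void> {
    co_await callback(*current_tx_);  // assumed call shape
};
return concurrency::spawn_task(ioc_, task_caller());

// After: `callback` is a parameter, so it is copied into the coroutine frame at the call.
// Note that `this` is still captured here, so it can still dangle with the closure.
auto task_caller = [this](auto callback) -> Task<void> {
    co_await callback(*current_tx_);
};
return concurrency::spawn_task(ioc_, task_caller(std::move(callback)));
```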

@canepat (Member, Author) commented:

Yep, we must move both this and the callback, because the whole capture list is affected (indeed, the issue was happening on this->current_tx_).
Moreover, it seems to be systematically reproducible in unit tests, so I'm going to add a test and fix the problem by moving from the capture list to the lambda parameters.
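
(A sketch of the resulting shape, with nothing left in the capture list; names and the call shape are assumed:)

```cpp
// Everything the coroutine touches is passed as a parameter, so it is copied into the
// coroutine frame and cannot dangle when the closure or schedule()'s locals go away.
// (The Triggers object itself must still outlive the spawned task.)
auto task_caller = [](Triggers* self, auto callback) -> Task<void> {
    co_await callback(*self->current_tx_);  // assumed call shape
};
return concurrency::spawn_task(ioc_, task_caller(this, std::move(callback)));
```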

@canepat merged commit 2d84274 into master on Dec 21, 2024
5 checks passed
@canepat deleted the fix/triggers_stage_invalid_task_lifecycle branch on December 21, 2024 10:10
Labels
maintenance: Some maintenance work (fix, refactor, rename, test...)
Development

Successfully merging this pull request may close these issues.

silkworm: assertion failed in Triggers stage
2 participants