PVF validation host livelock #909

pepyakin · 2021-11-16T12:01:37Z

There is a potential livelock in the PVF validation host code.

To trigger it, the following sequence of events needs to take place:

There is a request to prepare a certain PVF.
The preparation worker process dies.
At approximately the same time, the pool receives a message that leads to a call to the purge_dead clean-up routine.
This leads to a race between purge_dead and the I/O error originating from a read call on the UDS (Unix domain socket) that connects the worker and the validation host. (Note that the race itself is not unforeseen; it was an accepted part of the design.)
If purge_dead wins the race, a rip message is sent back to the queue.
The queue reacts by re-adding the message to the execution queue, optionally spawning an additional worker.
When the worker is spawned or freed and picks up that job, the cycle starts all over.
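The cycle above can be modeled with a small, self-contained Rust sketch. It is not the polkadot PVF host code: `FromPool`, `Rip`, `drive`, and `run_on_fresh_worker` are hypothetical names loosely following the issue's terminology, and the worker is assumed to die on every attempt (for example, because of a crafted PVF). Because the queue re-adds the job on every rip, the job is dispatched again and again without ever concluding; the sketch caps the loop at `max_steps` only so that it terminates.

```rust
use std::collections::VecDeque;

/// Hypothetical messages the pool sends back to the queue.
#[derive(Debug)]
enum FromPool {
    /// The worker finished the job.
    Concluded,
    /// `purge_dead` found the worker dead; its job was never concluded.
    Rip { job: u32 },
}

/// Hypothetical stand-in for handing `job` to a worker. When
/// `worker_always_dies` is true, the worker crashes on every attempt and
/// `purge_dead` wins the race, so the pool reports a rip.
fn run_on_fresh_worker(job: u32, worker_always_dies: bool) -> FromPool {
    if worker_always_dies {
        FromPool::Rip { job }
    } else {
        FromPool::Concluded
    }
}

/// Drives the queue for at most `max_steps` dispatches and returns how many
/// dispatches happened and whether the job ever concluded.
fn drive(worker_always_dies: bool, max_steps: usize) -> (usize, bool) {
    let mut queue: VecDeque<u32> = VecDeque::from([42]);
    let mut dispatches = 0;
    while let Some(job) = queue.pop_front() {
        if dispatches == max_steps {
            // Artificial cap: without it this loop would never end.
            return (dispatches, false);
        }
        dispatches += 1;
        match run_on_fresh_worker(job, worker_always_dies) {
            FromPool::Concluded => return (dispatches, true),
            // The queue re-adds the job; a (new or freed) worker picks it up
            // again, and the cycle starts over.
            FromPool::Rip { job } => queue.push_back(job),
        }
    }
    (dispatches, false)
}
```

`drive(false, 1_000)` returns `(1, true)`, while `drive(true, 1_000)` returns `(1000, false)`: the host keeps doing work without making progress, which is what makes this a livelock rather than a deadlock.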

In summary, triggering the livelock requires all of the following:

The preparation worker dies.
The preparation pool receives a message that triggers purge_dead within a narrow time window.
On top of that, purge_dead wins the race against the read I/O error.
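The three conditions can be written down as a small decision table. This is a hypothetical Rust model, not the actual host logic: all names are illustrative, and it assumes that when the read I/O error is observed first, the job is concluded with an error instead of being re-queued. That assumption is what the issue implies by requiring purge_dead to win the race.

```rust
/// Which event the pool observes first after the worker process has died.
#[derive(Debug, Clone, Copy)]
enum FirstObserved {
    /// The read on the worker socket fails with an I/O error.
    ReadIoError,
    /// `purge_dead` notices the dead worker before the read fails.
    PurgeDead,
}

/// What happens to the in-flight preparation job (hypothetical model).
#[derive(Debug, PartialEq, Eq)]
enum JobFate {
    /// Assumed behavior: the job is concluded with an error, no retry.
    ConcludedWithError,
    /// A rip message is sent and the queue re-adds the job.
    Requeued,
}

/// Maps the winner of the race to the fate of the job.
fn fate(first: FirstObserved) -> JobFate {
    match first {
        FirstObserved::ReadIoError => JobFate::ConcludedWithError,
        FirstObserved::PurgeDead => JobFate::Requeued,
    }
}

/// The livelock needs all three conditions at the same time.
fn can_livelock(worker_died: bool, purge_in_window: bool, first: FirstObserved) -> bool {
    worker_died && purge_in_window && fate(first) == JobFate::Requeued
}
```

Only `can_livelock(true, true, FirstObserved::PurgeDead)` is true; dropping any single condition is enough to avoid the loop in this model.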

The first condition may not be easy to trigger, but it is possible: either the node itself is under heavy load (especially memory pressure), or an attacker has crafted a PVF that causes panics in the preparation process.
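The panic case of the first condition can be contained inside the worker by catching unwinds, which is what the title of the follow-up PR paritytech/polkadot#4304 ("prepare worker: Catch unexpected unwinds") suggests. The sketch below illustrates the idea under stated assumptions and is not that PR's code: `prepare` and `PrepareError` are hypothetical, and `prepare` panics on an empty blob only to stand in for a malicious PVF.

```rust
use std::panic;

/// Hypothetical error type the worker would report back to the host.
#[derive(Debug, PartialEq, Eq)]
enum PrepareError {
    /// Preparation panicked; the message is kept for diagnostics.
    Panic(String),
}

/// Hypothetical stand-in for the real preparation step. It panics on an
/// empty blob to mimic a PVF crafted to crash the preparation process.
fn prepare(code: &[u8]) -> Vec<u8> {
    assert!(!code.is_empty(), "empty PVF blob");
    code.iter().rev().copied().collect()
}

/// Runs `prepare` and turns an unwinding panic into an error value, so the
/// worker can report a result instead of the process dying.
fn prepare_catching_unwinds(code: &[u8]) -> Result<Vec<u8>, PrepareError> {
    panic::catch_unwind(|| prepare(code)).map_err(|payload| {
        // Panic payloads are usually a `&'static str` or a `String`.
        let msg = if let Some(s) = payload.downcast_ref::<&str>() {
            (*s).to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic payload".to_string()
        };
        PrepareError::Panic(msg)
    })
}
```

`std::panic::catch_unwind` only catches unwinding panics. It does not help if the process aborts (for example on allocation failure, or when built with `panic = "abort"`) or is killed by the OS under memory pressure, so the heavy-load path of the first condition remains possible. The default panic hook still prints the panic message to stderr.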

The second and third conditions seem very unlikely. Preparation must be requested in the narrow window after the exploited worker dies but before the kernel notifies the polkadot process that the pipe is closed and the async runtime picks up that change.
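The window described above closes once the host observes the closed socket. The following Unix-only sketch uses `std::os::unix::net::UnixStream::pair` as a stand-in for the host/worker connection; this is an assumption for illustration, since the real host reads through an async runtime and the worker end is closed by the process exiting rather than by `drop`. Once the peer end is closed, `read_exact` on the host end fails with `ErrorKind::UnexpectedEof`, which plays the role of the read I/O error that purge_dead races against.

```rust
use std::io::{ErrorKind, Read};
use std::os::unix::net::UnixStream;

/// Returns the error kind the host end sees when it tries to read a
/// fixed-size frame after the worker end has been closed, or `None` if the
/// read unexpectedly succeeds.
fn read_after_peer_gone() -> std::io::Result<Option<ErrorKind>> {
    // A connected socket pair stands in for the host <-> worker connection.
    let (mut host_end, worker_end) = UnixStream::pair()?;
    // Closing the worker end models the worker process dying: the kernel
    // closes its socket, and the host sees end-of-file.
    drop(worker_end);
    let mut frame = [0u8; 8];
    Ok(host_end.read_exact(&mut frame).err().map(|e| e.kind()))
}
```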


pepyakin added the I3-bug label Nov 16, 2021

pepyakin mentioned this issue Nov 16, 2021

prepare worker: Catch unexpected unwinds paritytech/polkadot#4304

Merged

Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023

the-right-joyce added the I2-bug label (The node fails to follow expected behavior.) and removed the I3-bug label Aug 25, 2023

the-right-joyce added this to parachains team board Oct 18, 2023

the-right-joyce moved this to Backlog in parachains team board Oct 18, 2023

helin6 pushed a commit to boolnetwork/polkadot-sdk that referenced this issue Feb 5, 2024

Use current version on db creation (paritytech#909)

70cfe57
