There is a potential livelock in the PVF validation host code.
In order to trigger it, the following set of conditions needs to take place:

- there is a request for preparation of a certain PVF;
- the preparation worker process dies;
- at approximately the same time, the pool receives a message which then leads to calling the `purge_dead` clean-up routine;
- this leads to a race between `purge_dead` and the I/O error originating from a read call on the UDS socket that connects the worker and the validation host (NOTE that the race itself is not unforeseen and was an acceptable part of the design);
- a rip message is sent back to the queue;
- the queue reacts by re-adding the message back into the execution queue, optionally spawning an additional worker;
- then, when a worker is spawned or freed and picks up that job, the cycle starts all over (a simplified sketch of this cycle follows below).
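For illustration, here is a minimal, self-contained Rust sketch of that cycle. All names here (`PoolToQueue`, `QueueToPool`, `StartJob`, `Rip`) are hypothetical and do not reflect the actual PVF host types; the sketch only models the message flow described above, under the assumption that the worker dies every time it picks up the job and that `purge_dead` wins the race each time.

```rust
// Hypothetical message types standing in for the pool <-> queue channel.
enum PoolToQueue {
    // The pool noticed the worker died while holding a job ("rip").
    Rip { job: &'static str },
}

enum QueueToPool {
    // The queue asks the pool to (re-)start preparation of a PVF.
    StartJob { job: &'static str },
}

fn main() {
    let mut to_pool = vec![QueueToPool::StartJob { job: "pvf-1" }];
    let mut to_queue: Vec<PoolToQueue> = Vec::new();

    // A few rounds of the loop; no progress is ever made because the
    // worker's death is reported as a rip instead of concluding the job.
    for round in 0..3 {
        // Pool side: picks up the job, the worker dies, purge_dead wins the
        // race against the socket read error, and only a rip is sent back.
        while let Some(QueueToPool::StartJob { job }) = to_pool.pop() {
            println!("round {round}: pool starts a worker for {job}; the worker dies");
            to_queue.push(PoolToQueue::Rip { job });
        }

        // Queue side: reacts to the rip by re-adding the job, optionally
        // spawning an additional worker, which restarts the cycle.
        while let Some(PoolToQueue::Rip { job }) = to_queue.pop() {
            println!("round {round}: queue re-enqueues {job}");
            to_pool.push(QueueToPool::StartJob { job });
        }
    }
}
```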
So, in order to trigger it, these things need to take place:

- the preparation worker dies;
- the preparation pool receives a message in the narrow time window that triggers `purge_dead`;
- on top of that, `purge_dead` wins the race to the read I/O error.
The first condition may not be easy to trigger, but it is possible: either the node itself is under heavy load (especially memory-wise), or the attacker has crafted a PVF that leads to panics in the preparation process.

The second and third conditions seem very unlikely. The preparation needs to be requested just in time: after the exploited worker dies, but before the kernel has notified the polkadot process that the pipe is closed and the async runtime has picked up that change.
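To make the race in the third condition concrete, the following is a rough sketch (assuming the tokio crate; none of this is the real host code, and the timings are made up) of two concurrent observers of the worker's death: the failing read on the worker's socket, and an incoming pool message whose handling runs `purge_dead`. Whichever completes first decides whether the death goes through the normal error path or the rip path.

```rust
use std::time::Duration;
use tokio::time::sleep;

#[tokio::main]
async fn main() {
    // Simulates the read on the UDS socket failing once the kernel notifies
    // the polkadot process that the worker's end of the pipe is closed and
    // the async runtime picks that up.
    let read_error = async {
        sleep(Duration::from_millis(10)).await;
        "read error observed: conclude the job with an error"
    };

    // Simulates an unrelated message arriving at the pool in the narrow
    // window after the worker died; handling it runs purge_dead, which
    // also discovers the dead worker.
    let purge_dead = async {
        sleep(Duration::from_millis(5)).await;
        "purge_dead observed the death first: send rip back to the queue"
    };

    // Whichever future completes first decides how the worker's death is
    // reported; in the problematic case, purge_dead wins.
    tokio::select! {
        outcome = read_error => println!("{outcome}"),
        outcome = purge_dead => println!("{outcome}"),
    }
}
```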