Avoid missing a slot on slow proposer #3819
Is this a transaction weight issue? If an extrinsic takes too much time to execute, then it shouldn't be put into the block; the weight mechanism should prevent the block from being too big. I think the weight for this extrinsic is wrong. I'm not sure how to solve it, though. cc @kianenigma
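If the weight mechanism worked as intended here, inclusion would be gated roughly like this (a minimal sketch with assumed names and units, not Substrate's actual weight API):

```rust
// Illustrative only: the constant and units are assumptions,
// not Substrate's real weight types.
const MAX_BLOCK_WEIGHT: u64 = 1_000_000;

/// Returns true if an extrinsic of the given weight still fits into a
/// block that has already accumulated `block_weight`.
fn fits_in_block(block_weight: u64, extrinsic_weight: u64) -> bool {
    block_weight.saturating_add(extrinsic_weight) <= MAX_BLOCK_WEIGHT
}
```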
sidenote:
maybe some code could be optimised as well so we can handle contracts of this size, but maybe this contract size is also quite big and the issue should be solved by reducing it with features like dynamic linking (which are not there yet).
From the looks of it, the block did take too long and the weight system basically did not catch it. But the transaction is actually one of the dynamic ones that we currently don't really deal with in the weight system. This could also be fixed by making the authorship code multi-threaded. I must first try to reproduce this.
Thank you for your follow-up on this issue! I am now trying to optimize the contract business logic and reduce the contract size, and I look forward to your updates.
I believe we should mark the extrinsic as invalid if we time out during its execution and it's the first one in the block. @kianenigma I don't see how we could make authorship multi-threaded; do you mean to somehow verify correctness separately? @jeremymj as a workaround you may try
like a backup solution if the weight system doesn't catch the too-long transaction?
Exactly. It allows us to prevent a situation where an underpriced (underweighted) extrinsic can simply stall the network, as we see in this issue.
It looks like the granularity of the time check in the current code is one transaction, meaning that if one transaction takes too long, we still miss the slot. I roughly imagine that a parent thread that just moves on at some point if the proposal work takes too long should fix this. The slot worker should not wait for the proposer to finish; if the proposer fails to return for any reason, it should not miss the slot and should just submit a half-full or even an empty block. It looks like we have this, and maybe the only problem is changing substrate/core/consensus/slots/src/lib.rs (line 186 at 426c26b).
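To make the idea above concrete, here is a minimal sketch of racing the proposal against the remaining slot time (hedged: `propose_block`, `Block`, and the use of the `futures`/`futures-timer` crates are illustrative stand-ins, not the actual slot-worker code):

```rust
use std::time::Duration;
use futures::{future::{self, Either}, pin_mut};
use futures_timer::Delay;

struct Block; // stand-in for the real block type

async fn propose_block() -> Block {
    // stand-in for the real proposal work
    Block
}

/// Race the proposal against the remaining slot time. If the proposer
/// is too slow, give up on it instead of missing the slot entirely.
async fn propose_with_deadline(remaining: Duration) -> Option<Block> {
    let proposal = propose_block();
    let deadline = Delay::new(remaining);
    pin_mut!(proposal, deadline);
    match future::select(proposal, deadline).await {
        Either::Left((block, _)) => Some(block),
        Either::Right(_) => None, // deadline hit: fall back to a smaller or empty block
    }
}
```

Note that, as the discussion below points out, this only works if the proposal future actually yields back to the executor; a proposer that blocks inside a single poll can never be preempted this way.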
Ah, I see. Indeed it makes sense. I don't think it requires threads though. You can imagine the proposer returning a
As for just reproducing this, adding a
Seems very interesting. I am quite a novice with futures, but I can give it a try.
I don't understand this. What would be the last element? What kind of elements would the Proposer return?
Currently it produces a
I suppose
But I have a question about this polling and futures: from my understanding and @tomusdrw's explanations, it looks like the proposer's
Well, we do have
Ahh, that sounds nice, but we should probably always return a
But I'm also not sure if all of this is so easily doable with futures. Futures don't expect that you sleep internally, and calling into the runtime is the same as sleeping. We would need to build the blocks in a separate thread. So, this will require quite some amount of work. This would also allow us to remove this
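A minimal sketch of that separate-thread approach (assumed names; `oneshot` from the `futures` crate stands in for whatever channel the real code would use): the blocking runtime call runs on its own thread, while the slot worker only polls a plain, non-blocking receiver.

```rust
use futures::channel::oneshot;
use std::thread;

struct Block; // stand-in for the real block type

/// Run the blocking block-construction work on a dedicated thread and
/// hand the result back through a oneshot channel. The returned
/// receiver is an ordinary non-blocking future, so the slot worker can
/// race it against a deadline without being stalled by the runtime call.
fn propose_on_thread() -> oneshot::Receiver<Block> {
    let (tx, rx) = oneshot::channel();
    thread::spawn(move || {
        let block = Block; // placeholder for the long, blocking runtime call
        let _ = tx.send(block);
    });
    rx
}
```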
I think you meant blocking. As
Yeah, sleeping/blocking, not such a big difference :P @kianenigma the issue is still about something different; we were just talking about avoiding missing slots. Nevertheless, there is probably a bug in contracts. Checking a contract should probably not take that much time.
I assume the behaviour of the contract code is actually rather normal? The code is big and just takes too long to process? But yeah, maybe they should be two separate issues. You can revert the title if you prefer to have it so.
I would wait for @pepyakin to comment on this issue.
@tomusdrw The problem is solved after adding the startup parameters, but there are additional optional values for this parameter. Where can I find out more about them?
@jeremymj which startup parameters?
I think @jeremymj is referring to
Nonetheless, I would not close this issue. The core problem still holds:
@bkchr I use the way @kianenigma recommended to start the node:
@jeremymj This affects how we execute the block (either Wasm or Native (if available)):
Note that this is merely a workaround - if you have an incompatible version of the client that can't run natively, it will fall back to Wasm anyway. Imho we need to address:
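For illustration only, the fallback behaviour described above amounts to something like this (a hedged sketch with assumed names, not the actual executor code; the real executor compares full runtime version records, not a single number):

```rust
enum Executor {
    Native,
    Wasm,
}

/// Native execution is only usable when the compiled-in native runtime
/// matches the on-chain Wasm runtime; otherwise fall back to Wasm.
fn pick_executor(native_version: u32, onchain_version: u32) -> Executor {
    if native_version == onchain_version {
        Executor::Native // fast path: run the compiled-in runtime
    } else {
        Executor::Wasm // incompatible client: fall back to Wasm
    }
}
```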
@tomusdrw I see, thank you!
Sorry for the belated reply. I think the problem is prosaic:
OTOH, while a 240k contract sounds big, it is not enormous, and I think it would be desirable that at least one contract of such size could fit into an empty block's processing. Hence I'd like to see if and how we can speed up the contract code verification process and how much of an improvement #3869 would give us. I filed #3885 for this.
For anyone wanting to tackle this from the perspective of non-blocking futures, I've just written a test for my own understanding that simulates what goes wrong in the slot worker when the author decides to sleep: https://github.com/paritytech/substrate/compare/kiz-author-blocking?expand=1
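In the same spirit as that test (this is a hedged re-creation, not the linked code), the failure mode can be simulated in a few lines: an author whose poll blocks the executor thread cannot be preempted by a shorter deadline.

```rust
use std::{thread, time::Duration};
use futures::{executor::block_on, future::{self, Either}, pin_mut};
use futures_timer::Delay;

// Stand-in for an author whose poll blocks the executor thread,
// the way a long synchronous runtime call would.
async fn blocking_author() -> &'static str {
    thread::sleep(Duration::from_secs(2)); // blocks inside a single poll
    "block"
}

fn main() {
    let author = blocking_author();
    let deadline = Delay::new(Duration::from_millis(100));
    pin_mut!(author, deadline);
    // The deadline is 20x shorter, yet the author still "wins": its very
    // first poll holds the thread for the full two seconds, so the timer
    // is never even polled in time.
    match block_on(future::select(author, deadline)) {
        Either::Left((b, _)) => println!("author finished first: {}", b),
        Either::Right(_) => println!("deadline fired first"),
    }
}
```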
Any updates on this that have not been recorded here? Does the author still fail to return in time if one single extrinsic takes too much time?
I think this should be fixed now. You can still miss a slot, but we will back off if we cannot build the block in time. This means the allowed block time increases with the number of missed slots. Given that, I will close this issue ;)
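As a rough illustration of the backoff just described (assumed numbers and names; the real lenience calculation lives in the slots code and may differ), the proposal deadline could grow with consecutive missed slots up to a cap:

```rust
/// Illustrative only: grant 50% more proposal time per consecutive
/// missed slot, capped at 4x the base deadline.
fn proposal_deadline_ms(base_ms: u64, missed_slots: u32) -> u64 {
    let bonus = (base_ms / 2).saturating_mul(u64::from(missed_slots));
    base_ms.saturating_add(bonus).min(base_ms.saturating_mul(4))
}
```

With `base_ms = 1500` and two missed slots, for example, this yields a 3000 ms deadline.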
I want to upload a contract file to the node. When the contract file is about 100kB it can be uploaded successfully, but when I upload a contract of about 240kB the node can't generate a new block; the prompt message is as follows:
2019-10-15 17:50:56 Discarding proposal for slot 261855508; block production took too long
The substrate code I compiled is at commit 1ae7a901d9b2a7f4fefc7b5f5846c2d13ef6d8c7, built with cargo build --release, and the node was started with ./target/release/substrate --dev. The test contract wasm file I am using is as follows: reproduce_wasm_file.zip