-
Notifications
You must be signed in to change notification settings - Fork 2.6k
[sc-allocator] Handle some errors gracefully #14042
Conversation
Generally yes, but nothing we should support with the current implementation. It can be done as part of this: #11883
Could be better, could not be better. Generally the first step should be to open a pr to box a lot of these big things in the xcm instructions. Long term we really need to take into account the allocation size as well, as this isn't the first time we hit this issue with big enums. It already has happened with the Even if decoding would fail with an error, you would still not be able to deliver the messages? Without having looked into your code, why don't you decode the messages one after another? |
This. Currently the XCM enum is way too big; we need to cut it down by strategically adding some |
Thank you for the answers and the suggestions !
Thanks ! I didn't know about the issue. I'll follow it.
Ok, makes sense. I'll look into it an start a discussion with the XCM team about this.
In our case, we ran into this while working on an integration test meant to check that big messages arrive at the destination. So we were crafting big message payloads on purpose. Probably this use case is not very relevant for real world scenarios. And generally this shouldn't happen. But the idea was to have this more as a failsafe mechanism. In case this happens, it felt like it would be safer to have such a mechanism, and get a decoding error instead of a wasm trap. We wouldn't be able to deliver the message, but at least the chain wouldn't get stuck. Would that make sense ? |
Should be a non-controversial pr and you can just open it. The faster we decrease the size, the better.
Aren't relayers delivering messages via normal transactions? If yes, they will not be added to a block and thus, nothing bad should happen. |
BTW, I would propose that you just deliver the messages as |
@bkchr @koute there are many ways to fix it on our side (not get in this situation in the first place) - we found this in a testing environment where most other checks and guardrails are off - so this PR is not a prerequisite or blocker for the bridges usecase. Even so, I think it is still useful, I see it as defense in depth. Some bug or bad design can panic the runtime and break the chain doing an operation (even if it does it out of stupidity) that could be otherwise be recoverable. I might be missing some context or nuance but from high-level it looks more robust for an alocator to throw an error that the caller can choose to gracefully handle or panic depending on their logic. |
As I had already commented:
The approach is something we can think about and can implement. However, we should not support this with the current allocator. I mean the general and more fine grained errors can be merged, but not returning However, FRAME is also not build in a way to bubble up these errors. When you for example read something from the storage that would be too big to decode and it would fail. FRAME would return you the default value for the storage entry. |
Ok, I see. Thanks for the explanation and the context! |
Ok, Thank you for the details ! Closing this in favor of #11883 then. |
High level
I was wondering if we could handle some allocator errors (for example OOM) gracefully or if there is a strong reason why we panic in all error case and mark the allocator as poisoned. I couldn't find any documentation on this, sorry if I missed it.
We could do this by simply returning a
NULL
pointer instead of an error when we encounter a non-critical problem.Context
The size of a structure in memory is different from its encoded size. It can happen at times that the difference between these 2 representations to be significant. Take for example the XCM Instruction structure which takes ~1400 bytes in memory but has some variants that can be encoded in just 1 byte (for example
Instruction::ClearOrigin
)In such cases we could store encoded structures in a relatively small space, compared to the real memory representation. If we end up in such a corner case (having for example a large encoded
Vec
ofInstruction<Call>
), when trying to decode it the execution will end up here, whereVec::push
will eventually try to allocate too much memory and the wasm VM will panic here.I was wondering if we could avoid panicking on such errors (OOM) in order to give the wasm application the possibility to handle such corner cases gracefully. In the scenario above, for example we could adjust the decoding logic by replacing
r.push()
with:and we would get a decoding error instead of a wasm trap, which could be better.
Related to: paritytech/parity-bridges-common#2015