Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

NotInFinalizedChain: trying to revert the same finalized block hash #14347

Open
2 tasks done
liuchengxu opened this issue Jun 12, 2023 · 7 comments
Open
2 tasks done

NotInFinalizedChain: trying to revert the same finalized block hash #14347

liuchengxu opened this issue Jun 12, 2023 · 7 comments

Comments

@liuchengxu
Copy link
Contributor

liuchengxu commented Jun 12, 2023

Is there an existing issue?

  • I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

  • This is not a support question.

Description of bug

The error message indicates it was trying to revert the same finalized block which does not make sense to me. Let me know if I can provide more info.

2023-06-12 11:06:08 [SystemDomain] Safety violation: attempted to revert finalized block HashAndNumber { number: 1056559, hash: 0x804b72fd72d32dab576d77dc311b2074a9a01d482bb3535cd13380d2a0cf5600 } which is not in the same chain as last finalized 0x804b72fd72d32dab576d77dc311b2074a9a01d482bb3535cd13380d2a0cf5600
2023-06-12 11:06:08 [SystemDomain] Failed to process primary block on startup error=NotInFinalizedChain
2023-06-12 11:06:08 [SystemDomain] Essential task `system-executor-worker` failed. Shutting down service.
2023-06-12 11:06:08 [SystemDomain] Gossip engine has terminated.
2023-06-12 11:06:08 [SystemDomain] Essential task `system-domain-gossip` failed. Shutting down service.
2023-06-12 11:06:09 [CoreDomain] Starting relayer for domain: DomainId(3)
2023-06-12 11:06:09 [CoreDomain] [apply_extrinsic] after: 0x829eb43a62592d85e0a1dd3d8f2cb1e2cb11736bf80a5c5b0b4263d4a5c8b114
2023-06-12 11:06:09 [CoreDomain] Safety violation: attempted to revert finalized block HashAndNumber { number: 1056559, hash: 0x2e5818cb49b280f6c50d5de86b241b366998ab2521bd54035a7eb367d4fdd055 } which is not in the same chain as last finalized 0x2e5818cb49b280f6c50d5de86b241b366998ab2521bd54035a7eb367d4fdd055
2023-06-12 11:06:09 [CoreDomain] Failed to process primary block on startup error=NotInFinalizedChain
2023-06-12 11:06:09 [CoreDomain] Essential task `core-executor-worker` failed. Shutting down service.
Error: SubstrateService(Other("Essential task failed."))

BTW, we finalize the block at K-depth, specifically, when a new block at height N is imported, we attempt to finalize the block at height N-256.

Steps to reproduce

This happens on Subspace gemini-3d testnet, I guess running the fully-synced domain nodes on gemini-3d may reproduce it, but it's not trivial as the testnet is already over 1M blocks :(

@bkchr
Copy link
Member

bkchr commented Jun 12, 2023

Do you have forks? Could it be that you sync some other fork? So let's say you last finalized 512 - 256 and now try to finalize 511 - 256 because you import some fork.

In general you could also just add a simple:

let new_finalized = block_number - 256;
if new_finalized > client.info().last_finalized {
     finalize(new_finalize);
}

@bkchr
Copy link
Member

bkchr commented Jun 12, 2023

In general this looks like some bug in your code and not in the Substrate code as long as you can not convince me of the opposite ;)

@liuchengxu
Copy link
Contributor Author

Thanks @bkchr for the suggestion, Checking new_finalized_block_number > last_finalized_block_number is helpful.

However, I think there is still a chance to make Substrate generally more robust as the raised error message is confusing to me.

I reproduced this issue locally. To illustrate, K = 3, assuming the blocks are produced as follows, 8 -> 9 > 10a -> 10b -> 11, now block 8 is finalized.

         10a
       /
8 -- 9 
       \
         10b -- 11a

Produce a new block 11b on top of 10a, the chain reorgs from 11a to 11b,

         10a -- 11b
       /
8 -- 9 
       \
         10b -- 11a

the common block is 9, exacted blocks are [10a, 11b]. When 10a is imported on top of 9 again, it attempts to finalize block 7 (assuming we don't check new_finalized_block_number > last_finalized_block_number)

         10a -- 11b
       /
7 -- 8 -- 9 
       \
         10b -- 11a

let route_from_finalized =
sp_blockchain::tree_route(self.backend.blockchain(), last_finalized, block)?;
if let Some(retracted) = route_from_finalized.retracted().get(0) {
warn!(
"Safety violation: attempted to revert finalized block {:?} which is not in the \
same chain as last finalized {:?}",
retracted, last_finalized
);
return Err(sp_blockchain::Error::NotInFinalizedChain)
}

last_finalized is 8, block is 7, and the calculated route_from_finalized will have an empty enacted list and the retracted list [8], indicating that block 7 is already finalized as part of the finalized chain. That being said, we can safely do a no-op here if enacted is empty, we may raise a warning message instead of returning an error.

diff --git a/client/service/src/client/client.rs b/client/service/src/client/client.rs
index 91c59cdbac..ae1f0262cf 100644
--- a/client/service/src/client/client.rs
+++ b/client/service/src/client/client.rs
@@ -927,6 +927,14 @@ where
                        sp_blockchain::tree_route(self.backend.blockchain(), last_finalized, block)?;
 
                if let Some(retracted) = route_from_finalized.retracted().get(0) {
+                       if route_from_finalized.enacted().is_empty() {
+                               warn!(
+                                       "Attempted to re-finalize an older finalized block {block:?}, \
+                                       last_finalized: {last_finalized}",
+                               );
+                               return Ok(())
+                       }
+
                        warn!(
                                "Safety violation: attempted to revert finalized block {:?} which is not in the \
                                same chain as last finalized {:?}",

I can send a PR if it makes sense.

@bkchr
Copy link
Member

bkchr commented Jun 13, 2023

Improving the error message is fine. However, I'm not sure about not returning an error. I see your reasoning, but we print a message as someone misused the api. So, if we print a message we should also return an error?

CC @davxy @andresilva

liuchengxu added a commit to autonomys/subspace that referenced this issue Jun 13, 2023
@davxy
Copy link
Member

davxy commented Jun 13, 2023

For sure error message could be improved, as it is not super clear.
Maybe with something like : "Attempting to finalize block {hash} {number} which is not in the same chain as last finalized {last-fin-hash} {last-fin-number}."

Said that, in theory calling this function against something that has already been finalized (i.e. an ancestor of info.finalized_hash or in other words route_from_finalized.enacted().is_empty() == true) should be fine.
We already doing something similar just above (here) if we re-finalize the last finalized.

BUT there are some assumptions around and this modification may have some undesired side effects at the moment.
All code paths should be analyzed.

For example here we are setting this block as new_best (if import_existing is true).
And even if this has no consequences in the backend (to be checked) we end up having the ImportSummary::is_new_nest == true here.
This may have consequences or not. The code path should be carefully analyzed.

(actually your code doesn't call apply_finality_with_block_hash through this path, but via finalize_block, but the reasoning above still apply).

In conclusion, I think that an innocent tweak like this is correct but may have repercussions somewhere else and thus you should eventually carefully check it

@liuchengxu
Copy link
Contributor Author

Not sure how long it will take to apply this tweak, in order to help the community not be trapped here, we can at least improve the docs to finalize_block to mention that the caller is responsible for not re-finalizing the older finalized block even if it's intuitively harmless.

liuchengxu added a commit to autonomys/subspace that referenced this issue Jun 14, 2023
@davxy
Copy link
Member

davxy commented Jun 14, 2023

I given had a better look and apply_finality_with_block_hash is called only by 2 places:

  1. import_block (if import params finalized = true and import_existing = true)
  2. finalize_block

If we return Ok in case that route_from_finalized.enacted().is_empty() (in other words finalizing something already finalized):

In both cases we don't push anything new into the ClientImportOperation, so per-se is a no-op.

BUT

In case 1 even if the call is successful this will then fail in the backend when we try to commit the operation (Error::NotInFinalizedChain => "Potential long-range attack: block not in finalized chain").

In case 2 the call ends-up being successful and looks harmless. But, as I said, we have to keep an eye for the places where this call is performed to not disrupt some assumption. For example in grandpa this may have consequences that should eventually be analyzed:

client
.apply_finality(import_op, hash, persisted_justification, true)

(e.g. update_best_justification just below doesn't look to perform any check and the prev best justification may end up being overwritten)

@andresilva any consideration about this and the proposal of not failing?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants