Synchronize access to block header #6143

matthew1001 · 2023-11-08T10:32:51Z

PR description

As outlined in the comment in #6140, access to the block header does not appear to be thread-safe when the requested block is a newly added chain head.

Fixed Issue(s)

Fixes #6140

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

github-actions · 2023-11-08T10:33:07Z

I thought about documentation and added the doc-change-required label to this PR if updates are required.
I thought about the changelog and included a changelog update if required.
If my PR includes database changes (e.g. KeyValueSegmentIdentifier) I have thought about compatibility and performed forwards and backwards compatibility tests

garyschulte · 2023-11-08T18:11:23Z

Do we need to synchronize block header access? This isn't something that typically needs concurrency safety, except of course near-head.

Is the transaction spam using block number or block hash? I can understand the chain head not being available if the tx spammer is getting ahead of itself by predicting what the block number should be based on block time. But if it is the block hash, where is the block builder getting the block hash or block number? Which one are we using in this case?

Without context, this seems like an issue that should be solved by ensuring we commit and persist the blockchain rocksdb transaction rather than paying a synchronization overhead cost when trying to access block headers.

edit: Has this fixed the issue in your test case? If so, perhaps we should overload with synchronized access. That way we have synchronization when necessary and do not pay the cost for operations that do not require it.

matthew1001 · 2023-11-09T10:34:13Z

Yeah, it's exactly the near-head case where the problem occurs. Basically the chain head is updated (inside a synchronized function) and then storage is updated and committed (in the same synchronized function).

But the getter for the chain head can be called at any time, and because it's not synchronized it can retrieve the new value before the storage commit has happened.

So possibly an alternative to synchronizing on getBlockHeader() would be to synchronize on getChainHeadHash() (and other related getChainHead*() methods probably) if getting the chain head is less frequent compared to getBlockHeader(). What do you think?
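The ordering described above can be reproduced deterministically in a small model. This is a sketch, not Besu code: `ToyChain`, `appendBlock`, `RaceDemo`, and the latches are hypothetical names, and the only assumption carried over from the comments is that the writer publishes the new head and then commits storage inside one synchronized method, while the plain getters take no lock.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model of the head/storage ordering; not Besu's blockchain class.
class ToyChain {
  private final Map<String, String> storage = new ConcurrentHashMap<>();
  private volatile String chainHeadHash = "genesis";

  ToyChain() {
    storage.put("genesis", "header(genesis)");
  }

  // Writer: the head is published first and storage is committed afterwards,
  // both under the object's lock. The latches exist only to pause the writer
  // between the two steps so the window can be observed.
  synchronized void appendBlock(String hash, CountDownLatch headUpdated, CountDownLatch resume)
      throws InterruptedException {
    chainHeadHash = hash;
    headUpdated.countDown();
    resume.await();
    storage.put(hash, "header(" + hash + ")");
  }

  // Unsynchronized getters: they can run while appendBlock is in progress.
  String getChainHeadHash() {
    return chainHeadHash;
  }

  Optional<String> getBlockHeader(String hash) {
    return Optional.ofNullable(storage.get(hash));
  }

  // Synchronized read: waits until any in-progress appendBlock has finished.
  synchronized Optional<String> getBlockHeaderSafe(String hash) {
    return getBlockHeader(hash);
  }
}

class RaceDemo {
  // Returns {unsynchronized read found the header, synchronized read found the header}.
  static boolean[] run() {
    ToyChain chain = new ToyChain();
    CountDownLatch headUpdated = new CountDownLatch(1);
    CountDownLatch resume = new CountDownLatch(1);
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      Future<?> writer = pool.submit(() -> {
        chain.appendBlock("0xabc", headUpdated, resume);
        return null;
      });
      headUpdated.await(); // the head is now visible, storage is not yet committed
      String head = chain.getChainHeadHash();
      boolean unsafeFound = chain.getBlockHeader(head).isPresent();
      // The synchronized read blocks on the lock held by the paused writer.
      Future<Optional<String>> safeRead = pool.submit(() -> chain.getBlockHeaderSafe(head));
      resume.countDown(); // let the writer commit and release the lock
      writer.get();
      boolean safeFound = safeRead.get().isPresent();
      return new boolean[] {unsafeFound, safeFound};
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    boolean[] result = run();
    System.out.println("unsynchronized read found header: " + result[0]);
    System.out.println("synchronized read found header: " + result[1]);
  }
}
```

In this model the unsynchronized read sees the new head hash but no header, while the synchronized read waits for the commit and finds it. Synchronizing the head getter instead, as proposed above, would close the same window by making the reader wait before it even learns the new hash.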

matthew1001 · 2023-11-10T11:09:36Z

Going to run some tests with a change that synchronizes getChainHead* instead of getBlockHeader, and if it fixes the issue I'll update the PR.

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

matthew1001 · 2023-11-10T16:49:23Z

@garyschulte I'm running my tests again using your suggestion of overloading getBlockHeader by adding getBlockHeaderSafe, which is synchronized and which the caller can choose to use in the rare event that the block header for the latest block isn't available. I'll update here once I'm happy that the issue isn't showing any more. Let me know what you think of the changes.

garyschulte

Seems good to me, but I think we can DRY it up a bit

garyschulte · 2023-11-10T19:04:38Z

ethereum/eth/src/main/java/org/hyperledger/besu/ethereum/eth/transactions/TransactionPool.java

+    // Optimistically get the chain head. getChainHeadBlockHeader() doesn't take any locks,
+    // which might mean that the latest block is still being committed to storage. If this
+    // call fails try the synchronized alternative, and if that fails give up.
+    BlockHeader chainHeadBlockHeader = getChainHeadBlockHeader().orElse(null);


How about we leave this as it is and put this logic directly in getChainHeadBlockHeader()?

matthew1001 · 2023-11-11

Yeah that's a good thought. I'll tidy it up and push a new commit.

matthew1001 · 2023-11-12

New commits pushed @garyschulte. I'll mark as auto-merge once you're happy with them.

garyschulte · 2023-11-10T19:06:14Z

ethereum/eth/src/main/java/org/hyperledger/besu/ethereum/eth/transactions/TransactionPool.java

+  private Optional<BlockHeader> getChainHeadBlockHeaderSafe() {
+    final MutableBlockchain blockchain = protocolContext.getBlockchain();
+    return blockchain.getBlockHeaderSafe(blockchain.getChainHeadHash());
+  }
+


How about, instead of adding a new method here, we add the safe call to the existing getChainHeadBlockHeader(), e.g.:

    private Optional<BlockHeader> getChainHeadBlockHeader() {
      final MutableBlockchain blockchain = protocolContext.getBlockchain();
      return blockchain
          .getBlockHeader(blockchain.getChainHeadHash())
          .or(() -> blockchain.getBlockHeaderSafe(blockchain.getChainHeadHash()));
    }

matthew1001 · 2023-11-12

See latest commits

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

fab-10 · 2023-11-13T10:38:59Z

I like the approach of the *Safe version of the method. Let's monitor this to see if the issue is gone; otherwise, as suggested before, we can implement a retry policy for transient errors such as chain head not found.
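For reference, the retry policy mentioned here could take roughly the following shape. This is only a sketch of a bounded retry around an `Optional`-returning lookup: `TransientRetry`, its parameters, and the fixed pause are hypothetical and are not part of Besu or of this PR.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper illustrating a bounded retry; not part of Besu.
final class TransientRetry {
  private TransientRetry() {}

  // Calls the lookup up to maxAttempts times, pausing between attempts,
  // and returns the first non-empty result (or empty if every attempt fails).
  static <T> Optional<T> retry(Supplier<Optional<T>> lookup, int maxAttempts, long pauseMillis) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      Optional<T> result = lookup.get();
      if (result.isPresent()) {
        return result;
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(pauseMillis);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve the interrupt and give up
          return Optional.empty();
        }
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    AtomicInteger calls = new AtomicInteger();
    // Simulated transient failure: the header only becomes visible on the third call.
    Optional<String> header =
        TransientRetry.<String>retry(
            () -> calls.incrementAndGet() < 3 ? Optional.<String>empty() : Optional.of("header"),
            5,
            1L);
    System.out.println(header + " after " + calls.get() + " calls");
  }
}
```

Compared with the synchronized fallback, a retry like this does not wait on the writer's lock. It simply assumes the commit will complete within the retry window, so the attempt count and pause would need tuning.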

matthew1001 · 2023-11-13T14:16:00Z

The current PR definitely fixes the issue - no failures after >10 hours running a single validator at 50TPS. Question is whether we stick with the refactor I did under 1641b23 or revert to having separate getChainHeadBlockHeader() and getChainHeadBlockHeaderSafe(). I don't have a strong opinion either way. With the latest commit, getChainHeadBlockHeader() will make 2 attempts to get a block header if the block being requested genuinely doesn't exist. But for any block < chain head (and most calls for the chain head block except in very occasional circumstances) this shouldn't affect performance. So I'm happy sticking with the latest refactor, but likewise happy to revert.
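The cost trade-off described above can be checked with a toy store that counts lookups. The sketch assumes only the shape of the refactor discussed in the review (an unsynchronized lookup chained to a synchronized one with `Optional.or`, which requires Java 9 or later). `CountingHeaders`, `getWithFallback`, and the counters are hypothetical and exist purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a header store; only the call counting matters here.
class CountingHeaders {
  private final Map<String, String> headers = new HashMap<>();
  int unsafeLookups = 0;
  int safeLookups = 0;

  void put(String hash, String header) {
    headers.put(hash, header);
  }

  // Fast path: no lock taken.
  Optional<String> getBlockHeader(String hash) {
    unsafeLookups++;
    return Optional.ofNullable(headers.get(hash));
  }

  // Slow path: takes the object's lock before looking up.
  synchronized Optional<String> getBlockHeaderSafe(String hash) {
    safeLookups++;
    return Optional.ofNullable(headers.get(hash));
  }

  // Optional.or evaluates its supplier lazily, so the synchronized lookup
  // only runs when the fast path returned empty.
  Optional<String> getWithFallback(String hash) {
    return getBlockHeader(hash).or(() -> getBlockHeaderSafe(hash));
  }

  public static void main(String[] args) {
    CountingHeaders store = new CountingHeaders();
    store.put("0x01", "header(0x01)");

    store.getWithFallback("0x01"); // present: one unsynchronized lookup, no lock
    System.out.println(store.unsafeLookups + " unsafe, " + store.safeLookups + " safe");

    store.getWithFallback("0xmissing"); // absent: both lookups run
    System.out.println(store.unsafeLookups + " unsafe, " + store.safeLookups + " safe");
  }
}
```

A header that is already stored costs a single unsynchronized lookup. Only a miss, whether transient or genuine, pays for the second, synchronized attempt.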

@garyschulte any thoughts? It would be nice to get this into 23.10.2 as it causes unpleasant behaviour if you hit it.

garyschulte

👍 LGTM, especially if this solves the case you are encountering.

* Synchronize access to block header
  Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
* Add 'safe' version of getBlockHeader
  Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
* Move retry with lock into getChainHeadBlockHeader()
  Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
* Reinstate 'final' modifier
  Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

---------

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Justin Florentine <justin+github@florentine.us>


Synchronize access to block header

539d7e2

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

matthew1001 mentioned this pull request Nov 8, 2023

Log missing chain head as warning, not trace #6127

Merged

matthew1001 added 2 commits November 8, 2023 13:59

Merge branch 'main' into chain-head-update-fail

cddf3f3

Merge branch 'main' into chain-head-update-fail

dedf90d

matthew1001 added 2 commits November 10, 2023 16:43

Add 'safe' version of getBlockHeader

dedb619

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

Merge branch 'main' into chain-head-update-fail

0c972fd

garyschulte requested changes Nov 10, 2023


matthew1001 added 3 commits November 12, 2023 13:11

Move retry with lock into getChainHeadBlockHeader()

1641b23

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

Merge branch 'main' into chain-head-update-fail

5b55354

Reinstate 'final' modifier

29b863a

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

matthew1001 enabled auto-merge (squash) November 12, 2023 13:18

matthew1001 requested a review from garyschulte November 13, 2023 11:14

garyschulte approved these changes Nov 13, 2023


Merge branch 'main' into chain-head-update-fail

ca6b9e9

matthew1001 merged commit 331ddcf into hyperledger:main Nov 14, 2023
19 checks passed

This was referenced Jan 3, 2024

Under heavy load eth_estimateGas returns INTERNAL_ERROR #6344

Closed

Use synchronized call to access the chain head block in eth_estimateGas #6345

Merged

macfarla mentioned this pull request May 14, 2024

Intermittent 400 Bad Request when sending a transaction #4212

Closed

matthew1001 mentioned this pull request Jul 23, 2024

eth_gasPrice ERROR Could not retrieve block #7287

Closed
