ethclient/simulated: Fix flaky rollback test #32280

kashitaka · 2025-07-25T18:18:43Z

This PR addresses a flakiness in the rollback test.

I found nonce collision caused transactions occasionally fail to send. I tried to change error message in the failed test like:

	if err = client.SendTransaction(ctx, signedTx); err != nil {
		t.Fatalf("failed to send transaction: %v, nonce: %d", err, signedTx.Nonce())
	}

and I occasionally got test failure with this message:

=== CONT  TestFlakyFunction/Run_#100
    rollback_test.go:44: failed to send transaction: already known, nonce: 0
--- FAIL: TestFlakyFunction/Run_#100 (0.07s)

Although nonces are obtained via PendingNonceAt, we observed that, in rare cases (approximately 1 in 1000), two transactions from the same sender end up with the same nonce. This likely happens because tx0 has not yet propagated to the transaction pool before tx1 requests its nonce. When the test succeeds, tx0 and tx1 have nonces 0 and 1, respectively. However, in rare failures, both transactions end up with nonce 0.

Fix

We modified the test to explicitly assign nonces to each transaction. By controlling the nonce values manually, we eliminated the race condition and ensured consistent behavior. After several thousand runs, the flakiness was no longer reproducible in my local environment.

Additional

Reduced internal polling interval in pendingStateHasTx() to speed up test execution without impacting stability. It reduces test time for TestTransactionRollbackBehavior from about 7 seconds to 2 seconds.

rjl493456442 · 2025-07-28T01:20:33Z

Can you modify the testSendSignedTx, providing the nonce manually instead of fetching from the pendingNonce?

So that you can get rid of the time.Sleep between the tx submission.

kashitaka · 2025-07-28T03:08:30Z

@rjl493456442 Thank you for your feedback. I applied it in 74caa26

Can you modify the testSendSignedTx, providing the nonce manually instead of fetching from the pendingNonce?
So that you can get rid of the time.Sleep between the tx submission.

Agreed, that sounds like a more stable approach. I initially assumed the test specifically wanted to fetch the nonce via PendingNonceAt, so I didn’t modify that part. But if it’s okay to change, I agree it’s better this way.

Since we can’t modify the nonce on an already constructed tx, I didn’t change testSendSignedTx itself — instead, I updated newTx and newBlobTx to accept and use a manually provided nonce. Could you take a look and let me know if this approach looks legit?

rjl493456442 · 2025-07-28T03:37:12Z

ethclient/simulated/rollback_test.go

 	)

 	// Poll for receipt with timeout
-	deadline := time.Now().Add(2 * time.Second)


Any particular reason for changing the wait time duration?

Our CI system is occasionally overloaded, have a loose duration is usually preferred

@rjl493456442 Thanks for the comment! As mentioned in the PR description, It reduces test time for TestTransactionRollbackBehavior from about 7 seconds to 2 seconds.

I shortened the wait time because I ran the test thousands times to reproduce the flakiness, and here was the bottle neck. The test remained stable with the shorter duration, so I kept the change for some improvement.

I don't get it. This loop is for checking the availability of the receipt. Once it's found, the loop will be terminated.

If the CI can be finished within 2s, then setting it with 7s has no side effect.

I’m not sure if this behavior is intentional or unexpected. In this line:

if !pendingStateHasTx(client, btx2) || !pendingStateHasTx(client, tx2) || !pendingStateHasTx(client, tx3) {

the polling loop exits almost immediately, since the receipt is available and TransactionReceipt() succeeds — just as you explained. However, in this part:

if pendingStateHasTx(client, btx0) || pendingStateHasTx(client, tx0) || pendingStateHasTx(client, tx1) {

each call to pendingStateHasTx takes the full 2 seconds and takes 6 seconds in total, because:

receipt, err = client.TransactionReceipt(ctx, tx.Hash())

returns an error like "transaction indexing is in progress" and doesn't drop into break in the 2 sec polling, which seems to happen when querying uncommitted transactions. And that’s why I shortened the polling timeout.

@rjl493456442 If you prefer a more relaxed polling interval, please let me know—I’m happy to revert that change.

rjl493456442 · 2025-07-28T14:16:54Z

Thanks for your solid work!

* core/types: minimize this invalid intermediate state (ethereum#32241) * core/rawdb: downgrade log level in chain freezer (ethereum#32253) * triedb/pathdb: use binary.append to eliminate the tmp scratch slice (ethereum#32250) `binary.AppendUvarint` offers better performance than using append directly, because it avoids unnecessary memory allocation and copying. In our case, it can increase the performance by +35.8% for the `blockWriter.append` function: ``` benchmark old ns/op new ns/op delta BenchmarkBlockWriterAppend-8 5.97 3.83 -35.80% ``` --------- Signed-off-by: jsvisa <delweng@gmail.com> Co-authored-by: Gary Rong <garyrong0905@gmail.com> * p2p/rlpx: optimize XOR operation using bitutil.XORBytes (ethereum#32217) Replace manual byte-by-byte XOR implementation with the optimized bitutil.XORBytes function. This improves performance by using word-sized operations on supported architectures while maintaining the same functionality. The optimized version processes data in bulk rather than one byte at a time --------- Co-authored-by: Felix Lange <fjl@twurst.com> * eth/gasestimator: fix potential overflow (ethereum#32255) Improve binary search, preventing the potential overflow in certain L2 cases * triedb/pathdb: fix an deadlock in history indexer (ethereum#32260) Seems the `signal.result` was not sent back in shorten case, this will cause a deadlock. --------- Signed-off-by: jsvisa <delweng@gmail.com> Co-authored-by: Gary Rong <garyrong0905@gmail.com> * eth/protocols/snap: add healing and syncing metrics (ethereum#32258) Adds the heal time and snap sync time to grafana --------- Co-authored-by: Gary Rong <garyrong0905@gmail.com> * core: replace the empty fmt.Errorf with errors.New (ethereum#32274) The `errors.new` function does not require string formatting, so its performance is better than that of `fmt.Errorf`. * eth/catalyst: fix error message in ExecuteStatelessPayloadV4 (ethereum#32269) Correct the error message in the ExecuteStatelessPayloadV4 function to reference newPayloadV4 and the Prague fork, instead of incorrectly referencing newPayloadV3 and Cancun. This improves clarity during debugging and aligns the error message with the actual function and fork being validated. No logic is changed. --------- Co-authored-by: rjl493456442 <garyrong0905@gmail.com> * cmd, eth, internal: introduce debug_sync (ethereum#32177) Alternative implementation of ethereum#32159 * all: replace fmt.Errorf with errors.New (ethereum#32286) The errors.new function does not require string formatting, so its performance is better than that of fmt.Errorf. * downloader: fix typos, grammar and formatting (ethereum#32288) * ethclient/simulated: Fix flaky rollback test (ethereum#32280) This PR addresses a flakiness in the rollback test discussed in ethereum#32252 I found `nonce` collision caused transactions occasionally fail to send. I tried to change error message in the failed test like: ``` if err = client.SendTransaction(ctx, signedTx); err != nil { t.Fatalf("failed to send transaction: %v, nonce: %d", err, signedTx.Nonce()) } ``` and I occasionally got test failure with this message: ``` === CONT TestFlakyFunction/Run_#100 rollback_test.go:44: failed to send transaction: already known, nonce: 0 --- FAIL: TestFlakyFunction/Run_#100 (0.07s) ``` Although `nonces` are obtained via `PendingNonceAt`, we observed that, in rare cases (approximately 1 in 1000), two transactions from the same sender end up with the same nonce. This likely happens because `tx0` has not yet propagated to the transaction pool before `tx1` requests its nonce. When the test succeeds, `tx0` and `tx1` have nonces `0` and `1`, respectively. However, in rare failures, both transactions end up with nonce `0`. We modified the test to explicitly assign nonces to each transaction. By controlling the nonce values manually, we eliminated the race condition and ensured consistent behavior. After several thousand runs, the flakiness was no longer reproducible in my local environment. Reduced internal polling interval in `pendingStateHasTx()` to speed up test execution without impacting stability. It reduces test time for `TestTransactionRollbackBehavior` from about 7 seconds to 2 seconds. * core/state: preallocate capacity for logs list (ethereum#32291) Improvement: preallocate capacity for `logs` at first to avoid reallocating multi times. * core/state: improve PrettyPrint function (ethereum#32293) --------- Signed-off-by: jsvisa <delweng@gmail.com> Co-authored-by: maskpp <maskpp266@gmail.com> Co-authored-by: Delweng <delweng@gmail.com> Co-authored-by: Gary Rong <garyrong0905@gmail.com> Co-authored-by: Micke <155267459+reallesee@users.noreply.github.com> Co-authored-by: Felix Lange <fjl@twurst.com> Co-authored-by: gzeon <h@arry.io> Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de> Co-authored-by: nthumann <nthumanna@gmail.com> Co-authored-by: Galoretka <galoretochka@gmail.com> Co-authored-by: ericxtheodore <ericxtheodore@outlook.com> Co-authored-by: Tomás Andróil <tomasandroil@gmail.com> Co-authored-by: kashitaka <takao.w9st.kashima@xica.net>

This PR addresses a flakiness in the rollback test discussed in ethereum#32252 I found `nonce` collision caused transactions occasionally fail to send. I tried to change error message in the failed test like: ``` if err = client.SendTransaction(ctx, signedTx); err != nil { t.Fatalf("failed to send transaction: %v, nonce: %d", err, signedTx.Nonce()) } ``` and I occasionally got test failure with this message: ``` === CONT TestFlakyFunction/Run_#100 rollback_test.go:44: failed to send transaction: already known, nonce: 0 --- FAIL: TestFlakyFunction/Run_#100 (0.07s) ``` Although `nonces` are obtained via `PendingNonceAt`, we observed that, in rare cases (approximately 1 in 1000), two transactions from the same sender end up with the same nonce. This likely happens because `tx0` has not yet propagated to the transaction pool before `tx1` requests its nonce. When the test succeeds, `tx0` and `tx1` have nonces `0` and `1`, respectively. However, in rare failures, both transactions end up with nonce `0`. We modified the test to explicitly assign nonces to each transaction. By controlling the nonce values manually, we eliminated the race condition and ensured consistent behavior. After several thousand runs, the flakiness was no longer reproducible in my local environment. Reduced internal polling interval in `pendingStateHasTx()` to speed up test execution without impacting stability. It reduces test time for `TestTransactionRollbackBehavior` from about 7 seconds to 2 seconds.

ethclient/simulated: Fix flaky rollback test

8bdbf7f

kashitaka requested a review from fjl as a code owner July 25, 2025 18:18

kashitaka mentioned this pull request Jul 25, 2025

Fix flaky test TestEthClient #32252

Closed

ethclient/simulated: change test to explicitly specify nonces

74caa26

rjl493456442 reviewed Jul 28, 2025

View reviewed changes

kashitaka force-pushed the kashi/issue-32252 branch from 9a9b71f to 74caa26 Compare July 28, 2025 08:07

rjl493456442 approved these changes Jul 28, 2025

View reviewed changes

rjl493456442 added this to the 1.16.2 milestone Jul 28, 2025

rjl493456442 merged commit eb7aef4 into ethereum:master Jul 28, 2025
4 of 8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ethclient/simulated: Fix flaky rollback test #32280

ethclient/simulated: Fix flaky rollback test #32280

Uh oh!

kashitaka commented Jul 25, 2025 •

edited

Loading

Uh oh!

rjl493456442 commented Jul 28, 2025

Uh oh!

kashitaka commented Jul 28, 2025

Uh oh!

rjl493456442 Jul 28, 2025

Uh oh!

rjl493456442 Jul 28, 2025

Uh oh!

kashitaka Jul 28, 2025 •

edited

Loading

Uh oh!

rjl493456442 Jul 28, 2025

Uh oh!

kashitaka Jul 28, 2025

Uh oh!

kashitaka Jul 28, 2025

Uh oh!

rjl493456442 commented Jul 28, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ethclient/simulated: Fix flaky rollback test #32280

ethclient/simulated: Fix flaky rollback test #32280

Uh oh!

Conversation

kashitaka commented Jul 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Fix

Additional

Uh oh!

rjl493456442 commented Jul 28, 2025

Uh oh!

kashitaka commented Jul 28, 2025

Uh oh!

rjl493456442 Jul 28, 2025

Choose a reason for hiding this comment

Uh oh!

rjl493456442 Jul 28, 2025

Choose a reason for hiding this comment

Uh oh!

kashitaka Jul 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rjl493456442 Jul 28, 2025

Choose a reason for hiding this comment

Uh oh!

kashitaka Jul 28, 2025

Choose a reason for hiding this comment

Uh oh!

kashitaka Jul 28, 2025

Choose a reason for hiding this comment

Uh oh!

rjl493456442 commented Jul 28, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kashitaka commented Jul 25, 2025 •

edited

Loading

kashitaka Jul 28, 2025 •

edited

Loading