-
Notifications
You must be signed in to change notification settings - Fork 21.6k
ethclient/simulated: Fix flaky rollback test #32280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Can you modify the So that you can get rid of the |
|
@rjl493456442 Thank you for your feedback. I applied it in 74caa26
Agreed, that sounds like a more stable approach. I initially assumed the test specifically wanted to fetch the nonce via Since we can’t modify the nonce on an already constructed |
| ) | ||
|
|
||
| // Poll for receipt with timeout | ||
| deadline := time.Now().Add(2 * time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason for changing the wait time duration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our CI system is occasionally overloaded, have a loose duration is usually preferred
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rjl493456442 Thanks for the comment! As mentioned in the PR description, It reduces test time for TestTransactionRollbackBehavior from about 7 seconds to 2 seconds.
I shortened the wait time because I ran the test thousands times to reproduce the flakiness, and here was the bottle neck. The test remained stable with the shorter duration, so I kept the change for some improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get it. This loop is for checking the availability of the receipt. Once it's found, the loop will be terminated.
If the CI can be finished within 2s, then setting it with 7s has no side effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not sure if this behavior is intentional or unexpected. In this line:
if !pendingStateHasTx(client, btx2) || !pendingStateHasTx(client, tx2) || !pendingStateHasTx(client, tx3) {
the polling loop exits almost immediately, since the receipt is available and TransactionReceipt() succeeds — just as you explained. However, in this part:
if pendingStateHasTx(client, btx0) || pendingStateHasTx(client, tx0) || pendingStateHasTx(client, tx1) {
each call to pendingStateHasTx takes the full 2 seconds and takes 6 seconds in total, because:
receipt, err = client.TransactionReceipt(ctx, tx.Hash())
returns an error like "transaction indexing is in progress" and doesn't drop into break in the 2 sec polling, which seems to happen when querying uncommitted transactions. And that’s why I shortened the polling timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rjl493456442 If you prefer a more relaxed polling interval, please let me know—I’m happy to revert that change.
9a9b71f to
74caa26
Compare
|
Thanks for your solid work! |
* core/types: minimize this invalid intermediate state (ethereum#32241) * core/rawdb: downgrade log level in chain freezer (ethereum#32253) * triedb/pathdb: use binary.append to eliminate the tmp scratch slice (ethereum#32250) `binary.AppendUvarint` offers better performance than using append directly, because it avoids unnecessary memory allocation and copying. In our case, it can increase the performance by +35.8% for the `blockWriter.append` function: ``` benchmark old ns/op new ns/op delta BenchmarkBlockWriterAppend-8 5.97 3.83 -35.80% ``` --------- Signed-off-by: jsvisa <delweng@gmail.com> Co-authored-by: Gary Rong <garyrong0905@gmail.com> * p2p/rlpx: optimize XOR operation using bitutil.XORBytes (ethereum#32217) Replace manual byte-by-byte XOR implementation with the optimized bitutil.XORBytes function. This improves performance by using word-sized operations on supported architectures while maintaining the same functionality. The optimized version processes data in bulk rather than one byte at a time --------- Co-authored-by: Felix Lange <fjl@twurst.com> * eth/gasestimator: fix potential overflow (ethereum#32255) Improve binary search, preventing the potential overflow in certain L2 cases * triedb/pathdb: fix an deadlock in history indexer (ethereum#32260) Seems the `signal.result` was not sent back in shorten case, this will cause a deadlock. --------- Signed-off-by: jsvisa <delweng@gmail.com> Co-authored-by: Gary Rong <garyrong0905@gmail.com> * eth/protocols/snap: add healing and syncing metrics (ethereum#32258) Adds the heal time and snap sync time to grafana --------- Co-authored-by: Gary Rong <garyrong0905@gmail.com> * core: replace the empty fmt.Errorf with errors.New (ethereum#32274) The `errors.new` function does not require string formatting, so its performance is better than that of `fmt.Errorf`. * eth/catalyst: fix error message in ExecuteStatelessPayloadV4 (ethereum#32269) Correct the error message in the ExecuteStatelessPayloadV4 function to reference newPayloadV4 and the Prague fork, instead of incorrectly referencing newPayloadV3 and Cancun. This improves clarity during debugging and aligns the error message with the actual function and fork being validated. No logic is changed. --------- Co-authored-by: rjl493456442 <garyrong0905@gmail.com> * cmd, eth, internal: introduce debug_sync (ethereum#32177) Alternative implementation of ethereum#32159 * all: replace fmt.Errorf with errors.New (ethereum#32286) The errors.new function does not require string formatting, so its performance is better than that of fmt.Errorf. * downloader: fix typos, grammar and formatting (ethereum#32288) * ethclient/simulated: Fix flaky rollback test (ethereum#32280) This PR addresses a flakiness in the rollback test discussed in ethereum#32252 I found `nonce` collision caused transactions occasionally fail to send. I tried to change error message in the failed test like: ``` if err = client.SendTransaction(ctx, signedTx); err != nil { t.Fatalf("failed to send transaction: %v, nonce: %d", err, signedTx.Nonce()) } ``` and I occasionally got test failure with this message: ``` === CONT TestFlakyFunction/Run_#100 rollback_test.go:44: failed to send transaction: already known, nonce: 0 --- FAIL: TestFlakyFunction/Run_#100 (0.07s) ``` Although `nonces` are obtained via `PendingNonceAt`, we observed that, in rare cases (approximately 1 in 1000), two transactions from the same sender end up with the same nonce. This likely happens because `tx0` has not yet propagated to the transaction pool before `tx1` requests its nonce. When the test succeeds, `tx0` and `tx1` have nonces `0` and `1`, respectively. However, in rare failures, both transactions end up with nonce `0`. We modified the test to explicitly assign nonces to each transaction. By controlling the nonce values manually, we eliminated the race condition and ensured consistent behavior. After several thousand runs, the flakiness was no longer reproducible in my local environment. Reduced internal polling interval in `pendingStateHasTx()` to speed up test execution without impacting stability. It reduces test time for `TestTransactionRollbackBehavior` from about 7 seconds to 2 seconds. * core/state: preallocate capacity for logs list (ethereum#32291) Improvement: preallocate capacity for `logs` at first to avoid reallocating multi times. * core/state: improve PrettyPrint function (ethereum#32293) --------- Signed-off-by: jsvisa <delweng@gmail.com> Co-authored-by: maskpp <maskpp266@gmail.com> Co-authored-by: Delweng <delweng@gmail.com> Co-authored-by: Gary Rong <garyrong0905@gmail.com> Co-authored-by: Micke <155267459+reallesee@users.noreply.github.com> Co-authored-by: Felix Lange <fjl@twurst.com> Co-authored-by: gzeon <h@arry.io> Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de> Co-authored-by: nthumann <nthumanna@gmail.com> Co-authored-by: Galoretka <galoretochka@gmail.com> Co-authored-by: ericxtheodore <ericxtheodore@outlook.com> Co-authored-by: Tomás Andróil <tomasandroil@gmail.com> Co-authored-by: kashitaka <takao.w9st.kashima@xica.net>
This PR addresses a flakiness in the rollback test discussed in ethereum#32252 I found `nonce` collision caused transactions occasionally fail to send. I tried to change error message in the failed test like: ``` if err = client.SendTransaction(ctx, signedTx); err != nil { t.Fatalf("failed to send transaction: %v, nonce: %d", err, signedTx.Nonce()) } ``` and I occasionally got test failure with this message: ``` === CONT TestFlakyFunction/Run_#100 rollback_test.go:44: failed to send transaction: already known, nonce: 0 --- FAIL: TestFlakyFunction/Run_#100 (0.07s) ``` Although `nonces` are obtained via `PendingNonceAt`, we observed that, in rare cases (approximately 1 in 1000), two transactions from the same sender end up with the same nonce. This likely happens because `tx0` has not yet propagated to the transaction pool before `tx1` requests its nonce. When the test succeeds, `tx0` and `tx1` have nonces `0` and `1`, respectively. However, in rare failures, both transactions end up with nonce `0`. We modified the test to explicitly assign nonces to each transaction. By controlling the nonce values manually, we eliminated the race condition and ensured consistent behavior. After several thousand runs, the flakiness was no longer reproducible in my local environment. Reduced internal polling interval in `pendingStateHasTx()` to speed up test execution without impacting stability. It reduces test time for `TestTransactionRollbackBehavior` from about 7 seconds to 2 seconds.
This PR addresses a flakiness in the rollback test discussed in ethereum#32252 I found `nonce` collision caused transactions occasionally fail to send. I tried to change error message in the failed test like: ``` if err = client.SendTransaction(ctx, signedTx); err != nil { t.Fatalf("failed to send transaction: %v, nonce: %d", err, signedTx.Nonce()) } ``` and I occasionally got test failure with this message: ``` === CONT TestFlakyFunction/Run_#100 rollback_test.go:44: failed to send transaction: already known, nonce: 0 --- FAIL: TestFlakyFunction/Run_#100 (0.07s) ``` Although `nonces` are obtained via `PendingNonceAt`, we observed that, in rare cases (approximately 1 in 1000), two transactions from the same sender end up with the same nonce. This likely happens because `tx0` has not yet propagated to the transaction pool before `tx1` requests its nonce. When the test succeeds, `tx0` and `tx1` have nonces `0` and `1`, respectively. However, in rare failures, both transactions end up with nonce `0`. We modified the test to explicitly assign nonces to each transaction. By controlling the nonce values manually, we eliminated the race condition and ensured consistent behavior. After several thousand runs, the flakiness was no longer reproducible in my local environment. Reduced internal polling interval in `pendingStateHasTx()` to speed up test execution without impacting stability. It reduces test time for `TestTransactionRollbackBehavior` from about 7 seconds to 2 seconds.
This PR addresses a flakiness in the rollback test discussed in ethereum#32252 I found `nonce` collision caused transactions occasionally fail to send. I tried to change error message in the failed test like: ``` if err = client.SendTransaction(ctx, signedTx); err != nil { t.Fatalf("failed to send transaction: %v, nonce: %d", err, signedTx.Nonce()) } ``` and I occasionally got test failure with this message: ``` === CONT TestFlakyFunction/Run_#100 rollback_test.go:44: failed to send transaction: already known, nonce: 0 --- FAIL: TestFlakyFunction/Run_#100 (0.07s) ``` Although `nonces` are obtained via `PendingNonceAt`, we observed that, in rare cases (approximately 1 in 1000), two transactions from the same sender end up with the same nonce. This likely happens because `tx0` has not yet propagated to the transaction pool before `tx1` requests its nonce. When the test succeeds, `tx0` and `tx1` have nonces `0` and `1`, respectively. However, in rare failures, both transactions end up with nonce `0`. We modified the test to explicitly assign nonces to each transaction. By controlling the nonce values manually, we eliminated the race condition and ensured consistent behavior. After several thousand runs, the flakiness was no longer reproducible in my local environment. Reduced internal polling interval in `pendingStateHasTx()` to speed up test execution without impacting stability. It reduces test time for `TestTransactionRollbackBehavior` from about 7 seconds to 2 seconds.
Fixes: #32252
This PR addresses a flakiness in the rollback test.
I found
noncecollision caused transactions occasionally fail to send. I tried to change error message in the failed test like:and I occasionally got test failure with this message:
Although
noncesare obtained viaPendingNonceAt, we observed that, in rare cases (approximately 1 in 1000), two transactions from the same sender end up with the same nonce. This likely happens becausetx0has not yet propagated to the transaction pool beforetx1requests its nonce. When the test succeeds,tx0andtx1have nonces0and1, respectively. However, in rare failures, both transactions end up with nonce0.Fix
We modified the test to explicitly assign nonces to each transaction. By controlling the nonce values manually, we eliminated the race condition and ensured consistent behavior. After several thousand runs, the flakiness was no longer reproducible in my local environment.
Additional
Reduced internal polling interval in
pendingStateHasTx()to speed up test execution without impacting stability. It reduces test time forTestTransactionRollbackBehaviorfrom about 7 seconds to 2 seconds.