bring down optimistic nonce in case recycled transactions fail #52

d4mr · 2025-09-23T04:24:44Z

Summary by CodeRabbit

Bug Fixes
- Improved handling of recycled nonces to reduce missed or duplicate transaction processing.
- Keeps optimistic transaction count synchronized with submitted and recycled nonces to prevent gaps and stalls.
- Standardized time retrieval for consistent timestamping across operations.
Chores
- Internal cleanups and formatting adjustments for readability and maintainability.
- No user-facing API changes.

coderabbitai · 2025-09-23T04:24:51Z

Walkthrough

Expanded Redis WATCH sets to include recycled nonces and an optimistic transaction count key; added a write to set that count after recycling. Switched a worker timestamp source to EoaExecutorStore::now(). Minor formatting updates. No public API changes except a new key accessor in the store keys.

Changes

Cohort / File(s)	Summary
EOA Store — Borrowed flow key watch `executors/src/eoa/store/borrowed.rs`	Added `recycled_nonces_zset_name` to WATCH set in ProcessBorrowedTransactions; minor formatting tweaks.
EOA Store — Submitted/recycled nonces and optimistic count `executors/src/eoa/store/submitted.rs`	Added public accessor `optimistic_transaction_count_key_name()`. WATCH now includes optimistic count key in CleanAndGetRecycledNonces. After pruning recycled nonces, set optimistic count to `highest_submitted_nonce + 1`.
EOA Worker — Time source `executors/src/eoa/worker/send.rs`	Replaced `Utc::now().timestamp_millis()` with `EoaExecutorStore::now()`; import reordering only.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant W as Worker
  participant S as EoaExecutorStore
  participant R as Redis

  rect rgb(240,248,255)
  note over S,R: ProcessBorrowedTransactions - expanded WATCH
  W->>S: process_borrowed_transactions()
  S->>R: WATCH borrowed_transactions_hashmap, recycled_nonces_zset
  alt keys stable
    S->>R: MULTI
    S->>R: HGETALL borrowed_transactions_hashmap
    S->>R: ZRANGE recycled_nonces_zset
    R-->>S: results
    S->>R: EXEC
    R-->>S: OK
    S-->>W: processed entries
  else conflict
    R-->>S: EXEC aborted (watched key changed)
    S-->>W: retry/backoff
  end
  end

sequenceDiagram
  autonumber
  participant C as Cleaner
  participant S as EoaExecutorStore
  participant R as Redis

  rect rgb(245,255,245)
  note over S,R: CleanAndGetRecycledNonces - includes optimistic count update
  C->>S: clean_and_get_recycled_nonces()
  S->>R: WATCH recycled_nonces_zset, submitted_nonces_..., optimistic_tx_count_key
  alt keys stable
    S->>R: MULTI
    S->>R: ZREMRANGEBYSCORE recycled_nonces_zset (prune)
    S->>R: HGET submitted_nonces_... (highest_submitted_nonce)
    S->>R: SET optimistic_tx_count_key highest_submitted_nonce + 1
    S->>R: EXEC
    R-->>S: OK
    S-->>C: pruned set, updated count
  else conflict
    R-->>S: EXEC aborted
    S-->>C: retry/backoff
  end
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title succinctly describes the primary intent — lowering the optimistic nonce when recycled transactions fail — and aligns with the changes that add optimistic transaction count handling and recycled-nonce cleanup; it is concise, specific, and clear enough for a reviewer scanning history to understand the main change.

✨ Finishing touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch pb/optimistic-nonce-desync-fix

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

executors/src/eoa/worker/send.rs (1)
21-27: Prevent underflow in balance staleness check

If health.balance_fetched_at is ahead of now (clock skew across workers, persisted future timestamp), now - fetched_at underflows on u64 (debug panic / release wrap), causing pathological refresh behavior. Use saturating_sub.

Apply:
-        let now = EoaExecutorStore::now();
+        let now = EoaExecutorStore::now();
         // Update balance if it's stale
         // TODO: refactor this, very ugly
         if health.balance <= health.balance_threshold {
-            if now - health.balance_fetched_at > HEALTH_CHECK_INTERVAL_MS {
+            if now.saturating_sub(health.balance_fetched_at) > HEALTH_CHECK_INTERVAL_MS {

🧹 Nitpick comments (5)

executors/src/eoa/worker/send.rs (1)

229-236: Rename typo: clean_prepration_results → clean_preparation_results

Spelling nit leaks into call sites; fix for clarity/searchability.

Apply:

-    async fn clean_prepration_results(
+    async fn clean_preparation_results(

And update callers:

-            let cleaned_results = self
-                .clean_prepration_results(prepared_results_with_pending, false)
+            let cleaned_results = self
+                .clean_preparation_results(prepared_results_with_pending, false)
                 .await?;

-            let cleaned_results = self
-                .clean_prepration_results(prepared_results_with_pending, true)
+            let cleaned_results = self
+                .clean_preparation_results(prepared_results_with_pending, true)
                 .await?;

Also applies to: 124-126, 347-349

executors/src/eoa/store/borrowed.rs (2)

106-106: Unify time source

Use the centralized clock helper for consistency and testability.

Apply:

-        let now = chrono::Utc::now().timestamp_millis().max(0) as u64;
+        let now = EoaExecutorStore::now();

77-87: Avoid O(n·m) membership checks: use HashSet

borrowed_transaction_ids.contains(..) on a Vec is linear per lookup. Convert to HashSet<String> to make validation O(n).

Apply:

+use std::collections::HashSet;
@@
-        let borrowed_transaction_ids: Vec<String> = conn
+        let borrowed_transaction_ids: HashSet<String> = conn
             .hkeys(self.keys.borrowed_transactions_hashmap_name())
             .await?;
@@
-            if borrowed_transaction_ids.contains(&transaction_id.to_string()) {
+            if borrowed_transaction_ids.contains(transaction_id) {
                 valid_results.push(result.clone());
             } else {

executors/src/eoa/store/submitted.rs (2)

622-626: Guard against optimistic count regression

Setting the optimistic count to highest_submitted_nonce + 1 can move it backwards if another path already advanced it. Persist the max of the current value and the computed value.

Minimal change using validation to read current value:

-impl SafeRedisTransaction for CleanAndGetRecycledNonces<'_> {
-    type ValidationData = (u64, Vec<u64>);
+impl SafeRedisTransaction for CleanAndGetRecycledNonces<'_> {
+    type ValidationData = (u64, Vec<u64>, Option<u64>);
@@
     async fn validation(
@@
-        // get the highest submitted nonce
+        // get the highest submitted nonce
         let highest_submitted: Vec<SubmittedTransactionStringWithNonce> = conn
             .zrange_withscores(self.keys.submitted_transactions_zset_name(), -1, -1)
             .await?;
@@
-        Ok((highest_submitted_nonce, recycled_nonces))
+        // read current optimistic count to ensure monotonicity
+        let current_opt: Option<u64> = conn
+            .get(self.keys.optimistic_transaction_count_key_name())
+            .await?;
+        Ok((highest_submitted_nonce, recycled_nonces, current_opt))
@@
-    fn operation(
+    fn operation(
         &self,
         pipeline: &mut Pipeline,
-        (highest_submitted_nonce, recycled_nonces): Self::ValidationData,
+        (highest_submitted_nonce, recycled_nonces, current_opt): Self::ValidationData,
     ) -> Self::OperationResult {
@@
-        pipeline.set(
-            self.keys.optimistic_transaction_count_key_name(),
-            highest_submitted_nonce + 1,
-        );
+        let next = (highest_submitted_nonce + 1).max(current_opt.unwrap_or(0));
+        pipeline.set(self.keys.optimistic_transaction_count_key_name(), next);

308-309: Unify time source

Prefer the shared helper for timestamps here too.

Apply:

-        let now = chrono::Utc::now().timestamp_millis().max(0) as u64;
+        let now = EoaExecutorStore::now();

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ac2ed37 and bf3506d.

📒 Files selected for processing (3)

executors/src/eoa/store/borrowed.rs (4 hunks)
executors/src/eoa/store/submitted.rs (2 hunks)
executors/src/eoa/worker/send.rs (2 hunks)

🧰 Additional context used

🧠 Learnings (4)

📓 Common learnings

Learnt from: d4mr
PR: thirdweb-dev/engine-core#5
File: executors/src/eoa/worker.rs:173-176
Timestamp: 2025-07-06T15:44:13.701Z
Learning: The EOA executor store uses comprehensive WATCH-based coordination where every Redis state mutation watches the lock key and validates ownership before proceeding. If the lock is lost during any operation, the transaction fails with LockLost error. This makes aggressive lock acquisition safe because only the actual lock holder can successfully perform state mutations, regardless of who claims the lock.

Learnt from: d4mr
PR: thirdweb-dev/engine-core#5
File: executors/src/eoa/worker.rs:173-176
Timestamp: 2025-07-06T15:44:13.701Z
Learning: In the EOA executor system, aggressive lock acquisition is safe because every Redis state mutation uses WATCH operations on the lock key. If the lock is lost during a transaction, the WATCH causes the transaction to fail and the worker exits gracefully. This provides coordination between workers even when using forceful lock takeover.

📚 Learning: 2025-09-20T05:30:35.171Z

Learnt from: joaquim-verges
PR: thirdweb-dev/engine-core#48
File: executors/src/eoa/worker/send.rs:20-21
Timestamp: 2025-09-20T05:30:35.171Z
Learning: In executors/src/eoa/worker/send.rs, there is a critical bug where HEALTH_CHECK_INTERVAL is defined as 300 seconds but compared against millisecond timestamps, causing balance checks to occur every 300ms instead of every 5 minutes (1000x more frequent than intended).

Applied to files:

executors/src/eoa/worker/send.rs
executors/src/eoa/store/borrowed.rs

📚 Learning: 2025-09-20T06:58:40.230Z

Learnt from: d4mr
PR: thirdweb-dev/engine-core#48
File: executors/src/eoa/store/submitted.rs:229-230
Timestamp: 2025-09-20T06:58:40.230Z
Learning: The diff in executors/src/eoa/store/submitted.rs shows correct brace structure - the first closing brace closes the remove_transaction_from_redis_submitted_zset method and the second closing brace closes the impl CleanSubmittedTransactions block. The change only adds whitespace formatting.

Applied to files:

executors/src/eoa/store/submitted.rs
executors/src/eoa/store/borrowed.rs

📚 Learning: 2025-07-06T15:44:13.701Z

Learnt from: d4mr
PR: thirdweb-dev/engine-core#5
File: executors/src/eoa/worker.rs:173-176
Timestamp: 2025-07-06T15:44:13.701Z
Learning: The EOA executor store uses comprehensive WATCH-based coordination where every Redis state mutation watches the lock key and validates ownership before proceeding. If the lock is lost during any operation, the transaction fails with LockLost error. This makes aggressive lock acquisition safe because only the actual lock holder can successfully perform state mutations, regardless of who claims the lock.

Applied to files:

executors/src/eoa/store/submitted.rs

🧬 Code graph analysis (3)

executors/src/eoa/worker/send.rs (2)

executors/src/eoa/worker/error.rs (2)

is_retryable_preparation_error (250-281)

should_update_balance_threshold (215-235)

executors/src/eoa/store/mod.rs (1)

now (805-807)

executors/src/eoa/store/submitted.rs (1)

executors/src/eoa/store/mod.rs (1)

optimistic_transaction_count_key_name (235-246)

executors/src/eoa/store/borrowed.rs (2)

executors/src/metrics.rs (2)

calculate_duration_seconds (225-227)

current_timestamp_ms (236-238)

executors/src/eoa/store/mod.rs (2)

borrowed_transactions_hashmap_name (196-204)

recycled_nonces_zset_name (213-224)

🔇 Additional comments (5)

executors/src/eoa/worker/send.rs (2)

8-11: Imports reorg + SendContext in scope — OK

No issues spotted with the import reshuffle; symbols remain correct.

317-336: Confirm optimistic nonce is only ever advanced — review findings and action needed

Verified: Moves from pending→borrowed with incremented nonces are validated against the current optimistic nonce and set optimistic = highest_nonce + 1 (MovePendingToBorrowedWithIncrementedNonces) — monotonic increase guaranteed by validation.

Verified: update_cached_transaction_count only sets optimistic to current_chain_tx_count when current_chain_tx_count > optimistic or when optimistic is uninitialized — monotonic/non-decreasing.

Action required: submitted.rs unconditionally sets optimistic = highest_submitted_nonce + 1 in its operation. I couldn't confirm from the current snippet whether highest_submitted_nonce is always >= the current optimistic value; please confirm that highest_submitted_nonce is computed as the max of submitted nonces and the current optimistic (or otherwise ensure the pipeline only sets optimistic if new_value > current). If it can be lower, change to a conditional set (pipeline.set only when highest_submitted_nonce + 1 > current optimistic) or compute new_optimistic = max(current_optimistic, highest_submitted_nonce + 1) before setting.

Location to check: executors/src/eoa/store/submitted.rs — the operation block that calls pipeline.set(self.keys.optimistic_transaction_count_key_name(), highest_submitted_nonce + 1).

executors/src/eoa/store/borrowed.rs (2)

64-69: Watch recycled nonces alongside borrowed — good expansion

Including recycled_nonces_zset in watch_keys() matches the new recycling semantics.

Please confirm the lock key is also WATCHed by the transaction wrapper (per prior design), and that mutating pending_transactions_zset in this op doesn’t require adding it to watch_keys() for your invariants.

194-196: Marking nonces for recycle on Nack/Fail — OK

Correctly reintroduces nonces that weren’t consumed.

Also applies to: 229-231

executors/src/eoa/store/submitted.rs (1)

559-566: Watching optimistic count with recycled/submitted — OK

The expanded watch set aligns with the optimistic-nonce maintenance.

bring down optimistic nonce in case recycled transactions fail

bf3506d

d4mr merged commit 08bd0e1 into main Sep 23, 2025
2 of 3 checks passed

d4mr deleted the pb/optimistic-nonce-desync-fix branch September 23, 2025 04:26

joaquim-verges approved these changes Sep 23, 2025

View reviewed changes

coderabbitai bot reviewed Sep 23, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

bring down optimistic nonce in case recycled transactions fail #52

bring down optimistic nonce in case recycled transactions fail #52

Uh oh!

d4mr commented Sep 23, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Sep 23, 2025 •

edited

Loading

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

bring down optimistic nonce in case recycled transactions fail #52

bring down optimistic nonce in case recycled transactions fail #52

Uh oh!

Conversation

d4mr commented Sep 23, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

d4mr commented Sep 23, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 23, 2025 •

edited

Loading