-
Notifications
You must be signed in to change notification settings - Fork 15
Oev 671 txm improvements #294
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8276ab6 to
64f4d2f
Compare
| go t.broadcastLoop(address, triggerCh) | ||
| go t.backfillLoop(address) | ||
| t.wg.Add(1) | ||
| go t.loop(address, triggerCh) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you remind me what the advantage was of combining these two loops?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It enables easier nonce management via error handling when broadcasting a transaction.
| return err | ||
| } | ||
|
|
||
| if tx == nil || *tx.Nonce != latestNonce { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when does this scenario happen? hard for me to think of why the local nonce would be behind the chain nonce if it has to originate from the local txm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not behind but in front. Here's an example: you broadcast tx A to the mempool. You restart the TXM, fetch the pending nonce, which is A+1, and start sending more transactions to the mempool. Since you restarted the TXM, transaction A is not being tracked. If it gets dropped by the RPC or the builders, TXM won't be able to recover unless it fills the nonce. That's what this case does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed TXM uses pending for the block parameter to capture pending transactions.
Does the Flashbots client behave accordingly (i.e., takes into account inflight transactions)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it does.
pkg/txm/attempt_builder.go
Outdated
| // TODO: add better handling | ||
| bumpedAttempt, err := a.NewBumpAttempt(ctx, lggr, tx, *attempt) | ||
| if err != nil { | ||
| return attempt, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we select for that specific error so we reduce possibility of halting on a different error?
augustbleeds
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me, just want to clarify the error handling behavior for empty tx gas bumping.
29a1813 to
115e871
Compare
| } | ||
|
|
||
| func (e *errorHandler) HandleError(ctx context.Context, tx *types.Transaction, txErr error, txStore txm.TxStore, setNonce func(common.Address, uint64), isFromBroadcastMethod bool) error { | ||
| // If this isn't the first broadcast, don't mark the tx as fatal as other txs might be included on-chain. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // If this isn't the first broadcast, don't mark the tx as fatal as other txs might be included on-chain. | |
| // Only if this is the first broadcast, mark the tx as fatal. In other cases, other txs might be included on-chain |
pkg/txm/attempt_builder.go
Outdated
|
|
||
| // bump purge attempts | ||
| if tx.IsPurgeable { | ||
| // TODO: add better handling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to make a separate ticket with an explanation of what would be better here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR that permanently handles this is here:
pkg/txm/attempt_builder.go
Outdated
| for { | ||
| bumpedAttempt, err := a.NewBumpAttempt(ctx, lggr, tx, *attempt) | ||
| if err != nil { | ||
| if errors.Is(err, fees.ErrConnectivity) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm looking at the error choice. I believe this is due to the code located here
if bumpedMaxPriorityFeePerGas.Cmp(priorityFeeThreshold) > 0 {
return ..., ErrConnectivity
}
It feels slightly counterintuitive to return ErrConnectivity for a value check, where ErrBumpFeeExceedsLimit might seem more natural.
Is the intention to treat priorityFeeThreshold as a sanity guardrail (implying the network/estimator is broken if fees get this high) rather than a budget limit (implying the tx just needs to pay more)? If so, adding a small comment explaining why this triggers a 'connectivity' error would be helpful for readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This functionality is based on TXMv1. Both FeeHistory and BlockHistory estimators use the 90th percentile as a maximum threshold to prevent indefinite gas bumping.
The above was used in a hacky way to access that percentile without changing multiple interfaces in the gas package during testing. We'll have to update the PR and change this before it gets merged.
pkg/txm/txm.go
Outdated
| defer cancel() | ||
| broadcastWithBackoff := newBackoff(1 * time.Second) | ||
| var broadcastCh <-chan time.Time | ||
| backfillCh := time.After(utils.WithJitter(t.config.BlockTime)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utils.WithJitter is a deprecated function.
It says to use timeutils package instead.
| metrics, err := NewMetaMetrics(chainID.String()) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("failed to create Meta metrics: %w", err) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the MetaClient currently active -- it looks A-specific?
I noticed a potential dependency in the metrics setup. If beholder is unavailable, will this cause the client to error out as well?
Ideally, we want to ensure the client is decoupled so that a beholder outage doesn't block the main flow. For example, on 10/10 I saw many alerts about beholder potentially having issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both prom and beholder packages are safe and if they're not available a default object will be created with NOOPs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The errors you saw are due to Beholder package not properly exlcuding the bootstrap node from the metrics and sending repeated error messages. This is Beholder-specific.
| return NewFlashbotsClient(client, keyStore, url), nil | ||
| return NewFlashbotsClient(client, keyStore, url), nil, nil | ||
| default: | ||
| return NewMetaClient(lggr, client, keyStore, url, chainID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code could then be made cleaner if NewMetaClient does not fail.
|
|
||
| // TODO: for now do the simple thing and drop the transaction instead of adding it to the fatal queue. | ||
| delete(m.UnconfirmedTransactions, *txToMark.Nonce) | ||
| delete(m.Transactions, txToMark.ID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few safety concerns about deleting these entries:
- Nonce Gaps: Could deleting this nonce create a gap in our internal inflight list? I'm concerned that higher-nonce transactions currently inflight might stall indefinitely waiting for the tx associated with this nonce to fill.
- Panic Risk: Dereferencing *txToMark.Nonce without checking if txToMark.Nonce is not nil can cause a panic.
- Less important but still relevant: could this complicate debugging by deleting this information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Nonce Gaps: it's in the TXM's responsibility to call
MarkTxFatalin the correct place so it doesn't create a nonce gap, the same way it does for other methods as well. In this case, we callMarkTxFatainside the Meta error handler and only if there are no past broadcasts. Finally, if all the above measures for some reason fail, the TXM has a native nonce gap detection mechanism that will fill any untracked nonce gaps and unblock txs with higher nonces. - Panic Risk: we check for a nil nonce value a few lines before here.
- Debugging: the fatal transactions are not currently being utilized in some sort of way. We do log the information of this marking here, but for now there is no effort to do something with the fatal queue, hence the TODO.
| FromAddress: m.address, | ||
| ToAddress: common.Address{}, | ||
| Value: big.NewInt(0), | ||
| SpecifiedGasLimit: gasLimit, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: We could simplify this by hardcoding the gas limit to 21_000.
Since a standard ETH transfer to an EOA (0x0) has a fixed cost (21_000), passing in gasLimit and maintaining a separate EmptyTxLimitDefault config seems like unnecessary overhead. In addition, it would also remove the need to add gasEstimatorConfig parameter to func NewTxmV2.
If we were to instead send ETH to/from a smart contract, only then would the gas amount differ.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although standard ETH transfer uses 21k we've observed other EVM-based chains using slightly different values. Since there is already a Core config that we can leverage, it's better to future-proof the design than to require a new custom release if something needs to be adjusted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements several TXMv2 improvements including synchronized broadcasting and backfilling, instant retransmission of purgeable transactions, reduced gas limits for empty transactions, a Meta error handler, agnostic gas bumping, and separate pricing for empty transactions.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/txmgr/builder.go | Adds gasEstimatorConfig parameter, temporary validation check, and error handler support to NewTxmV2 |
| pkg/txm/txm_test.go | Changes package to txm_test, updates test function names to use public methods, adds comprehensive flow tests for retransmission and error handling |
| pkg/txm/txm.go | Merges broadcast and backfill loops into single loop, adds NewAgnosticBumpAttempt interface, updates ErrorHandler signature, exports Metrics and nonce methods |
| pkg/txm/storage/inmemory_store_test.go | Adds test for MarkTxFatal functionality |
| pkg/txm/storage/inmemory_store.go | Implements MarkTxFatal to remove fatal transactions from store |
| pkg/txm/mock_tx_store_test.go | Renames mock types from lowercase to uppercase for consistent exported names |
| pkg/txm/mock_client_test.go | Renames mock types from lowercase to uppercase for consistent exported names |
| pkg/txm/mock_attempt_builder_test.go | Removes NewBumpAttempt mock, renames to NewAgnosticBumpAttempt, updates mock naming |
| pkg/txm/clientwrappers/dualbroadcast/selector.go | Updates SelectClient to return error handler alongside client |
| pkg/txm/clientwrappers/dualbroadcast/meta_error_handler_test.go | Adds tests for Meta error handler behavior with no bids errors |
| pkg/txm/clientwrappers/dualbroadcast/meta_error_handler.go | Implements error handler to mark transactions fatal on first attempt with no bids error |
| pkg/txm/clientwrappers/dualbroadcast/meta_client.go | Returns error for no bids, adds error constants, fixes logger name, improves logging |
| pkg/txm/attempt_builder_test.go | Adds comprehensive tests for NewAttempt and NewAgnosticBumpAttempt methods |
| pkg/txm/attempt_builder.go | Implements NewAgnosticBumpAttempt with configurable bumping logic and purgeable transaction handling |
| pkg/chains/legacyevm/evm_txm.go | Passes gasEstimatorConfig to NewTxmV2 |
| pkg/.mockery.yaml | Updates mock naming convention to use uppercase prefix |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| t.nonceMapMu.Lock() | ||
| t.nonceMap[address] = nonce | ||
| defer t.nonceMapMu.Unlock() | ||
| t.nonceMap[address] = nonce |
Copilot
AI
Nov 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The defer statement for unlocking the mutex is placed before the actual write operation, which could cause the lock to be released before the write completes in certain execution scenarios. Move the defer statement to immediately follow the Lock() call on line 206.
| } | ||
|
|
||
| if tx.LastBroadcastAt == nil || time.Since(*tx.LastBroadcastAt) > (t.config.BlockTime*time.Duration(t.config.RetryBlockThreshold)) { | ||
| if tx.LastBroadcastAt == nil || time.Since(*tx.LastBroadcastAt) > (t.config.BlockTime*time.Duration(t.config.RetryBlockThreshold)) || tx.IsPurgeable { |
Copilot
AI
Nov 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The complex conditional combining time calculations and boolean checks reduces readability. Consider extracting the time-based retry condition into a named variable like shouldRetryBasedOnTime := tx.LastBroadcastAt == nil || time.Since(*tx.LastBroadcastAt) > (t.config.BlockTime*time.Duration(t.config.RetryBlockThreshold)) to improve clarity.
| if tx.LastBroadcastAt == nil || time.Since(*tx.LastBroadcastAt) > (t.config.BlockTime*time.Duration(t.config.RetryBlockThreshold)) || tx.IsPurgeable { | |
| shouldRetryBasedOnTime := tx.LastBroadcastAt == nil || time.Since(*tx.LastBroadcastAt) > (t.config.BlockTime*time.Duration(t.config.RetryBlockThreshold)) | |
| if shouldRetryBasedOnTime || tx.IsPurgeable { |
|
|
||
| func (e *errorHandler) HandleError(ctx context.Context, tx *types.Transaction, txErr error, txStore txm.TxStore, setNonce func(common.Address, uint64), isFromBroadcastMethod bool) error { | ||
| // Mark the tx as fatal only if this is the first broadcast. In any other case, other txs might be included on-chain. | ||
| if strings.Contains(txErr.Error(), NoBidsError) && tx.AttemptCount == 1 { |
Copilot
AI
Nov 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using string comparison to detect error types is fragile and error-prone. If the error message changes or wraps additional context, this check may fail. Consider using errors.Is() with a sentinel error or implement a custom error type for NoBidsError.
| Value: big.NewInt(0), | ||
| SpecifiedGasLimit: gasLimit, | ||
| CreatedAt: time.Now(), | ||
| State: txmgr.TxUnconfirmed, |
Copilot
AI
Nov 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty transactions are automatically marked as purgeable without clear documentation explaining this design decision. Add a comment explaining why empty transactions should always be purgeable to help future maintainers understand the intent.
| State: txmgr.TxUnconfirmed, | |
| State: txmgr.TxUnconfirmed, | |
| // Empty transactions are always marked as purgeable because they do not represent real on-chain activity. | |
| // This allows the system to safely remove them to free up resources or maintain nonce continuity. |
pkg/txm/attempt_builder.go
Outdated
| for { | ||
| bumpedAttempt, err := a.NewBumpAttempt(ctx, lggr, tx, *attempt) | ||
| if err != nil { | ||
| if errors.Is(err, fees.ErrConnectivity) { | ||
| return attempt, nil | ||
| } | ||
| return nil, fmt.Errorf("error bumping attempt for txID: %v, err: %w", tx.ID, err) | ||
| } | ||
| attempt = bumpedAttempt | ||
| } |
Copilot
AI
Nov 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The infinite loop for purgeable attempts has no bounds or circuit breaker. If fees.ErrConnectivity is never encountered, this will loop indefinitely, potentially causing performance issues or resource exhaustion. Add a maximum iteration count or timeout mechanism.
| GasLimit: 22000, | ||
| } | ||
| ab.On("NewAttempt", mock.Anything, mock.Anything, mock.Anything, mock.Anything).Return(attempt, nil).Once() | ||
| ab.On("NewAgnosticBumpAttempt", mock.Anything, mock.Anything, mock.Anything, mock.Anything).Return(attempt, nil).Once() |
Copilot
AI
Nov 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Using mock.Anything for all parameters reduces the value of the test by not verifying the actual arguments passed. Consider using more specific matchers to validate that the correct transaction, context, and logger are being passed to NewAgnosticBumpAttempt.
| ab.On("NewAgnosticBumpAttempt", mock.Anything, mock.Anything, mock.Anything, mock.Anything).Return(attempt, nil).Once() | |
| ab.On( | |
| "NewAgnosticBumpAttempt", | |
| mock.MatchedBy(func(txArg *types.Tx) bool { return txArg != nil && txArg.ID == tx.ID }), | |
| mock.Anything, // context.Context, hard to match exactly | |
| mock.MatchedBy(func(l logger.Logger) bool { return l == lggr }), | |
| mock.Anything, // EvmFeeEstimator | |
| ).Return(attempt, nil).Once() |
| tx := &types.Transaction{ID: 10, FromAddress: address} | ||
| mockEstimator.On("GetFee", mock.Anything, mock.Anything, mock.Anything, mock.Anything, mock.Anything, mock.Anything). | ||
| Return(gas.EvmFee{}, uint64(0), errors.New("estimator error")).Once() |
Copilot
AI
Nov 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test verifies that GetFee errors are propagated but doesn't test NewBumpAttempt error scenarios. Add test cases to verify error handling when NewBumpAttempt fails during the bumping process in NewAgnosticBumpAttempt.
pkg/txmgr/builder.go
Outdated
| attemptBuilder := txm.NewAttemptBuilder(fCfg.PriceMaxKey, estimator, keyStore) | ||
| // TODO: temporary check until we implement the required methods on the estimator interface | ||
| if gasEstimatorConfig.Mode() != "BlockHistory" || gasEstimatorConfig.BlockHistory().CheckInclusionBlocks() == 0 { | ||
| return nil, errors.New("only BlockHistory mode with CheckInclusionBlocks > 0 is supported for TXMv2") |
Copilot
AI
Nov 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message doesn't provide actionable guidance on how to fix the configuration. Consider enhancing it to: return nil, fmt.Errorf(\"TXMv2 requires GasEstimator Mode to be 'BlockHistory' with CheckInclusionBlocks > 0, got Mode=%s, CheckInclusionBlocks=%d\", gasEstimatorConfig.Mode(), gasEstimatorConfig.BlockHistory().CheckInclusionBlocks())
| return nil, errors.New("only BlockHistory mode with CheckInclusionBlocks > 0 is supported for TXMv2") | |
| return nil, fmt.Errorf("TXMv2 requires GasEstimator Mode to be 'BlockHistory' with CheckInclusionBlocks > 0, got Mode=%s, CheckInclusionBlocks=%d", gasEstimatorConfig.Mode(), gasEstimatorConfig.BlockHistory().CheckInclusionBlocks()) |
| logPoller, | ||
| opts.KeyStore, | ||
| estimator, | ||
| cfg.GasEstimator(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need to pass the GasEstimator separately, isn't it already passed in cfg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The entire cfg is a superset of all configs, but the expected param interfaces are a small subset of it. Historically, we did that to split the responsibilities of configs and not pass one large EVM config, so I replicated the same behaviour here, but if you ask me it's not a great design choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining, let's keep it as is.
| timeout = time.Second * 5 | ||
| metaABI = `[ | ||
| timeout = time.Second * 5 | ||
| NoBidsError = "no bids" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not errors.New("no bids") ? this way you can use errors.Is for easy error testing.
| if tx.LastBroadcastAt == nil || time.Since(*tx.LastBroadcastAt) > (t.config.BlockTime*time.Duration(t.config.RetryBlockThreshold)) { | ||
| if tx.LastBroadcastAt == nil || time.Since(*tx.LastBroadcastAt) > (t.config.BlockTime*time.Duration(t.config.RetryBlockThreshold)) || tx.IsPurgeable { | ||
| // TODO: add optional graceful bumping strategy | ||
| t.lggr.Info("Rebroadcasting attempt for txID: ", tx.ID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can add more information here to show why rebroadcasting happened: it seems that there are 3 conditions above and it would be useful to understand which triggered.
aa97fa9 to
c8b5fd4
Compare
* Implement GetMaxFee method * Update empty transaction fee building
Each commit corresponds to the following features:
Many of the TODOs of this PR are fixed in:
Specifically for bumping Purgeable attempts, you can see the updated method in the above PR.
Update:
GetMaxFeePR merged into this