Skip to content

Conversation

@dimriou
Copy link
Contributor

@dimriou dimriou commented Nov 7, 2025

Each commit corresponds to the following features:

  • OEV-676 Sync broadcasting and backfilling
  • OEV-680 Trigger instant retransmission of purgeable transactions
  • OEV-675 Reduce empty transaction gas limit
  • OEV-677 Add Meta Error Handler
  • OEV-668 Add agnostic gas bumping
  • OEV-679 Add separate pricing for empty transactions

Many of the TODOs of this PR are fixed in:

Update: GetMaxFee PR merged into this

@dimriou dimriou force-pushed the oev-671_txm_improvements branch from 8276ab6 to 64f4d2f Compare November 10, 2025 10:51
go t.broadcastLoop(address, triggerCh)
go t.backfillLoop(address)
t.wg.Add(1)
go t.loop(address, triggerCh)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you remind me what the advantage was of combining these two loops?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It enables easier nonce management via error handling when broadcasting a transaction.

return err
}

if tx == nil || *tx.Nonce != latestNonce {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when does this scenario happen? hard for me to think of why the local nonce would be behind the chain nonce if it has to originate from the local txm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not behind but in front. Here's an example: you broadcast tx A to the mempool. You restart the TXM, fetch the pending nonce, which is A+1, and start sending more transactions to the mempool. Since you restarted the TXM, transaction A is not being tracked. If it gets dropped by the RPC or the builders, TXM won't be able to recover unless it fills the nonce. That's what this case does.

Copy link

@lhmerino lhmerino Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed TXM uses pending for the block parameter to capture pending transactions.
Does the Flashbots client behave accordingly (i.e., takes into account inflight transactions)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it does.

// TODO: add better handling
bumpedAttempt, err := a.NewBumpAttempt(ctx, lggr, tx, *attempt)
if err != nil {
return attempt, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we select for that specific error so we reduce possibility of halting on a different error?

Copy link
Contributor

@augustbleeds augustbleeds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me, just want to clarify the error handling behavior for empty tx gas bumping.

@dimriou dimriou force-pushed the oev-671_txm_improvements branch from 29a1813 to 115e871 Compare November 11, 2025 15:51
}

func (e *errorHandler) HandleError(ctx context.Context, tx *types.Transaction, txErr error, txStore txm.TxStore, setNonce func(common.Address, uint64), isFromBroadcastMethod bool) error {
// If this isn't the first broadcast, don't mark the tx as fatal as other txs might be included on-chain.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If this isn't the first broadcast, don't mark the tx as fatal as other txs might be included on-chain.
// Only if this is the first broadcast, mark the tx as fatal. In other cases, other txs might be included on-chain


// bump purge attempts
if tx.IsPurgeable {
// TODO: add better handling
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to make a separate ticket with an explanation of what would be better here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR that permanently handles this is here:

for {
bumpedAttempt, err := a.NewBumpAttempt(ctx, lggr, tx, *attempt)
if err != nil {
if errors.Is(err, fees.ErrConnectivity) {
Copy link

@lhmerino lhmerino Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking at the error choice. I believe this is due to the code located here

if bumpedMaxPriorityFeePerGas.Cmp(priorityFeeThreshold) > 0 {
    return ..., ErrConnectivity
}

It feels slightly counterintuitive to return ErrConnectivity for a value check, where ErrBumpFeeExceedsLimit might seem more natural.

Is the intention to treat priorityFeeThreshold as a sanity guardrail (implying the network/estimator is broken if fees get this high) rather than a budget limit (implying the tx just needs to pay more)? If so, adding a small comment explaining why this triggers a 'connectivity' error would be helpful for readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This functionality is based on TXMv1. Both FeeHistory and BlockHistory estimators use the 90th percentile as a maximum threshold to prevent indefinite gas bumping.
The above was used in a hacky way to access that percentile without changing multiple interfaces in the gas package during testing. We'll have to update the PR and change this before it gets merged.

pkg/txm/txm.go Outdated
defer cancel()
broadcastWithBackoff := newBackoff(1 * time.Second)
var broadcastCh <-chan time.Time
backfillCh := time.After(utils.WithJitter(t.config.BlockTime))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utils.WithJitter is a deprecated function.
It says to use timeutils package instead.

metrics, err := NewMetaMetrics(chainID.String())
if err != nil {
return nil, fmt.Errorf("failed to create Meta metrics: %w", err)
}
Copy link

@lhmerino lhmerino Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the MetaClient currently active -- it looks A-specific?
I noticed a potential dependency in the metrics setup. If beholder is unavailable, will this cause the client to error out as well?

Ideally, we want to ensure the client is decoupled so that a beholder outage doesn't block the main flow. For example, on 10/10 I saw many alerts about beholder potentially having issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both prom and beholder packages are safe and if they're not available a default object will be created with NOOPs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The errors you saw are due to Beholder package not properly exlcuding the bootstrap node from the metrics and sending repeated error messages. This is Beholder-specific.

return NewFlashbotsClient(client, keyStore, url), nil
return NewFlashbotsClient(client, keyStore, url), nil, nil
default:
return NewMetaClient(lggr, client, keyStore, url, chainID)
Copy link

@lhmerino lhmerino Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code could then be made cleaner if NewMetaClient does not fail.


// TODO: for now do the simple thing and drop the transaction instead of adding it to the fatal queue.
delete(m.UnconfirmedTransactions, *txToMark.Nonce)
delete(m.Transactions, txToMark.ID)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few safety concerns about deleting these entries:

  • Nonce Gaps: Could deleting this nonce create a gap in our internal inflight list? I'm concerned that higher-nonce transactions currently inflight might stall indefinitely waiting for the tx associated with this nonce to fill.
  • Panic Risk: Dereferencing *txToMark.Nonce without checking if txToMark.Nonce is not nil can cause a panic.
  • Less important but still relevant: could this complicate debugging by deleting this information?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Nonce Gaps: it's in the TXM's responsibility to call MarkTxFatal in the correct place so it doesn't create a nonce gap, the same way it does for other methods as well. In this case, we call MarkTxFata inside the Meta error handler and only if there are no past broadcasts. Finally, if all the above measures for some reason fail, the TXM has a native nonce gap detection mechanism that will fill any untracked nonce gaps and unblock txs with higher nonces.
  • Panic Risk: we check for a nil nonce value a few lines before here.
  • Debugging: the fatal transactions are not currently being utilized in some sort of way. We do log the information of this marking here, but for now there is no effort to do something with the fatal queue, hence the TODO.

FromAddress: m.address,
ToAddress: common.Address{},
Value: big.NewInt(0),
SpecifiedGasLimit: gasLimit,
Copy link

@lhmerino lhmerino Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: We could simplify this by hardcoding the gas limit to 21_000.

Since a standard ETH transfer to an EOA (0x0) has a fixed cost (21_000), passing in gasLimit and maintaining a separate EmptyTxLimitDefault config seems like unnecessary overhead. In addition, it would also remove the need to add gasEstimatorConfig parameter to func NewTxmV2.

If we were to instead send ETH to/from a smart contract, only then would the gas amount differ.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although standard ETH transfer uses 21k we've observed other EVM-based chains using slightly different values. Since there is already a Core config that we can leverage, it's better to future-proof the design than to require a new custom release if something needs to be adjusted.

@dimriou dimriou marked this pull request as ready for review November 26, 2025 15:06
@dimriou dimriou requested a review from a team as a code owner November 26, 2025 15:06
Copilot AI review requested due to automatic review settings November 26, 2025 15:06
@dimriou dimriou requested review from a team as code owners November 26, 2025 15:06
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements several TXMv2 improvements including synchronized broadcasting and backfilling, instant retransmission of purgeable transactions, reduced gas limits for empty transactions, a Meta error handler, agnostic gas bumping, and separate pricing for empty transactions.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
pkg/txmgr/builder.go Adds gasEstimatorConfig parameter, temporary validation check, and error handler support to NewTxmV2
pkg/txm/txm_test.go Changes package to txm_test, updates test function names to use public methods, adds comprehensive flow tests for retransmission and error handling
pkg/txm/txm.go Merges broadcast and backfill loops into single loop, adds NewAgnosticBumpAttempt interface, updates ErrorHandler signature, exports Metrics and nonce methods
pkg/txm/storage/inmemory_store_test.go Adds test for MarkTxFatal functionality
pkg/txm/storage/inmemory_store.go Implements MarkTxFatal to remove fatal transactions from store
pkg/txm/mock_tx_store_test.go Renames mock types from lowercase to uppercase for consistent exported names
pkg/txm/mock_client_test.go Renames mock types from lowercase to uppercase for consistent exported names
pkg/txm/mock_attempt_builder_test.go Removes NewBumpAttempt mock, renames to NewAgnosticBumpAttempt, updates mock naming
pkg/txm/clientwrappers/dualbroadcast/selector.go Updates SelectClient to return error handler alongside client
pkg/txm/clientwrappers/dualbroadcast/meta_error_handler_test.go Adds tests for Meta error handler behavior with no bids errors
pkg/txm/clientwrappers/dualbroadcast/meta_error_handler.go Implements error handler to mark transactions fatal on first attempt with no bids error
pkg/txm/clientwrappers/dualbroadcast/meta_client.go Returns error for no bids, adds error constants, fixes logger name, improves logging
pkg/txm/attempt_builder_test.go Adds comprehensive tests for NewAttempt and NewAgnosticBumpAttempt methods
pkg/txm/attempt_builder.go Implements NewAgnosticBumpAttempt with configurable bumping logic and purgeable transaction handling
pkg/chains/legacyevm/evm_txm.go Passes gasEstimatorConfig to NewTxmV2
pkg/.mockery.yaml Updates mock naming convention to use uppercase prefix

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 206 to +208
t.nonceMapMu.Lock()
t.nonceMap[address] = nonce
defer t.nonceMapMu.Unlock()
t.nonceMap[address] = nonce
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The defer statement for unlocking the mutex is placed before the actual write operation, which could cause the lock to be released before the write completes in certain execution scenarios. Move the defer statement to immediately follow the Lock() call on line 206.

Copilot uses AI. Check for mistakes.
}

if tx.LastBroadcastAt == nil || time.Since(*tx.LastBroadcastAt) > (t.config.BlockTime*time.Duration(t.config.RetryBlockThreshold)) {
if tx.LastBroadcastAt == nil || time.Since(*tx.LastBroadcastAt) > (t.config.BlockTime*time.Duration(t.config.RetryBlockThreshold)) || tx.IsPurgeable {
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The complex conditional combining time calculations and boolean checks reduces readability. Consider extracting the time-based retry condition into a named variable like shouldRetryBasedOnTime := tx.LastBroadcastAt == nil || time.Since(*tx.LastBroadcastAt) > (t.config.BlockTime*time.Duration(t.config.RetryBlockThreshold)) to improve clarity.

Suggested change
if tx.LastBroadcastAt == nil || time.Since(*tx.LastBroadcastAt) > (t.config.BlockTime*time.Duration(t.config.RetryBlockThreshold)) || tx.IsPurgeable {
shouldRetryBasedOnTime := tx.LastBroadcastAt == nil || time.Since(*tx.LastBroadcastAt) > (t.config.BlockTime*time.Duration(t.config.RetryBlockThreshold))
if shouldRetryBasedOnTime || tx.IsPurgeable {

Copilot uses AI. Check for mistakes.

func (e *errorHandler) HandleError(ctx context.Context, tx *types.Transaction, txErr error, txStore txm.TxStore, setNonce func(common.Address, uint64), isFromBroadcastMethod bool) error {
// Mark the tx as fatal only if this is the first broadcast. In any other case, other txs might be included on-chain.
if strings.Contains(txErr.Error(), NoBidsError) && tx.AttemptCount == 1 {
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using string comparison to detect error types is fragile and error-prone. If the error message changes or wraps additional context, this check may fail. Consider using errors.Is() with a sentinel error or implement a custom error type for NoBidsError.

Copilot uses AI. Check for mistakes.
Value: big.NewInt(0),
SpecifiedGasLimit: gasLimit,
CreatedAt: time.Now(),
State: txmgr.TxUnconfirmed,
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty transactions are automatically marked as purgeable without clear documentation explaining this design decision. Add a comment explaining why empty transactions should always be purgeable to help future maintainers understand the intent.

Suggested change
State: txmgr.TxUnconfirmed,
State: txmgr.TxUnconfirmed,
// Empty transactions are always marked as purgeable because they do not represent real on-chain activity.
// This allows the system to safely remove them to free up resources or maintain nonce continuity.

Copilot uses AI. Check for mistakes.
Comment on lines 76 to 85
for {
bumpedAttempt, err := a.NewBumpAttempt(ctx, lggr, tx, *attempt)
if err != nil {
if errors.Is(err, fees.ErrConnectivity) {
return attempt, nil
}
return nil, fmt.Errorf("error bumping attempt for txID: %v, err: %w", tx.ID, err)
}
attempt = bumpedAttempt
}
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The infinite loop for purgeable attempts has no bounds or circuit breaker. If fees.ErrConnectivity is never encountered, this will loop indefinitely, potentially causing performance issues or resource exhaustion. Add a maximum iteration count or timeout mechanism.

Copilot uses AI. Check for mistakes.
GasLimit: 22000,
}
ab.On("NewAttempt", mock.Anything, mock.Anything, mock.Anything, mock.Anything).Return(attempt, nil).Once()
ab.On("NewAgnosticBumpAttempt", mock.Anything, mock.Anything, mock.Anything, mock.Anything).Return(attempt, nil).Once()
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Using mock.Anything for all parameters reduces the value of the test by not verifying the actual arguments passed. Consider using more specific matchers to validate that the correct transaction, context, and logger are being passed to NewAgnosticBumpAttempt.

Suggested change
ab.On("NewAgnosticBumpAttempt", mock.Anything, mock.Anything, mock.Anything, mock.Anything).Return(attempt, nil).Once()
ab.On(
"NewAgnosticBumpAttempt",
mock.MatchedBy(func(txArg *types.Tx) bool { return txArg != nil && txArg.ID == tx.ID }),
mock.Anything, // context.Context, hard to match exactly
mock.MatchedBy(func(l logger.Logger) bool { return l == lggr }),
mock.Anything, // EvmFeeEstimator
).Return(attempt, nil).Once()

Copilot uses AI. Check for mistakes.
Comment on lines +155 to +157
tx := &types.Transaction{ID: 10, FromAddress: address}
mockEstimator.On("GetFee", mock.Anything, mock.Anything, mock.Anything, mock.Anything, mock.Anything, mock.Anything).
Return(gas.EvmFee{}, uint64(0), errors.New("estimator error")).Once()
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test verifies that GetFee errors are propagated but doesn't test NewBumpAttempt error scenarios. Add test cases to verify error handling when NewBumpAttempt fails during the bumping process in NewAgnosticBumpAttempt.

Copilot uses AI. Check for mistakes.
attemptBuilder := txm.NewAttemptBuilder(fCfg.PriceMaxKey, estimator, keyStore)
// TODO: temporary check until we implement the required methods on the estimator interface
if gasEstimatorConfig.Mode() != "BlockHistory" || gasEstimatorConfig.BlockHistory().CheckInclusionBlocks() == 0 {
return nil, errors.New("only BlockHistory mode with CheckInclusionBlocks > 0 is supported for TXMv2")
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message doesn't provide actionable guidance on how to fix the configuration. Consider enhancing it to: return nil, fmt.Errorf(\"TXMv2 requires GasEstimator Mode to be 'BlockHistory' with CheckInclusionBlocks > 0, got Mode=%s, CheckInclusionBlocks=%d\", gasEstimatorConfig.Mode(), gasEstimatorConfig.BlockHistory().CheckInclusionBlocks())

Suggested change
return nil, errors.New("only BlockHistory mode with CheckInclusionBlocks > 0 is supported for TXMv2")
return nil, fmt.Errorf("TXMv2 requires GasEstimator Mode to be 'BlockHistory' with CheckInclusionBlocks > 0, got Mode=%s, CheckInclusionBlocks=%d", gasEstimatorConfig.Mode(), gasEstimatorConfig.BlockHistory().CheckInclusionBlocks())

Copilot uses AI. Check for mistakes.
logPoller,
opts.KeyStore,
estimator,
cfg.GasEstimator(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need to pass the GasEstimator separately, isn't it already passed in cfg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entire cfg is a superset of all configs, but the expected param interfaces are a small subset of it. Historically, we did that to split the responsibilities of configs and not pass one large EVM config, so I replicated the same behaviour here, but if you ask me it's not a great design choice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining, let's keep it as is.

timeout = time.Second * 5
metaABI = `[
timeout = time.Second * 5
NoBidsError = "no bids"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not errors.New("no bids") ? this way you can use errors.Is for easy error testing.

if tx.LastBroadcastAt == nil || time.Since(*tx.LastBroadcastAt) > (t.config.BlockTime*time.Duration(t.config.RetryBlockThreshold)) {
if tx.LastBroadcastAt == nil || time.Since(*tx.LastBroadcastAt) > (t.config.BlockTime*time.Duration(t.config.RetryBlockThreshold)) || tx.IsPurgeable {
// TODO: add optional graceful bumping strategy
t.lggr.Info("Rebroadcasting attempt for txID: ", tx.ID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can add more information here to show why rebroadcasting happened: it seems that there are 3 conditions above and it would be useful to understand which triggered.

@dimriou dimriou force-pushed the oev-671_txm_improvements branch from aa97fa9 to c8b5fd4 Compare November 27, 2025 11:59
* Implement GetMaxFee method

* Update empty transaction fee building
@dimriou dimriou enabled auto-merge (squash) December 1, 2025 17:09
@dimriou dimriou merged commit af04e91 into develop Dec 1, 2025
33 checks passed
@dimriou dimriou deleted the oev-671_txm_improvements branch December 1, 2025 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants