reformat RPC logs for transaction broadcast failure case for easier debugging in frontend #207

Merged
dssei merged 1 commit into main from SEI-6710_reformat_rpc_logging_for_broadcast_failure on Mar 11, 2024

Conversation

@dssei (Contributor) commented on Mar 7, 2024

Describe your changes and provide context

On the sei-js side, when a transaction broadcast fails, up until recently we were observing errors like the following:

BroadcastTxError: Broadcasting transaction failed with code 10 (codespace: sdk). Log:
    at SigningStargateClient.broadcastTx (/Users/bryantran/code/sei/nft/script/node_modules/@cosmjs/stargate/build/stargateclient.js:267:35)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async main (file:///Users/bryantran/code/sei/nft/script/evm/test.js:17:18) {
  code: 10,
  codespace: 'sdk',
  log: ''
}

With the recent change #201, the log looks like:

 log: '[{"log":"insufficient fees; got: 1200usei required: 12000usei: insufficient fee"}]'

We would like to tweak the formatting a bit (from JSON to a simple string), so that the log output is more like:

log: 'insufficient fees; got: 1200usei required: 12000usei: insufficient fee'

With the full error message looking like:

BroadcastTxError: Broadcasting transaction failed with code 13 (codespace: sdk). Log: insufficient fees; got: 1200usei required: 12000usei: insufficient fee
    at SigningStargateClient.broadcastTxSync (/Users/denyssinyakov/repos/experimental/cosmos-test/node_modules/@cosmjs/stargate/build/stargateclient.js:290:35)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async SigningStargateClient.broadcastTx (/Users/denyssinyakov/repos/experimental/cosmos-test/node_modules/@cosmjs/stargate/build/stargateclient.js:267:31) {
  code: 13,
  codespace: 'sdk',
  log: 'insufficient fees; got: 1200usei required: 12000usei: insufficient fee'
}
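Per the merge commit ("reformat logs to use simple concatenation with separators"), the change can be sketched roughly as follows. The `logEntry` type and `formatLogs` helper are hypothetical names for illustration, not the actual sei-tendermint code:

```go
package main

import (
	"fmt"
	"strings"
)

// logEntry is an illustrative stand-in for the per-message entries
// that were previously marshaled to JSON as [{"log":"..."}].
type logEntry struct {
	Log string
}

// formatLogs joins the individual log messages with a plain separator
// instead of JSON-encoding them, so clients receive a simple string
// with no brackets or quoting.
func formatLogs(entries []logEntry) string {
	parts := make([]string, 0, len(entries))
	for _, e := range entries {
		parts = append(parts, e.Log)
	}
	return strings.Join(parts, "; ")
}

func main() {
	entries := []logEntry{
		{Log: "insufficient fees; got: 1200usei required: 12000usei: insufficient fee"},
	}
	fmt.Println(formatLogs(entries))
}
```

With a single entry, the output is exactly the plain string shown in the example above, which is what the frontend sees in the `log` field.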


Testing performed to validate your change

  • Unit testing
  • Local node + js client

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 57.89%. Comparing base (a84cd86) to head (72e28cf).

Additional details and impacted files


@@            Coverage Diff             @@
##             main     #207      +/-   ##
==========================================
- Coverage   58.01%   57.89%   -0.12%     
==========================================
  Files         249      249              
  Lines       33916    33915       -1     
==========================================
- Hits        19677    19636      -41     
- Misses      12670    12708      +38     
- Partials     1569     1571       +2     
File                          Coverage           Δ
internal/mempool/mempool.go   70.05% <100.00%>   (-0.06%) ⬇️

... and 16 files with indirect coverage changes

@dssei dssei merged commit 8061a47 into main Mar 11, 2024
22 checks passed
@dssei dssei deleted the SEI-6710_reformat_rpc_logging_for_broadcast_failure branch March 11, 2024 15:09
udpatil pushed a commit that referenced this pull request Mar 27, 2024
* reformat logs to use simple concatenation with separators (#207)

* Use write-lock in (*TxPriorityQueue).ReapMax funcs (#209)

ReapMaxBytesMaxGas and ReapMaxTxs funcs in TxPriorityQueue claim
> Transactions returned are not removed from the mempool transaction
> store or indexes.

However, they use a priority queue to accomplish the claim
> Transaction are retrieved in priority order.

This is accomplished by popping all items out of the whole heap, and
then pushing them back in sequentially. A copy of the heap cannot be
obtained otherwise. Both of the mentioned functions use a read-lock
(RLock) when doing this. This results in a potential scenario where
multiple executions of the ReapMax can be started in parallel, and
both would be popping items out of the priority queue.

In practice, this can be abused by executing the `unconfirmed_txs` RPC
call repeatedly. Based on our observations, running it multiple times
per millisecond results in multiple threads picking it up at the same
time. Such a scenario can be obtained via the WebSocket interface, and
spamming `unconfirmed_txs` calls there. The behavior that happens is a
`Panic in WSJSONRPC handler` when a queue item unexpectedly disappears
for `mempool.(*TxPriorityQueue).Swap`.
(`runtime error: index out of range [0] with length 0`)

This can additionally lead to a `CONSENSUS FAILURE!!!` if the race
condition occurs for `internal/consensus.(*State).finalizeCommit`
when it tries to do `mempool.(*TxPriorityQueue).RemoveTx`, but
the ReapMax has already removed all elements from the underlying
heap. (`runtime error: index out of range [-1]`)

This commit switches the lock type to a write-lock (Lock) to ensure
no parallel modifications take place. It additionally updates the
tests to run the func calls in parallel, so as to prevent regressions
(in case someone wants to downgrade the locks without considering the
implications of the underlying heap usage).

* Fix root dir for tendermint reindex command (#210)

* Replay events during restart to avoid tx missing (#211)

---------

Co-authored-by: Denys S <150304777+dssei@users.noreply.github.com>
Co-authored-by: Valters Jansons <sigv@users.noreply.github.com>
Co-authored-by: Yiming Zang <50607998+yzang2019@users.noreply.github.com>