Performance of JSON RPC is degraded with the heavy loading #3896

Closed
vbaranov opened this issue Jul 24, 2023 · 5 comments
Labels
  • A-rpc: Related to the RPC implementation
  • C-bug: An unexpected or incorrect behavior
  • C-perf: A change motivated by improving speed, memory usage or disk footprint
  • M-prevent-stale: Prevents old inactive issues/PRs from being closed due to inactivity

Comments


vbaranov commented Jul 24, 2023

Describe the bug

After several hours of running an ETH Mainnet instance of Blockscout against a Reth node, we started to observe performance issues with the node's JSON RPC.

Requests such as eth_getTransactionReceipt, eth_getTransactionByHash, eth_getBalance, eth_getBlockByNumber, and trace_replayBlockTransactions start to hang entirely (the node returns no response when they are made).

These and similar new requests kept hanging even after traffic from the application was disabled. Only a restart of the node returns it to an operational state.

The Reth node is run via docker-compose with the following YAML:

version: '3.9'

services:
  reth:
    container_name: reth
    image: ghcr.io/paradigmxyz/reth:v0.1.0-alpha.4
    restart: always
    pull_policy: always
    privileged: true
    volumes:
      - /data/reth/rethlogs:$HOME/rethlogs
      - /data/reth:$HOME/.local/share/reth
    environment:
      RUST_BACKTRACE: 'full'
    command: >
      node
      --authrpc.jwtsecret=$HOME/.local/share/reth/mainnet/jwt.hex
      --authrpc.addr="0.0.0.0"
      --http
      --http.addr="0.0.0.0"
      --http.port="8545"
      --http.api="eth,debug,net,trace,web3,txpool"
      --ws
      --ws.addr="0.0.0.0"
      --ws.port="8546"
      --ws.api="eth,debug,net,trace,web3,txpool"
      --metrics reth:9001
      --log.directory $HOME
      --rpc-max-connections 500
    ports:
      - '9001:9001'
      - '8545:8545'
      - '8546:8546'
      - '8551:8551'

  consensus:
...

Prysm is used for the consensus layer.

Blockscout load on the node can be generated using this repo to reproduce the performance issue.

Steps to reproduce

  1. Run a Reth node v0.1.0-alpha.4 for ETH Mainnet using the docker-compose file above.
  2. Run the Blockscout application:

git clone https://github.com/vbaranov/reth-perf-issue-reproduction
ETHEREUM_JSONRPC_HTTP_URL=... ETHEREUM_JSONRPC_WS_URL=... docker-compose up

where ETHEREUM_JSONRPC_HTTP_URL is the node's JSON RPC endpoint and ETHEREUM_JSONRPC_WS_URL is the node's WS endpoint.

For instance, with the node running on the same host:

ETHEREUM_JSONRPC_HTTP_URL=http://localhost:8545 ETHEREUM_JSONRPC_WS_URL=ws://localhost:8545 docker-compose up

  3. Keep it running for 1-2 hours. Blockscout will be available at http://localhost:4000 and will load the node with various requests.

As a result, the node will stop responding to any requests besides simple ones like eth_blockNumber, web3_clientVersion, and net_version.

JSON RPC requests to at least these methods will hang (a quick way to check is sketched after the list):

  • eth_getBlockByNumber
  • eth_getTransactionReceipt
  • eth_getTransactionByHash
  • eth_getBalance
  • trace_replayBlockTransactions
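
For illustration, whether a method hangs can be checked with a timed request like the one sketched below; the endpoint, method, and 10-second timeout are only placeholders, not part of the original report:

timeout 10 curl -s http://localhost:8545 \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"eth_getBlockByNumber","params":["latest",false]}' \
  || echo "request timed out (or failed)"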

Node logs

reth       | 2023-07-21T14:47:00.045030Z  WARN connection{remote_addr=65.108.226.150:21279 conn_id=433}: jsonrpsee_server::server: HTTP serve connection failed hyper::Error(IncompleteMessage)
reth       | 2023-07-21T14:47:00.332972Z  WARN connection{remote_addr=65.108.226.150:45069 conn_id=434}: jsonrpsee_server::server: HTTP serve connection failed hyper::Error(IncompleteMessage)
reth       | 2023-07-21T14:47:00.897910Z  WARN connection{remote_addr=65.108.226.150:10827 conn_id=435}: jsonrpsee_server::server: HTTP serve connection failed hyper::Error(IncompleteMessage)



Platform(s)

_No response_

What version/commit are you on?

Reth v0.1.0-alpha.4

What database version are you on?

_No response_

If you've built Reth from source, provide the full command you used

_No response_

Code of Conduct

- [X] I agree to follow the Code of Conduct
vbaranov added the C-bug and S-needs-triage labels Jul 24, 2023
mattsse (Collaborator) commented Jul 24, 2023

thanks for this,

does blockscout run a lot of trace_* requests?

there's another --rpc-max-tracing-requests cli argument for limiting concurrent tracing requests. the default is very low, maybe we should bump it.
I suspect it has something to do with this and with the fact that I haven't implemented request timeouts yet. on it though.

vbaranov (Author)

> does blockscout run a lot of trace_* requests?

In the provided setup, there were 2 types of trace_* requests (an example of the request shape is sketched after the list):

  • trace_replayBlockTransactions - batched requests (15 blocks in the batch, 12 parallel requests at once)
  • trace_block - batched requests (10 blocks in the batch, 4 parallel requests at once)
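
For illustration, a batched trace_replayBlockTransactions call of that shape looks roughly like the sketch below, assuming the standard trace-API parameter form of a block number plus a list of trace types; the endpoint and block numbers are placeholders:

curl -s http://localhost:8545 \
  -H 'Content-Type: application/json' \
  -d '[{"jsonrpc":"2.0","id":1,"method":"trace_replayBlockTransactions","params":["0x1000000",["trace"]]},
      {"jsonrpc":"2.0","id":2,"method":"trace_replayBlockTransactions","params":["0x1000001",["trace"]]}]'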

mattsse (Collaborator) commented Jul 25, 2023

I was unable to run the docker-compose repro; it could be an Apple silicon docker issue:

blockscout | qemu: uncaught target signal 11 (Segmentation fault) - core dumped
blockscout | bash: line 1:
blockscout | 8
blockscout | Segmentation fault
blockscout |
blockscout | bin/blockscout eval "Elixir.Explorer.ReleaseTasks.create_and_migrate()"
blockscout exited with code 139

could you try again after bumping --rpc-max-tracing-requests to something like 1000?
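
For example, the flag could be appended to the node command from the compose file above; this is only a sketch repeating a few of the existing flags, with 1000 being just the value suggested here:

reth node \
  --http --http.api="eth,debug,net,trace,web3,txpool" \
  --rpc-max-connections 500 \
  --rpc-max-tracing-requests 1000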

one issue here could be that all the rpc handlers that require blocking IO are executed on the same thread pool, so tracing could interfere with regular rpc tasks, which could explain the hanging eth_ requests.
I'll investigate using a dedicated thread pool just for tracing, which should resolve this.

onbjerg added the C-perf and A-rpc labels and removed the S-needs-triage label Jul 26, 2023
github-actions bot

This issue is stale because it has been open for 14 days with no activity.

github-actions bot added the S-stale label Aug 10, 2023
mattsse added the M-prevent-stale label and removed the S-stale label Aug 10, 2023
mattsse (Collaborator) commented Apr 7, 2024

this should be fixed now that we improved trace request rate limiting, which is now derived from the number of available cores.

mattsse closed this as completed Apr 7, 2024
github-project-automation bot moved this from Todo to Done in Reth Tracker Apr 7, 2024