Performance of JSON RPC is degraded with the heavy loading #3896

Closed
vbaranov opened this issue Jul 24, 2023 · 5 comments
Labels
  • A-rpc: Related to the RPC implementation
  • C-bug: An unexpected or incorrect behavior
  • C-perf: A change motivated by improving speed, memory usage or disk footprint
  • M-prevent-stale: Prevents old inactive issues/PRs from being closed due to inactivity

Comments


vbaranov commented Jul 24, 2023

Describe the bug

After several hours of running an ETH Mainnet instance of Blockscout against a Reth node, we started to observe performance issues with the node's JSON RPC.

Requests such as eth_getTransactionReceipt, eth_getTransactionByHash, eth_getBalance, eth_getBlockByNumber, and trace_replayBlockTransactions start to hang entirely (the node returns no response when they are made).

These and similar new requests kept hanging even after traffic from the application was disabled. Only a restart of the node returns it to an operational state.

The Reth node is run via docker-compose with the following YAML:

version: '3.9'

services:
  reth:
    container_name: reth
    image: ghcr.io/paradigmxyz/reth:v0.1.0-alpha.4
    restart: always
    pull_policy: always
    privileged: true
    volumes:
      - /data/reth/rethlogs:$HOME/rethlogs
      - /data/reth:$HOME/.local/share/reth
    environment:
      RUST_BACKTRACE: 'full'
    command: >
      node
      --authrpc.jwtsecret=$HOME/.local/share/reth/mainnet/jwt.hex
      --authrpc.addr="0.0.0.0"
      --http
      --http.addr="0.0.0.0"
      --http.port="8545"
      --http.api="eth,debug,net,trace,web3,txpool"
      --ws
      --ws.addr="0.0.0.0"
      --ws.port="8546"
      --ws.api="eth,debug,net,trace,web3,txpool"
      --metrics reth:9001
      --log.directory $HOME
      --rpc-max-connections 500
    ports:
      - '9001:9001'
      - '8545:8545'
      - '8546:8546'
      - '8551:8551'

  consensus:
...

Prysm is used for the consensus layer.

Blockscout load on the node can be generated using this repo to reproduce the performance issue.

Steps to reproduce

  1. Run a Reth node v0.1.0-alpha.4 for ETH Mainnet using the docker-compose file above.
  2. Run the Blockscout application:

git clone https://github.com/vbaranov/reth-perf-issue-reproduction
ETHEREUM_JSONRPC_HTTP_URL=... ETHEREUM_JSONRPC_WS_URL=... docker-compose up

where ETHEREUM_JSONRPC_HTTP_URL is the node's JSON RPC endpoint and ETHEREUM_JSONRPC_WS_URL is the node's WS endpoint.

For instance, with the node running on the same host:

ETHEREUM_JSONRPC_HTTP_URL=http://localhost:8545 ETHEREUM_JSONRPC_WS_URL=ws://localhost:8545 docker-compose up

  3. Keep it running for 1-2 hours. Blockscout will be available at http://localhost:4000 and will load the node with various requests.

As a result, the node will stop responding to any requests besides simple ones like eth_blockNumber, web3_clientVersion, and net_version.

JSON RPC requests to at least these methods will hang (a quick way to check is sketched after the list):

  • eth_getBlockByNumber
  • eth_getTransactionReceipt
  • eth_getTransactionByHash
  • eth_getBalance
  • trace_replayBlockTransactions
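
For illustration, whether a method hangs can be checked with a timed request like the one sketched below; the endpoint, method, and 10-second timeout are only placeholders, not part of the original report:

timeout 10 curl -s http://localhost:8545 \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"eth_getBlockByNumber","params":["latest",false]}' \
  || echo "request timed out (or failed)"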

Node logs

reth       | 2023-07-21T14:47:00.045030Z  WARN connection{remote_addr=65.108.226.150:21279 conn_id=433}: jsonrpsee_server::server: HTTP serve connection failed hyper::Error(IncompleteMessage)
reth       | 2023-07-21T14:47:00.332972Z  WARN connection{remote_addr=65.108.226.150:45069 conn_id=434}: jsonrpsee_server::server: HTTP serve connection failed hyper::Error(IncompleteMessage)
reth       | 2023-07-21T14:47:00.897910Z  WARN connection{remote_addr=65.108.226.150:10827 conn_id=435}: jsonrpsee_server::server: HTTP serve connection failed hyper::Error(IncompleteMessage)



Platform(s)

_No response_

What version/commit are you on?

Reth v0.1.0-alpha.4

What database version are you on?

_No response_

If you've built Reth from source, provide the full command you used

_No response_

Code of Conduct

- [X] I agree to follow the Code of Conduct
vbaranov added the C-bug and S-needs-triage labels Jul 24, 2023
mattsse (Collaborator) commented Jul 24, 2023

thanks for this,

does blockscout run a lot of trace_* requests?

there's another --rpc-max-tracing-requests cli argument for limiting concurrent tracing requests. the default is very low, maybe we should bump it.
I suspect it has something to do with this and with the fact that I haven't implemented request timeouts yet. on it though.

vbaranov (Author)

> does blockscout run a lot of trace_* requests?

In the provided setup, there were 2 types of trace_* requests (an example of the request shape is sketched after the list):

  • trace_replayBlockTransactions - batched requests (15 blocks in the batch, 12 parallel requests at once)
  • trace_block - batched requests (10 blocks in the batch, 4 parallel requests at once)
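
For illustration, a batched trace_replayBlockTransactions call of that shape looks roughly like the sketch below, assuming the standard trace-API parameter form of a block number plus a list of trace types; the endpoint and block numbers are placeholders:

curl -s http://localhost:8545 \
  -H 'Content-Type: application/json' \
  -d '[{"jsonrpc":"2.0","id":1,"method":"trace_replayBlockTransactions","params":["0x1000000",["trace"]]},
      {"jsonrpc":"2.0","id":2,"method":"trace_replayBlockTransactions","params":["0x1000001",["trace"]]}]'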

mattsse (Collaborator) commented Jul 25, 2023

I was unable to run the docker-compose repro; it could be an Apple silicon docker issue:

blockscout | qemu: uncaught target signal 11 (Segmentation fault) - core dumped
blockscout | bash: line 1:
blockscout | 8
blockscout | Segmentation fault
blockscout |
blockscout | bin/blockscout eval "Elixir.Explorer.ReleaseTasks.create_and_migrate()"
blockscout exited with code 139

could you try again after bumping --rpc-max-tracing-requests to something like 1000?
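
For example, the flag could be appended to the node command from the compose file above; this is only a sketch repeating a few of the existing flags, with 1000 being just the value suggested here:

reth node \
  --http --http.api="eth,debug,net,trace,web3,txpool" \
  --rpc-max-connections 500 \
  --rpc-max-tracing-requests 1000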

one issue here could be that all the rpc handlers that require blocking IO are executed on the same thread pool, so tracing could interfere with regular rpc tasks, which could explain the hanging eth_ requests.
I'll investigate using a dedicated thread pool just for tracing, which should resolve this.

onbjerg added the C-perf and A-rpc labels and removed the S-needs-triage label Jul 26, 2023
github-actions bot

This issue is stale because it has been open for 14 days with no activity.

github-actions bot added the S-stale label Aug 10, 2023
mattsse added the M-prevent-stale label and removed the S-stale label Aug 10, 2023
mattsse (Collaborator) commented Apr 7, 2024

this should be fixed now that we improved trace request rate limiting, which is now derived from the number of available cores.

mattsse closed this as completed Apr 7, 2024
github-project-automation bot moved this from Todo to Done in Reth Tracker Apr 7, 2024