Apparent Performance regression in eth_getLogs #25421

Closed
ryanschneider opened this issue Jul 27, 2022 · 1 comment · Fixed by #25459

System information

Geth version: 1.10.19 (behavior appears unchanged in 1.10.21)
OS & Version: Linux

Expected behaviour

A node that is only asked about "recent" logs/receipts via eth_getLogs should be able to handle a high volume of that RPC without duplicate trips to the raw level DB.

Actual behaviour

In #17610 we added a receiptsCache LRU to BlockChain to avoid redundant DB lookups and RLP decoding of receipts and logs. However, the changes introduced in #23147 appear to bypass that LRU cache completely, which looks to me like a performance regression in the eth_getLogs RPC.
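To illustrate what a read-through LRU in front of the DB buys, here is a minimal sketch. It is not geth's implementation: `logsCache`, `blockLogs`, and the `load` callback are hypothetical names made up for this example, and the cache is not safe for concurrent use. The `Loads` counter stands in for trips to the raw DB.

```go
package main

import (
	"container/list"
	"fmt"
)

// blockLogs is a hypothetical stand-in for the decoded logs of one block.
type blockLogs []string

// entry is what the LRU list stores for each cached block.
type entry struct {
	hash string
	logs blockLogs
}

// logsCache is a tiny read-through LRU keyed by block hash (illustrative only).
type logsCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // block hash -> list element
	load     func(hash string) blockLogs
	Loads    int // number of times the backing store was consulted
}

func newLogsCache(capacity int, load func(string) blockLogs) *logsCache {
	return &logsCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
		load:     load,
	}
}

// Get returns the cached logs, falling back to the backing store on a miss.
func (c *logsCache) Get(hash string) blockLogs {
	if el, ok := c.items[hash]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).logs
	}
	c.Loads++
	logs := c.load(hash)
	c.items[hash] = c.order.PushFront(&entry{hash: hash, logs: logs})
	if c.order.Len() > c.capacity {
		// Evict the least recently used block.
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).hash)
	}
	return logs
}

func main() {
	cache := newLogsCache(128, func(hash string) blockLogs {
		// Pretend this is the expensive DB read plus RLP decode.
		return blockLogs{"log@" + hash}
	})
	for i := 0; i < 3; i++ {
		cache.Get("0xabc")
	}
	fmt.Println("raw loads:", cache.Loads) // prints "raw loads: 1"
}
```

If the RPC path never goes through `Get`, every request pays the load cost, which is the behavior described in this report.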

With a recent version of geth, under only moderate eth_getLogs load that only accesses logs from the last 128 blocks, we quickly see that a significant portion of the node's flamegraph is spent in the new ReadLogs method:

[image: flamegraph]

Steps to reproduce the behaviour

It's pretty easy to reproduce:

  • sync a node to head, preferably mainnet since it has the most logs in a given block.
  • generate some load on the geth node, for example using hey (https://github.com/rakyll/hey): hey -c 5 -z 30s -t 0 -m POST -T application/json -d '{"jsonrpc":"2.0", "id": 1123123, "method": "eth_getLogs", "params": [{}]}' http://0.0.0.0:8545/
  • even with only 5 concurrent eth_getLogs requests for the latest logs like the above, note that a non-trivial amount of time is spent in ReadLogs:

[image: flamegraph with 5 concurrent requests]

  • raise the number of concurrent requests to 50 (-c 50) and note that the amount of time spent in ReadLogs increases as well:

[image: flamegraph with 50 concurrent requests]

One would expect a recent block's receipts/logs to be read from the raw DB only once per block, and thus the relative amount of time spent in ReadLogs to go down rather than up under increased load.
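That expectation can be sketched with a small simulation. Everything here is an assumption for illustration (`rawStore`, `cachedStore`, `simulate`, and `rawReads` are hypothetical names, and a plain mutex-guarded map stands in for the LRU); it only counts how many reads reach the backing store when several workers keep asking for the same few recent blocks.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// rawStore stands in for the raw DB; it counts the reads that reach it.
type rawStore struct{ reads int64 }

func (s *rawStore) readLogs(block uint64) []string {
	atomic.AddInt64(&s.reads, 1)
	return []string{fmt.Sprintf("log-of-block-%d", block)}
}

// cachedStore puts a mutex-guarded map in front of rawStore.
type cachedStore struct {
	mu    sync.Mutex
	raw   *rawStore
	cache map[uint64][]string
}

func (c *cachedStore) readLogs(block uint64) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if logs, ok := c.cache[block]; ok {
		return logs
	}
	logs := c.raw.readLogs(block)
	c.cache[block] = logs
	return logs
}

// simulate runs `concurrency` workers, each issuing `perWorker` requests
// spread over `blocks` distinct recent blocks.
func simulate(read func(uint64) []string, concurrency, perWorker int, blocks uint64) {
	var wg sync.WaitGroup
	for w := 0; w < concurrency; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				read(uint64(i) % blocks)
			}
		}()
	}
	wg.Wait()
}

// rawReads reports how many reads hit the backing store for one run.
func rawReads(useCache bool, concurrency int) int64 {
	raw := &rawStore{}
	read := raw.readLogs
	if useCache {
		cached := &cachedStore{raw: raw, cache: make(map[uint64][]string)}
		read = cached.readLogs
	}
	simulate(read, concurrency, 100, 4)
	return atomic.LoadInt64(&raw.reads)
}

func main() {
	for _, c := range []int{5, 50} {
		fmt.Printf("concurrency=%d uncached=%d cached=%d\n",
			c, rawReads(false, c), rawReads(true, c))
	}
}
```

In this toy model the uncached read count grows linearly with concurrency (500, then 5000), while the cached count stays at one read per distinct block (4 in both runs).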
