
Use blocks table, not header_cids, for getBlockNumber() #232

Merged
merged 1 commit into v4 from telackey/blocks on Mar 10, 2023

Conversation

@telackey (Contributor) commented Mar 9, 2023

On our machines, the single most expensive query is:

SELECT block_number FROM eth.header_cids ORDER BY block_number DESC LIMIT 1

This query touches gazillions of rows, almost certainly as a consequence of only having a BRIN index on header_cids.block_number.
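
The distinction matters because a BRIN index stores only per-block-range min/max summaries and cannot return rows in sorted order, whereas a B-tree can satisfy an `ORDER BY ... LIMIT 1` with a single backward scan. A hedged sketch of the two index forms (the index names here are illustrative, not the actual schema):

```sql
-- BRIN: tiny, but stores only per-block-range summaries, so an
-- ORDER BY block_number DESC LIMIT 1 still has to fetch and sort rows.
CREATE INDEX header_cids_block_number_brin
    ON eth.header_cids USING brin (block_number);

-- B-tree: keeps keys ordered, so the max is one backward index seek.
CREATE INDEX header_cids_block_number_btree
    ON eth.header_cids USING btree (block_number);
```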

Even using gibbon (which has a comparatively small DB vs rhino or vulture) the comparison is significant:

OLD:

"Limit  (cost=28471.11..28471.24 rows=1 width=8) (actual time=182.739..249.251 rows=1 loops=1)"
"  ->  Gather Merge  (cost=28471.11..102315.06 rows=610660 width=8) (actual time=182.738..249.249 rows=1 loops=1)"
"        Workers Planned: 5"
"        Workers Launched: 5"
"        ->  Sort  (cost=27471.04..27776.37 rows=122132 width=8) (actual time=164.437..164.437 rows=1 loops=6)"
"              Sort Key: block_number DESC"
"              Sort Method: top-N heapsort  Memory: 25kB"
"              Worker 0:  Sort Method: top-N heapsort  Memory: 25kB"
"              Worker 1:  Sort Method: top-N heapsort  Memory: 25kB"
"              Worker 2:  Sort Method: top-N heapsort  Memory: 25kB"
"              Worker 3:  Sort Method: top-N heapsort  Memory: 25kB"
"              Worker 4:  Sort Method: top-N heapsort  Memory: 25kB"
"              ->  Parallel Index Only Scan using header_cid_index on header_cids  (cost=0.55..26860.38 rows=122132 width=8) (actual time=0.069..155.067 rows=101663 loops=6)"
"                    Heap Fetches: 87419"
"Planning Time: 4.875 ms"
"Execution Time: 249.310 ms"

NEW:

"Limit  (cost=0.58..0.60 rows=1 width=8) (actual time=0.028..0.029 rows=1 loops=1)"
"  ->  Index Only Scan using blocks_block_number_idx on blocks  (cost=0.58..67751347.29 rows=2837035909 width=8) (actual time=0.028..0.028 rows=1 loops=1)"
"        Heap Fetches: 1"
"Planning Time: 0.091 ms"
"Execution Time: 0.047 ms"

@telackey telackey requested a review from i-norden March 9, 2023 22:11
@telackey telackey self-assigned this Mar 9, 2023
@telackey telackey requested a review from dboreham March 9, 2023 22:14
@i-norden (Collaborator) left a comment


LGTM! Is the poor performance due to a combination of TimescaleDB partitioning by block_number + using BRIN to index block_number, or is this a more general issue when doing ORDER BY on BRIN-indexed columns?

Note: we're also changing the BRIN => BTREE in v5 (due to this finding, previously mentioned by David). Once that is done, should we revert to querying eth.header_cids, since it is a much smaller table?

@dboreham

> LGTM! Is the poor performance due to a combination of TimescaleDB partitioning by block_number + using BRIN to index block_number, or is this a more general issue when doing ORDER BY on BRIN-indexed columns?
>
> Note: we're also changing the BRIN => BTREE in v5 (due to this finding, previously mentioned by David). Once that is done, should we revert to querying eth.header_cids, since it is a much smaller table?

It's both. The lack of a B-tree index seems to lobotomize Timescale's query planner such that it looks in all partitions when only the partition covering the highest block-number range would do. Then it also takes longer to scan each partition.

I'm not sure the size of the table matters. When executed efficiently this query will be very fast regardless of the size because it will only do one seek on the index.
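
If the v5 BRIN => BTREE change lands, a migration along these lines would let the planner prune down to the newest chunk again. This is a sketch only, with hypothetical index names:

```sql
-- Hypothetical v5 migration sketch: swap the BRIN for a B-tree so
-- TimescaleDB can prune partitions and seek the max directly.
DROP INDEX IF EXISTS eth.header_cids_block_number_brin;  -- assumed name
CREATE INDEX header_cids_block_number_btree
    ON eth.header_cids USING btree (block_number);
```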

@telackey telackey merged commit b42f96e into v4 Mar 10, 2023
@telackey telackey deleted the telackey/blocks branch March 10, 2023 02:03