Minor tweaks to the database schema #202

Maelkum · 2022-08-17T10:59:18Z

This PR makes slight tweaks to the database schema.

Change chain_id column to an INTEGER instead of a NUMERIC.

NUMERIC is an arbitrary-precision type, which we don't need here because chain IDs are small integers. Switching would decrease table size and improve performance, see https://dba.stackexchange.com/a/110882. I would argue that we can also switch the block_number column, which is also a NUMERIC, to BIGINT, though we use that column in queries a lot less.

Change the sales_collection_address_idx index to also include the token_id besides the lowercased collection address.

We never query by token_id alone. A typical query looks like this:

SELECT *
FROM sales
WHERE LOWER(collection_address) = LOWER('0x123456789') AND token_id = 'X'

With the current setup we have two indexes: one on the lowercased collection_address and one on token_id. On query execution, the database engine does one index scan on collection_address, one on token_id, and then a BitmapAnd to find the rows that match both conditions.

If we instead have an index on LOWER(collection_address), token_id, then a single index will be used in a query.
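As a rough sketch, the index change could look like the following. The table and index names come from the description above; whether the index is replaced in place or created under a new name first is an assumption, not necessarily what this PR's migration does.

```sql
-- Sketch only: replace the single-column expression index with a
-- composite one. Column order matters: the lowercased address stays
-- the leading column so address-only lookups can still use the index.
DROP INDEX IF EXISTS sales_collection_address_idx;

CREATE INDEX sales_collection_address_idx
    ON sales (LOWER(collection_address), token_id);
```

On a large production table, `CREATE INDEX CONCURRENTLY` may be preferable to avoid blocking writes while the index builds.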

So the difference is doing two index scans + additional operations vs a single index scan. On a really trivial test I did locally on a table with 7.5M rows, the SELECT in the first case takes 0.771 ms of planning and 101.433 ms of execution; the second one takes 0.391 ms for planning and 0.138 ms for execution.
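To reproduce this kind of comparison, one option is to run the query through EXPLAIN ANALYZE before and after the index change. This is a sketch using the placeholder values from the query above; the actual plan and timings depend on the data and the Postgres version.

```sql
-- Prints the chosen plan plus planning and execution time.
-- Before the change we would expect two bitmap index scans combined
-- with a BitmapAnd; after it, a single scan on the composite index.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM sales
WHERE LOWER(collection_address) = LOWER('0x123456789')
  AND token_id = 'X';
```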

On the other hand, all queries doing

SELECT *
FROM sales
WHERE LOWER(collection_address) = LOWER('0x123456789')

will still benefit just as much from the index (since collection address is the leftmost column), so we don't lose anything in performance there.

awfm9 · 2022-08-19T11:10:06Z

The maximum value for the chain ID is 9,223,372,036,854,775,771, see:

ethereum/EIPs#2294

That's much bigger than the 2,147,483,647 provided for by the Postgres INTEGER type. The maximum value for BIGINT is 9,223,372,036,854,775,807 and thus barely accommodates it. So we could make that change.

The block number is a different story. It is a 256-bit integer for Ethereum, which has a maximum of 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,935.

However, we can estimate a reasonable upper bound, because the block number increases monotonically. If we have one block per second for a hundred years, that would be 3,153,600,000 blocks. Real block numbers should stay well below this bound, but the bound itself already exceeds the INTEGER maximum of 2,147,483,647, so again, the INTEGER type is insufficient. The BIGINT type, on the other hand, provides more than enough buffer.
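The numbers above can be checked directly in Postgres. This is just a sanity-check sketch; the column aliases are made up for illustration.

```sql
-- One block per second for 100 years (365-day years), compared against
-- the INTEGER and BIGINT limits, plus the EIP-2294 chain ID bound.
SELECT
    100::BIGINT * 365 * 24 * 60 * 60                 AS blocks_in_100_years,      -- 3153600000
    100::BIGINT * 365 * 24 * 60 * 60 > 2147483647    AS exceeds_integer_max,      -- true
    9223372036854775771 <= 9223372036854775807       AS chain_id_fits_in_bigint;  -- true
```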

So if you make these changes, that's a great optimization.
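For reference, a minimal sketch of what the type change could look like as a migration. It assumes both columns live on the sales table, which is not stated in this thread, so adjust the table name as needed.

```sql
-- Sketch only: convert both NUMERIC columns to BIGINT in one statement.
-- The conversion fails if any existing value is out of BIGINT range,
-- and it rewrites the table under an ACCESS EXCLUSIVE lock.
BEGIN;

ALTER TABLE sales
    ALTER COLUMN chain_id     TYPE BIGINT USING chain_id::BIGINT,
    ALTER COLUMN block_number TYPE BIGINT USING block_number::BIGINT;

COMMIT;
```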

Minor tweaks to the database schema

0721e21

Maelkum requested review from awfm9 and Ullaakut August 17, 2022 10:59

Maelkum self-assigned this Aug 17, 2022

Maelkum added the improvement label Aug 17, 2022

Ullaakut approved these changes Aug 17, 2022


Change chainID and block number types to BIGINT

374b18f

awfm9 approved these changes Aug 19, 2022


awfm9 merged commit d83f27d into master Aug 19, 2022

joeypunzel unassigned Maelkum Sep 13, 2022
