Minor tweaks to the database schema #202

Merged
merged 2 commits into from
Aug 19, 2022

@Maelkum (Contributor) commented Aug 17, 2022

This PR makes two small changes to the database schema:

  1. Change the chain_id column from NUMERIC to INTEGER.

NUMERIC is an arbitrary-precision type, which we don't need here since chain IDs are small integers. Switching to a fixed-size integer type should decrease table size and improve performance; see https://dba.stackexchange.com/a/110882. I would argue that we can also switch the block_number column, which is also NUMERIC, to BIGINT, though we use that column in queries much less often.
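A quick way to compare the per-value storage of the three types in PostgreSQL is `pg_column_size`. INTEGER and BIGINT are fixed at 4 and 8 bytes, while NUMERIC is variable-length, so its size depends on the value; no specific output is claimed here. The chain ID value `1` is just an example input.

```sql
-- Bytes needed to store the same small chain ID as each type.
-- INTEGER and BIGINT are fixed-width; NUMERIC's size varies with the value.
SELECT pg_column_size(1::numeric) AS numeric_bytes,
       pg_column_size(1::integer) AS integer_bytes,
       pg_column_size(1::bigint)  AS bigint_bytes;
```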

  2. Change the sales_collection_address_idx index to include token_id in addition to the lowercased collection address.

We never query on token_id alone. A typical query looks like this:

```sql
SELECT *
FROM sales
WHERE LOWER(collection_address) = LOWER('0x123456789') AND token_id = 'X';
```

With the current setup we have two separate indexes: one on the lowercased collection_address and one on token_id. When executing the query, the database engine scans both indexes (one bitmap index scan each) and combines the results with a BitmapAnd to find the rows that match both conditions.

If we instead have a single composite index on (LOWER(collection_address), token_id), the query can be answered with one index scan.
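A minimal sketch of the index change as PostgreSQL DDL. Only the index name `sales_collection_address_idx` and the columns come from this PR; the actual migration in the merged commits may differ, and whether the standalone token_id index is dropped is not covered here.

```sql
-- Hypothetical migration. Wrapping the swap in a transaction means queries
-- never see a moment without the index (PostgreSQL DDL is transactional).
BEGIN;

DROP INDEX IF EXISTS sales_collection_address_idx;

-- LOWER(collection_address) stays the leftmost key, so lookups by collection
-- address alone can still use this index. The expression must match the one
-- used in queries exactly for the planner to pick the index.
CREATE INDEX sales_collection_address_idx
    ON sales (LOWER(collection_address), token_id);

COMMIT;
```

On a large, busy table, `CREATE INDEX CONCURRENTLY` avoids blocking writes during the build, but it cannot run inside a transaction block, so the swap would need to be split into separate steps.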

So the difference is two index scans plus the extra BitmapAnd step versus a single index scan. In a quick local test on a table with 7.5M rows, the SELECT took 0.771 ms to plan and 101.433 ms to execute with the two separate indexes, versus 0.391 ms to plan and 0.138 ms to execute with the composite index.
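One way to reproduce this comparison is `EXPLAIN ANALYZE`, which prints the chosen plan together with "Planning Time" and "Execution Time" lines like the figures above. Note that it actually runs the query, and timings will vary with caching and hardware; the literals are the same placeholders as in the example query.

```sql
-- Show the chosen plan plus actual planning and execution times.
-- With two single-column indexes, expect a BitmapAnd over two Bitmap Index
-- Scans. With the composite index, expect a scan of
-- sales_collection_address_idx alone (an Index Scan, or a Bitmap Index Scan
-- on that one index). The planner may still choose differently depending on
-- table statistics.
EXPLAIN ANALYZE
SELECT *
FROM sales
WHERE LOWER(collection_address) = LOWER('0x123456789') AND token_id = 'X';
```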

Meanwhile, queries that filter on the collection address alone, such as

```sql
SELECT *
FROM sales
WHERE LOWER(collection_address) = LOWER('0x123456789');
```

will still benefit from the index just as much, because LOWER(collection_address) is its leftmost column. We don't lose anything in performance there.

@Maelkum requested review from awfm9 and Ullaakut on August 17, 2022 10:59
@Maelkum self-assigned this on Aug 17, 2022
@awfm9 (Contributor) commented Aug 19, 2022

The maximum value for the chain ID is 9,223,372,036,854,775,771, see:

ethereum/EIPs#2294

That is much larger than 2,147,483,647, the maximum of the Postgres INTEGER type. The maximum for BIGINT is 9,223,372,036,854,775,807, which only barely accommodates it, so we could switch chain_id to BIGINT instead.
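A quick sanity check in psql, using the EIP-2294 maximum quoted above: the value casts cleanly to BIGINT, while the INTEGER cast is expected to fail with an out-of-range error.

```sql
-- The EIP-2294 maximum chain ID fits in BIGINT
-- (max 9,223,372,036,854,775,807, i.e. 36 to spare)...
SELECT 9223372036854775771::bigint AS max_chain_id;

-- ...but not in INTEGER (max 2,147,483,647):
-- this cast raises an "integer out of range" error.
SELECT 9223372036854775771::integer;
```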

The block number is a different story. In Ethereum it is a 256-bit integer, whose maximum value is 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,935.

However, because the block number only grows over time, at a rate set by block production, we can estimate a practical upper bound. At one block per second for a hundred years, we would reach 3,153,600,000 blocks. We should never get close to that threshold, yet it already exceeds the INTEGER maximum of 2,147,483,647, so again the INTEGER type is insufficient. BIGINT, on the other hand, leaves ample headroom.
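The back-of-the-envelope bound above, written out under the same assumptions as the comment (365-day years, one block per second):

```latex
100 \,\text{years} \times 365 \,\tfrac{\text{days}}{\text{year}} \times 86{,}400 \,\tfrac{\text{s}}{\text{day}} \times 1 \,\tfrac{\text{block}}{\text{s}} = 3{,}153{,}600{,}000 \,\text{blocks}

2^{31} - 1 = 2{,}147{,}483{,}647 \;<\; 3{,}153{,}600{,}000 \;\ll\; 2^{63} - 1 = 9{,}223{,}372{,}036{,}854{,}775{,}807
```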

So if you make these changes (BIGINT for both chain_id and block_number), that's a great optimization.
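A minimal migration sketch for the BIGINT switch suggested in the review. It assumes both columns live on the sales table, which this thread does not state; the table name is a placeholder to adjust to the real schema.

```sql
-- Hypothetical migration: assumes chain_id and block_number are both columns
-- of the sales table. NUMERIC -> BIGINT is not a binary-compatible change, so
-- PostgreSQL rewrites the table and holds an ACCESS EXCLUSIVE lock while it
-- runs; plan for that on a large table.
-- The explicit USING casts make the conversion visible: a value with a
-- fractional part would be rounded, and a value outside the BIGINT range
-- makes the whole statement fail.
ALTER TABLE sales
    ALTER COLUMN chain_id     TYPE BIGINT USING chain_id::bigint,
    ALTER COLUMN block_number TYPE BIGINT USING block_number::bigint;
```

Combining both column changes in one ALTER TABLE means the table is rewritten once rather than twice.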

@awfm9 merged commit d83f27d into master on Aug 19, 2022