Conversation
- Currently, if we encounter the same unrecognized diff several times (e.g. if you restart the storage diff watcher pointed at the same file), we'll add the same row to the queue on each run.
- These changes ensure we only queue an unrecognized diff once.
LGTM, just two quick questions for my benefit.
@@ -38,7 +38,7 @@ func NewStorageQueue(db *postgres.DB) StorageQueue {
 func (queue StorageQueue) Add(row utils.StorageDiffRow) error {
 	_, err := queue.db.Exec(`INSERT INTO public.queued_storage (contract,
 		block_hash, block_height, storage_key, storage_value) VALUES
-		($1, $2, $3, $4, $5)`, row.Contract.Bytes(), row.BlockHash.Bytes(),
+		($1, $2, $3, $4, $5) ON CONFLICT DO NOTHING`, row.Contract.Bytes(), row.BlockHash.Bytes(),
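For the new ON CONFLICT DO NOTHING clause to kick in, public.queued_storage needs a matching unique constraint; per the discussion below, it spans all five columns. A minimal sketch of such a constraint (the constraint name here is illustrative, not taken from the migrations):

-- Sketch only; the constraint name is made up. Because the constraint covers all
-- five columns, a re-queued identical diff conflicts and the insert is skipped.
ALTER TABLE public.queued_storage
  ADD CONSTRAINT queued_storage_unique_row
  UNIQUE (contract, block_hash, block_height, storage_key, storage_value);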
Been putting similar unique constraints on the various CID tables on the syncAndPublish branch and decided to overwrite previous entries on a conflict. My reasoning being that, for example, if finality had not been reached during the first entry we would want to be able to overwrite it with a later entry after finality has maybe been reached. This might only make sense in the context of transactions and headers -or maybe not at all 🙃- which is why I'm curious what your reasoning is!
Thanks!
Yeah this definitely seems like a place where we could benefit from some group conversation around conventions and approaches. In this case, the uniqueness constraint is on all five columns - so if we see a conflict, it means the entire row is exactly the same. I figured doing any sort of update in that situation would be superfluous, but I'm definitely down for specifying exactly where/how we want to do updates on conflict.
One place that's had me 🤔 recently is how we handle conflicting event logs in plugins. Right now, I believe we check for conflicts on header_id + tx_idx + log_idx and update if they match an existing row. I would think that we should never see a delta in data coming off of the same header (i.e. same hash), which would mean we could also do nothing on that conflict. But I may be missing something...
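To make the two options concrete, here is a rough sketch of the update-on-conflict behavior described above next to the do-nothing alternative; the table name and the raw_log column are placeholders, not taken from the plugin code:

-- Illustrative only: event_logs and raw_log are assumed names.
-- Current behavior as described: overwrite the matching row.
INSERT INTO event_logs (header_id, tx_idx, log_idx, raw_log)
VALUES ($1, $2, $3, $4)
ON CONFLICT (header_id, tx_idx, log_idx)
DO UPDATE SET raw_log = EXCLUDED.raw_log;

-- Alternative raised above: data from the same header shouldn't change,
-- so a matching row could simply be left alone.
INSERT INTO event_logs (header_id, tx_idx, log_idx, raw_log)
VALUES ($1, $2, $3, $4)
ON CONFLICT (header_id, tx_idx, log_idx) DO NOTHING;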
Thanks for bringing this discussion up guys.
I think that ON CONFLICT DO NOTHING makes sense for public.queued_storage records, since the unique constraint is on all 5 columns.
With the event logs, I agree that it seems like we shouldn't see any different data if the header hash is the same. The thing that is making me hesitate to say let's switch this over to ON CONFLICT DO NOTHING as well is that the event logs we currently have been creating reference the header_id as opposed to the header's hash. I think this should have essentially the same effect, since the header repo deletes the old header record and creates a new header record if the new header hash != the old header hash for a given block number. But for some reason I feel like there's some edge case that I'm not considering.
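For reference, a rough sketch of the replace-on-new-hash behavior described above; the headers table layout, the (block_number, hash) constraint, and the cascade are assumptions for illustration, not lifted from the header repository:

-- Assumed names and schema, for illustration only.
-- When a different hash arrives for an already-synced block number, the old
-- header row is deleted; event logs referencing its id go with it if the
-- foreign key is declared ON DELETE CASCADE.
DELETE FROM public.headers
 WHERE block_number = $1
   AND hash <> $2;

INSERT INTO public.headers (block_number, hash, raw)
VALUES ($1, $2, $3)
ON CONFLICT (block_number, hash) DO NOTHING;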
@@ -12,6 +12,7 @@ SET client_encoding = 'UTF8';
 SET standard_conforming_strings = on;
 SELECT pg_catalog.set_config('search_path', '', false);
 SET check_function_bodies = false;
+SET xmloption = content;
What does this do exactly?
I think this is just confirming a default for how strings are cast to/from xml: https://www.postgresql.org/docs/current/datatype-xml.html
I don't think this is relevant to our schema, since I can't think of a place where we're casting xml, but it's auto-generated in the schema I get when running Postgres 11.
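For what it's worth, the setting only comes into play when a string is cast to the xml type; a quick illustration of the difference, runnable in psql and unrelated to our schema:

-- With xmloption = content, fragments without a single root are accepted.
SET xmloption = content;
SELECT '<a>1</a><b>2</b>'::xml;   -- ok: parsed as XML content

-- With xmloption = document, the cast requires a well-formed document.
SET xmloption = document;
SELECT '<a>1</a><b>2</b>'::xml;   -- error: not a valid XML document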