
Don't duplicate queued storage diffs #103

Merged

rmulhol merged 1 commit into staging from remove-queued-storage-duplicates on Jun 17, 2019

Conversation

rmulhol
Contributor

@rmulhol rmulhol commented Jun 14, 2019

  • Currently, if we fail to recognize the same diff on several runs (e.g.
    if you restart the storage diff watcher pointed at the same file), we'll
    add the same row to the queue on each run.
  • These changes ensure we only queue an unrecognized diff once (see the
    sketch below).

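For context, the fix pairs a unique constraint with ON CONFLICT DO NOTHING. A minimal sketch of the pattern, with an illustrative constraint name and example values (not the actual migration):

-- Assumed sketch: a unique constraint spanning all five columns makes
-- re-queueing an identical diff a no-op once the INSERT opts out of
-- raising on conflicts.
ALTER TABLE public.queued_storage
  ADD CONSTRAINT queued_storage_unique
  UNIQUE (contract, block_hash, block_height, storage_key, storage_value);

INSERT INTO public.queued_storage
  (contract, block_hash, block_height, storage_key, storage_value)
VALUES ('\x01', '\xabcd', 100, '\x02', '\x03')
ON CONFLICT DO NOTHING;  -- a second identical INSERT is silently skipped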
Collaborator

@i-norden i-norden left a comment

LGTM, just two quick questions for my benefit.

@@ -38,7 +38,7 @@ func NewStorageQueue(db *postgres.DB) StorageQueue {
 func (queue StorageQueue) Add(row utils.StorageDiffRow) error {
     _, err := queue.db.Exec(`INSERT INTO public.queued_storage (contract,
         block_hash, block_height, storage_key, storage_value) VALUES
-        ($1, $2, $3, $4, $5)`, row.Contract.Bytes(), row.BlockHash.Bytes(),
+        ($1, $2, $3, $4, $5) ON CONFLICT DO NOTHING`, row.Contract.Bytes(), row.BlockHash.Bytes(),
Collaborator

I've been putting similar unique constraints on the various CID tables on the syncAndPublish branch, but decided to overwrite previous entries on a conflict. My reasoning was that if, for example, finality had not been reached when the first entry was written, we'd want to be able to overwrite it with a later entry recorded after finality has (maybe) been reached. This might only make sense in the context of transactions and headers (or maybe not at all 🙃), which is why I'm curious what your reasoning is!

Thanks!
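For comparison, the overwrite-on-conflict approach described above would look something like the following. The table and columns here are hypothetical stand-ins, not the actual syncAndPublish schema:

-- Hypothetical upsert: the conflict target covers only the identity
-- columns, so a later entry (e.g. one written after finality) replaces
-- the earlier row. Assumes a unique constraint on (block_number, block_hash).
INSERT INTO public.header_cids (block_number, block_hash, cid)
VALUES ($1, $2, $3)
ON CONFLICT (block_number, block_hash)
DO UPDATE SET cid = EXCLUDED.cid;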

Contributor Author

@rmulhol rmulhol Jun 17, 2019

Yeah, this definitely seems like a place where we could benefit from some group conversation around conventions and approaches. In this case, the uniqueness constraint is on all five columns, so if we see a conflict it means the entire row is exactly the same. I figured doing any sort of update in that situation would be superfluous, but I'm definitely down for specifying exactly where/how we want to do updates on conflict.

One place that's had me 🤔 recently is how we handle conflicting event logs in plugins. Right now, I believe we check for conflicts on header_id + tx_idx + log_idx and update if they match an existing row. I would think that we should never see a delta in data coming off of the same header (i.e. same hash), which would mean we could also do nothing on that conflict. But I may be missing something...
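A rough sketch of the two options for that plugin case, with assumed table and column names for illustration:

-- Current behavior as described: overwrite the row matching
-- (header_id, tx_idx, log_idx).
INSERT INTO public.some_event (header_id, tx_idx, log_idx, raw_log)
VALUES ($1, $2, $3, $4)
ON CONFLICT (header_id, tx_idx, log_idx)
DO UPDATE SET raw_log = EXCLUDED.raw_log;

-- If data for the same header can never differ, DO NOTHING would be
-- equivalent and skips the redundant write:
-- ON CONFLICT (header_id, tx_idx, log_idx) DO NOTHING;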

Contributor

Thanks for bringing this discussion up guys.

I think that ON CONFLICT DO NOTHING makes sense for public.queued_storage records, since the unique constraint is on all 5 columns.

With the event logs, I agree that it seems like we shouldn't see any different data if the header hash is the same. The thing making me hesitate to say let's switch that over to ON CONFLICT DO NOTHING as well is that the event logs we've been creating reference the header_id rather than the header's hash. I think this should have essentially the same effect, since the header repo deletes the old header record and creates a new one whenever the new header hash != the old header hash for a given block number. But for some reason I feel like there's some edge case I'm not considering.
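A sketch of why referencing header_id can work out to the same thing, assuming the foreign key cascades (schema details here are assumptions, not copied from the repo):

-- If event logs reference headers with ON DELETE CASCADE, replacing a
-- header on a hash change (delete old row, insert new row) also removes
-- the logs tied to the old header_id, so stale logs can't outlive a reorg.
CREATE TABLE public.some_event (
  id        SERIAL PRIMARY KEY,
  header_id INTEGER NOT NULL REFERENCES public.headers (id) ON DELETE CASCADE,
  tx_idx    INTEGER NOT NULL,
  log_idx   INTEGER NOT NULL,
  UNIQUE (header_id, tx_idx, log_idx)
);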

@@ -12,6 +12,7 @@ SET client_encoding = 'UTF8';
 SET standard_conforming_strings = on;
 SELECT pg_catalog.set_config('search_path', '', false);
 SET check_function_bodies = false;
+SET xmloption = content;
Collaborator

What does this do exactly?

Contributor Author

I think this is just confirming a default for how strings are cast to/from xml: https://www.postgresql.org/docs/current/datatype-xml.html

I don't think it's relevant to our schema, since I can't think of a place where we're casting xml. But it's auto-generated in the schema dump I get when running Postgres 11.
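A minimal illustration of what the setting controls (assuming the server was built with XML support):

SET xmloption = content;   -- the default: strings cast to xml may be any well-formed content
SELECT '<a/><b/>'::xml;    -- succeeds

SET xmloption = document;  -- stricter: requires a single root element
SELECT '<a/><b/>'::xml;    -- now fails with "invalid XML document"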

Contributor

@elizabethengelman elizabethengelman left a comment

:shipit:

@rmulhol rmulhol merged commit f0a7a7d into staging Jun 17, 2019
@rmulhol rmulhol deleted the remove-queued-storage-duplicates branch June 17, 2019 20:55