
Don't duplicate queued storage diffs #103

Merged

rmulhol merged 1 commit into staging from remove-queued-storage-duplicates on Jun 17, 2019

Conversation

rmulhol
Contributor

@rmulhol rmulhol commented Jun 14, 2019

  • Currently, if we fail to recognize the same diff on several runs (e.g.
    if you restart the storage diff watcher pointed at the same file), we'll
    add the same row to the queue on each run.
  • These changes ensure we only queue an unrecognized diff once (see the
    sketch below).

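For context, the fix pairs a unique constraint with ON CONFLICT DO NOTHING. A minimal sketch of the pattern, with an illustrative constraint name and example values (not the actual migration):

-- Assumed sketch: a unique constraint spanning all five columns makes
-- re-queueing an identical diff a no-op once the INSERT opts out of
-- raising on conflicts.
ALTER TABLE public.queued_storage
  ADD CONSTRAINT queued_storage_unique
  UNIQUE (contract, block_hash, block_height, storage_key, storage_value);

INSERT INTO public.queued_storage
  (contract, block_hash, block_height, storage_key, storage_value)
VALUES ('\x01', '\xabcd', 100, '\x02', '\x03')
ON CONFLICT DO NOTHING;  -- a second identical INSERT is silently skipped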
Collaborator

@i-norden i-norden left a comment

LGTM, just two quick questions for my benefit.

@@ -38,7 +38,7 @@ func NewStorageQueue(db *postgres.DB) StorageQueue {
 func (queue StorageQueue) Add(row utils.StorageDiffRow) error {
     _, err := queue.db.Exec(`INSERT INTO public.queued_storage (contract,
         block_hash, block_height, storage_key, storage_value) VALUES
-        ($1, $2, $3, $4, $5)`, row.Contract.Bytes(), row.BlockHash.Bytes(),
+        ($1, $2, $3, $4, $5) ON CONFLICT DO NOTHING`, row.Contract.Bytes(), row.BlockHash.Bytes(),
Collaborator

I've been putting similar unique constraints on the various CID tables on the syncAndPublish branch, but decided to overwrite previous entries on a conflict. My reasoning was that if, for example, finality had not been reached when the first entry was written, we'd want to be able to overwrite it with a later entry recorded after finality has (maybe) been reached. This might only make sense in the context of transactions and headers (or maybe not at all 🙃), which is why I'm curious what your reasoning is!

Thanks!
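For comparison, the overwrite-on-conflict approach described above would look something like the following. The table and columns here are hypothetical stand-ins, not the actual syncAndPublish schema:

-- Hypothetical upsert: the conflict target covers only the identity
-- columns, so a later entry (e.g. one written after finality) replaces
-- the earlier row. Assumes a unique constraint on (block_number, block_hash).
INSERT INTO public.header_cids (block_number, block_hash, cid)
VALUES ($1, $2, $3)
ON CONFLICT (block_number, block_hash)
DO UPDATE SET cid = EXCLUDED.cid;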

Contributor Author

@rmulhol rmulhol Jun 17, 2019

Yeah, this definitely seems like a place where we could benefit from some group conversation around conventions and approaches. In this case, the uniqueness constraint is on all five columns, so if we see a conflict it means the entire row is exactly the same. I figured doing any sort of update in that situation would be superfluous, but I'm definitely down for specifying exactly where/how we want to do updates on conflict.

One place that's had me 🤔 recently is how we handle conflicting event logs in plugins. Right now, I believe we check for conflicts on header_id + tx_idx + log_idx and update if they match an existing row. I would think that we should never see a delta in data coming off of the same header (i.e. same hash), which would mean we could also do nothing on that conflict. But I may be missing something...
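A rough sketch of the two options for that plugin case, with assumed table and column names for illustration:

-- Current behavior as described: overwrite the row matching
-- (header_id, tx_idx, log_idx).
INSERT INTO public.some_event (header_id, tx_idx, log_idx, raw_log)
VALUES ($1, $2, $3, $4)
ON CONFLICT (header_id, tx_idx, log_idx)
DO UPDATE SET raw_log = EXCLUDED.raw_log;

-- If data for the same header can never differ, DO NOTHING would be
-- equivalent and skips the redundant write:
-- ON CONFLICT (header_id, tx_idx, log_idx) DO NOTHING;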

Contributor

Thanks for bringing this discussion up guys.

I think that ON CONFLICT DO NOTHING makes sense for public.queued_storage records, since the unique constraint is on all 5 columns.

With the event logs, I agree that it seems like we shouldn't see any different data if the header hash is the same. The thing making me hesitate to say let's switch that over to ON CONFLICT DO NOTHING as well is that the event logs we've been creating reference the header_id rather than the header's hash. I think this should have essentially the same effect, since the header repo deletes the old header record and creates a new one whenever the new header hash != the old header hash for a given block number. But for some reason I feel like there's some edge case I'm not considering.
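A sketch of why referencing header_id can work out to the same thing, assuming the foreign key cascades (schema details here are assumptions, not copied from the repo):

-- If event logs reference headers with ON DELETE CASCADE, replacing a
-- header on a hash change (delete old row, insert new row) also removes
-- the logs tied to the old header_id, so stale logs can't outlive a reorg.
CREATE TABLE public.some_event (
  id        SERIAL PRIMARY KEY,
  header_id INTEGER NOT NULL REFERENCES public.headers (id) ON DELETE CASCADE,
  tx_idx    INTEGER NOT NULL,
  log_idx   INTEGER NOT NULL,
  UNIQUE (header_id, tx_idx, log_idx)
);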

@@ -12,6 +12,7 @@ SET client_encoding = 'UTF8';
 SET standard_conforming_strings = on;
 SELECT pg_catalog.set_config('search_path', '', false);
 SET check_function_bodies = false;
+SET xmloption = content;
Collaborator

What does this do exactly?

Contributor Author

I think this is just confirming a default for how strings are cast to/from xml: https://www.postgresql.org/docs/current/datatype-xml.html

I don't think it's relevant to our schema, since I can't think of a place where we're casting xml. But it's auto-generated in the schema dump I get when running Postgres 11.
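A minimal illustration of what the setting controls (assuming the server was built with XML support):

SET xmloption = content;   -- the default: strings cast to xml may be any well-formed content
SELECT '<a/><b/>'::xml;    -- succeeds

SET xmloption = document;  -- stricter: requires a single root element
SELECT '<a/><b/>'::xml;    -- now fails with "invalid XML document"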

Contributor

@elizabethengelman elizabethengelman left a comment

:shipit:

@rmulhol rmulhol merged commit f0a7a7d into staging Jun 17, 2019
@rmulhol rmulhol deleted the remove-queued-storage-duplicates branch June 17, 2019 20:55