This repository has been archived by the owner on Aug 31, 2021. It is now read-only.

(VDB-371) recheck queued storage #86

Merged
rmulhol merged 7 commits into staging from vdb-371-recheck-queued-storage on May 1, 2019

Conversation

Contributor

@rmulhol rmulhol commented Apr 25, 2019

Currently set up to run at 5-minute intervals, but that's easy to change if we gather some insight into how often we actually want it to run.

Collaborator

@i-norden i-norden left a comment

This all looks great!! I couldn't find much to fix, but I've left some questions.

Expect(string(logContent)).To(ContainSubstring(utils.ErrContractNotFound{Contract: address.Hex()}.Error()))
}, w, mockTailer, []*tail.Line{line})
})
Eventually(func() (string, error) {
Collaborator

🙌 These asynchronous tests are really cool!! I'm going to want to reuse this pattern. To make sure I understand this correctly: the Eventually keeps checking for those values until they assert positively or negatively or until a timeout, and the done Done allows the tests themselves to be run asynchronously by waiting for done to be closed until a certain timeout?

Contributor Author

Good question!

As I understand it, Eventually means the assertion only fails if it remains false through the timeout (but not just because it's false when it's first evaluated), and Done makes the tests run asynchronously so that channel reads aren't blocking other tests.

I think it may be a little redundant to be using both Eventually and Done, since both enforce timeouts. We could probably get away with removing Done, but it seems like maybe there's added value insofar as the tests don't block while awaiting something like a (nanosecond 😂) ticker.
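
For reference, a minimal self-contained sketch of the pattern under discussion (Ginkgo v1 / Gomega; an illustrative test, not the PR's actual spec):

package watcher_test

import (
	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
)

var _ = Describe("async assertions", func() {
	// done Done: Ginkgo fails the spec if done isn't closed before the
	// timeout (the trailing 0.5, in seconds), so a blocked channel read
	// can't hang the suite.
	It("eventually sees the value on the channel", func(done Done) {
		out := make(chan string, 1)
		go func() { out <- "hello" }()

		// Eventually re-evaluates the function (by default every 10ms for
		// up to 1s) until the matcher passes or its timeout elapses.
		Eventually(func() string {
			select {
			case val := <-out:
				return val
			default:
				return ""
			}
		}).Should(Equal("hello"))

		close(done)
	}, 0.5)
})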

if parseErr != nil {
errs <- parseErr
}
out <- row
Collaborator

This still sends after a parseErr; is it okay that empty StorageDiffRow{}s enter the out channel in that case?

Contributor Author

Nice catch! I had fixed that on a spike I did beforehand but then forgot it after test-driving this work. Probably don't want to send empty rows 👍
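
A hedged sketch of the fix being agreed on here; parseRow and the channel types stand in for the fetcher's real code, which this excerpt doesn't show:

// StorageDiffRow stands in for the repo's utils.StorageDiffRow.
func fetchRows(lines <-chan string, out chan<- StorageDiffRow, errs chan<- error,
	parseRow func(string) (StorageDiffRow, error)) {
	for line := range lines {
		row, parseErr := parseRow(line)
		if parseErr != nil {
			errs <- parseErr
			continue // skip the send so no empty StorageDiffRow reaches out
		}
		out <- row
	}
}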

return parseErr
executeErr := storageTransformer.Execute(row)
if executeErr != nil {
if isKeyNotFound(executeErr) {
Collaborator

I'm not very experienced with the storage transformers, what is it that can change to break this error from continuing in processQueue?

Contributor Author

Not sure I understand the question - could you say more? processQueue differs from processRow in that it only takes further action if executeErr is nil - otherwise it just leaves the row in the queue to be retried later.
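
To make that distinction concrete, here's a hedged, self-contained sketch of the control flow described above; the types and names are illustrative stand-ins rather than the repo's real ones:

package watcher

import (
	"fmt"

	"github.com/sirupsen/logrus"
)

type queuedRow struct {
	Id       int
	Contract string
}

type queue interface {
	GetAll() ([]queuedRow, error)
	Delete(id int) error
}

type transformer interface {
	Execute(row queuedRow) error
}

// A row only leaves the queue after Execute succeeds (or its contract is no
// longer watched); on any error it stays queued for the next recheck tick.
func processQueue(q queue, transformers map[string]transformer) {
	rows, fetchErr := q.GetAll()
	if fetchErr != nil {
		logrus.Warn(fmt.Sprintf("error getting queued rows: %s", fetchErr))
		return
	}
	for _, row := range rows {
		t, watched := transformers[row.Contract]
		if !watched {
			deleteRow(q, row.Id) // address no longer watched
			continue
		}
		if executeErr := t.Execute(row); executeErr == nil {
			deleteRow(q, row.Id)
		}
	}
}

func deleteRow(q queue, id int) {
	if deleteErr := q.Delete(id); deleteErr != nil {
		logrus.Warn(fmt.Sprintf("error deleting row from queue: %s", deleteErr))
	}
}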

Collaborator

Sorry that was worded poorly, I guess I am wondering what it is that changes in the storage transformer's execution cycle that causes the executeErr it returns to be nil in the processQueue when it had been isKeyNotFound here.

Contributor Author

Oh yeah great question! That's basically the crux of what makes this feature worthwhile :)

Storage keys for values in more complex data structures like mappings are derived by hashing the mapping key concatenated with the index (slot) of that variable on the contract. E.g. if the contract has a variable mapping (address => uint256) balances, you need to know the address keys in that mapping to be able to recognize a given diff as representing a changed balance.

We're using the event transformers to catch all of the possible keys being used in the mappings we care about. But the event transformers could be lagging behind the storage transformers, in which case we wouldn't yet be able to recognize a given diff's storage key. That's where we queue the diff, and then this process will keep retrying - on the assumption that the event transformers will eventually give us the value we need to recognize the key.
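
This is the standard Solidity storage layout rule; here's a small go-ethereum sketch (not code from this PR) of how a mapping value's storage key is derived, and why the key has to be known before a diff can be recognized:

package main

import (
	"fmt"
	"math/big"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/crypto"
)

// For mapping(address => uint256) balances stored at slot `index` on the
// contract, the storage key for balances[holder] is
// keccak256(leftPad32(holder) ++ leftPad32(index)).
func mappingStorageKey(holder common.Address, index uint64) common.Hash {
	paddedKey := common.LeftPadBytes(holder.Bytes(), 32)
	paddedIndex := common.LeftPadBytes(new(big.Int).SetUint64(index).Bytes(), 32)
	return crypto.Keccak256Hash(paddedKey, paddedIndex)
}

func main() {
	holder := common.HexToAddress("0x0000000000000000000000000000000000001234")
	// Without already knowing holder (e.g. from an event transformer), there's
	// no way to map this opaque hash back to "balances[0x...1234]".
	fmt.Println(mappingStorageKey(holder, 2).Hex())
}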

Collaborator

Thank you, that clears everything up!

@@ -0,0 +1,33 @@
package fetcher
Collaborator

Could add license header

Contributor

@m0ar m0ar left a comment

Looks good! Some curious questions 🕵️‍♀️

cmd/root.go Outdated
pollingInterval = 7 * time.Second
validationWindow = 15
pollingInterval = 7 * time.Second
queueRecheckInterval = 5 * time.Minute
Contributor

Wouldn't it be nice to make this a CLI argument with a default value? Backwards compatible, but still configurable.

(kinda applies to all our hard coded constants like batch size, validation window, etc, but that's not in scope here)

Contributor Author

Yeah I dig that idea 👍
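
A hedged sketch of what that could look like, assuming the cobra/viper setup already in cmd/root.go; the flag and config key names are illustrative:

package cmd

import (
	"time"

	"github.com/spf13/cobra"
	"github.com/spf13/viper"
)

var queueRecheckInterval time.Duration

func addQueueRecheckFlag(rootCmd *cobra.Command) {
	rootCmd.PersistentFlags().DurationVar(
		&queueRecheckInterval,
		"queue-recheck-interval",
		5*time.Minute, // the current hard-coded value stays the default
		"how often queued storage diffs are rechecked",
	)
	// optionally let the value also come from the config file or environment
	_ = viper.BindPFlag("storage.queueRecheckInterval",
		rootCmd.PersistentFlags().Lookup("queue-recheck-interval"))
}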

)

BeforeEach(func() {
row = utils.StorageDiffRow{
Contract: common.HexToAddress("0x123456"),
BlockHash: common.HexToHash("0x678901"),
BlockHeight: 987,
StorageKey: common.HexToHash("0x654321"),
StorageValue: common.HexToHash("0x198765"),
Contributor

For future-proofing, does it make sense to see if it works without the leading 0x?

Contributor

Or is it formatted that way for both geth and parity for example?

Contributor Author

@m0ar I think the 0x is inconsequential when calling common.HexToHash since its implementation converts the string with a function that chops off the 0x.

Contributor

Yeah but I mean the storage diff code, does it always assume there is no leading 0x?

Contributor Author

Nah the code that converts the raw diffs to their internal representation uses the same functions, so it'll handle hex values with and without the prefix.
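
A quick check of that claim (go-ethereum's common.HexToHash goes through FromHex, which strips an optional 0x prefix):

package main

import (
	"fmt"

	"github.com/ethereum/go-ethereum/common"
)

func main() {
	withPrefix := common.HexToHash("0x654321")
	withoutPrefix := common.HexToHash("654321")
	fmt.Println(withPrefix == withoutPrefix) // true: both decode to the same 32-byte hash
}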

Expect(len(rows)).To(Equal(2))
Expect(rows[0]).NotTo(Equal(rows[1]))
Expect(rows[0].Id).NotTo(BeZero())
Expect(rows[0].Contract).To(Or(Equal(row.Contract), Equal(rowTwo.Contract)))
Contributor

Does the ordering not matter?

Contributor Author

I don't think the ordering should matter here, since we iterate through all rows. Let me know if it seems like I'm missing something, though.

storageWatcher.processRow(row)
case <-ticker.C:
storageWatcher.processQueue()
}
Contributor

Would it make sense to add a default sleep case here if all falls through?

Contributor Author

🤔 good question! Curious to hear more from you and others.

On the one hand, I'm inclined to keep spinning so that we stay up to date with new rows as quickly as possible. But, on the other hand, I could see the benefit of throttling that if we're checking many times more often than necessary.

I think I'd be tempted to stick with the current implementation and then do some investigation around optimal sleep times if we decide to go that route, but am very open to reconsidering.

Contributor

I'm thinking we run this on the same machine as the other commands; won't this hog a core basically permanently?

Contributor

Like when the interval is 5 minutes, this will still loop at blazing speed until some channel is populated, no?

Contributor Author

That sounds right, but it's unclear to me if we need to make that optimization right now. Our light sync code, for example, has a similar pattern and seems to run fine alongside other processes right now (granted, that loop does have a sleep in one of the case statements - but I believe it would loop by default while neither the ticker nor the missing blocks are receiving messages).

I'm definitely not opposed to exploring what kind of hit we're taking here and putting a story on the board to add defaults that sleep. But I think it may be worth a separate story because it's a bit tricky to test.

Contributor

That linked code is a bit different, since it'll sleep after every run of backfilling if it yielded no missing blocks. Only when there's actual meat to work with will it re-run directly afterwards. Anyway, putting it in a later story is fine :)
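
For concreteness, a hedged sketch of the default-sleep idea using the names from the excerpt above (not the PR's code; the sleep duration is illustrative):

for {
	select {
	case row := <-rows:
		storageWatcher.processRow(row)
	case <-ticker.C:
		storageWatcher.processQueue()
	default:
		// nothing ready on either channel: back off instead of spinning a core
		time.Sleep(100 * time.Millisecond)
	}
}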

logrus.Warn(fmt.Sprintf("error queueing storage diff with unrecognized key: %s", queueErr))
}
} else {
logrus.Warn(fmt.Sprintf("error executing storage transformer: %s", executeErr))
Contributor

Are these rows re-run some time, or how do we follow up failing transformer execution?

Contributor Author

No, I think these rows would be lost. Thinking I may change things to queue diffs on any error in execution...
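
A hedged sketch of that change (which the later commit note "For any error, not just if key isn't recognized" appears to describe); names outside the quoted excerpts are guesses:

if executeErr := storageTransformer.Execute(row); executeErr != nil {
	logrus.Warn(fmt.Sprintf("error executing storage transformer: %s", executeErr))
	// queue on any error, not just unrecognized keys, so the diff can be
	// retried by processQueue instead of being lost
	if queueErr := storageWatcher.Queue.Add(row); queueErr != nil {
		logrus.Warn(fmt.Sprintf("error queueing storage diff: %s", queueErr))
	}
}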

}

func (storageWatcher StorageWatcher) processQueue() {
rows, fetchErr := storageWatcher.Queue.GetAll()
Contributor

Would it be safer from a disposability POV to peek+pop one element at a time instead? I'm thinking about the case where we empty and cache up the whole queue, but something happens and we crash/are killed/etc. Do we not lose the queue?

if !ok {
logrus.Warn(utils.ErrContractNotFound{Contract: row.Contract.Hex()}.Error())
// delete row from queue if address no longer watched
storageWatcher.deleteRow(row.Id)
Contributor

...or does GetAll not actually remove from the queue?

Contributor Author

Yep right now the GetAll function just fetches without deleting, pretty much exactly for the consideration you described above. Thinking that for now it makes sense to risk duplicate rows (if a queued diff that's processed isn't deleted) rather than risk deleting rows that haven't been processed, but also interested in exploring better ways of avoiding both risks.

I did consider doing everything inside of a transaction, but that felt a little heavy handed given that it would require (I think) another transformer implementation that passes around a transaction injected by the watcher. I think the better solution might lie in eventually using a different tool that's better suited to managing a queue, but would welcome thoughts.

Contributor

@m0ar m0ar Apr 30, 2019

Maybe some kind of unique constraint and inspecting the returned error? Since this runs quite seldom anyway, can't the one-at-a-time approach work?

Contributor Author

Sure, that makes sense. Another case where I'd propose adding a separate story since it'd be a cross repo change set (adding constraints + a custom error and handling it specifically)

func (storageWatcher StorageWatcher) deleteRow(id int) {
deleteErr := storageWatcher.Queue.Delete(id)
if deleteErr != nil {
logrus.Warn(fmt.Sprintf("error deleting persisted row from queue: %s", deleteErr))
Contributor

On a subsequent run, what would happen if we encounter an already persisted row?

Contributor Author

Right now the row's data would be duplicated by default. A given storage transformer could prevent that by adding a uniqueness constraint on parsed storage rows' block number + value, though in that case you'd probably also want to return a nil error on execute so that the delete step could happen for the queued row.
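
A hedged sketch of that idea for a hypothetical transformer table (schema, column names, and the persistValue helper are illustrative): with the unique constraint in place, re-processing a queued diff becomes a no-op and Execute can return nil, which lets the watcher go ahead and delete the queued row.

package example

import "database/sql"

const createExampleTable = `
	CREATE TABLE IF NOT EXISTS example_storage_values (
		block_number BIGINT NOT NULL,
		value        TEXT   NOT NULL,
		UNIQUE (block_number, value)
	)`

const insertExampleValue = `
	INSERT INTO example_storage_values (block_number, value)
	VALUES ($1, $2)
	ON CONFLICT (block_number, value) DO NOTHING`

// db is a database/sql handle connected to Postgres.
func persistValue(db *sql.DB, blockNumber int64, value string) error {
	_, insertErr := db.Exec(insertExampleValue, blockNumber, value)
	return insertErr // nil on duplicates too, so the queued row still gets deleted
}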

defer os.Remove(tempFile.Name())
logrus.SetOutput(tempFile)

go storageWatcher.Execute(rows, errs, time.Nanosecond)
Contributor

This is a fast recheck interval... Could it mayhaps slow down the test since we'll go into that case very often? Maybe it isn't an issue, idk 🤷‍♀️

Contributor Author

@rmulhol rmulhol Apr 29, 2019

🤔 I may try tuning this parameter, but it's a bit tricky to know whether it's playing a role, since these tests run pretty fast and there's a decent bit of random variance in how long they take across runs even without changing it.

- Replaces directly reading from a CSV
- Simplifies testing
- Should hopefully make it easier to plug in other sources for storage
  diffs (e.g. differently formatted CSVs, JSON RPC, etc)
- Iterate through queued storage at defined interval, popping rows
  from the queue if successfully persisted
- For any error, not just if key isn't recognized
- Means we don't lose track of diffs on random ephemeral errors
@rmulhol rmulhol force-pushed the vdb-371-recheck-queued-storage branch from eec51b1 to 6716c3b on May 1, 2019 17:33
@rmulhol rmulhol merged commit 782e3fd into staging May 1, 2019
@rmulhol rmulhol deleted the vdb-371-recheck-queued-storage branch May 1, 2019 17:49