
Observe mainnet IO traffic during release window for nearcore version 1.29 #7635

Closed
Tracked by #7634
jakmeier opened this issue Sep 19, 2022 · 1 comment
Labels
A-storage Area: storage and databases



jakmeier commented Sep 19, 2022

During a release, all nodes are restarted to run on the new binary. This clears the in-memory shard cache for trie nodes.

Make sure block processing time stays reasonable during that window. Most importantly, we expect the prefetcher hit rate on shard 3 to be high for the first few hours and to drop only once the shard cache hit rate approaches 95%.
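That expectation can be phrased as a simple invariant over two metrics. The sketch below is hypothetical (the names `shard_cache_hit_rate` and `prefetch_share`, and the 5% threshold, are mine, not nearcore metric names):

```rust
/// Hypothetical check mirroring the expectation above: while the shard
/// cache is still warming up (hit rate below ~95%), the prefetcher
/// should be covering a meaningful share of reads; once the cache is
/// warm, the prefetcher share is allowed to fall off.
fn cache_warming_as_expected(shard_cache_hit_rate: f64, prefetch_share: f64) -> bool {
    if shard_cache_hit_rate < 0.95 {
        // Cold-ish cache: the prefetcher should be doing real work.
        prefetch_share > 0.05
    } else {
        // Warm cache: a low prefetcher share is fine.
        true
    }
}

fn main() {
    // First hours after a restart: cache cold, prefetcher busy -> ok.
    assert!(cache_warming_as_expected(0.86, 0.12));
    // Cache cold but prefetcher idle would be a red flag.
    assert!(!cache_warming_as_expected(0.86, 0.01));
}
```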

jakmeier (Author) commented

The release has not happened yet, but we have already simulated the same behavior using canary nodes on mainnet with version 1.29.0-rc.4 (the version currently on testnet).

My takeaways so far:

Looking good, as expected:

  • High prefetcher hit rate in the first 24 hours after a restart: around 12% of all chunk cache misses are prefetched, and 86% hit in the shard cache.
  • The prefetcher hit rate drops in the following 24 hours: about 91% hit in the shard cache, 9% are prefetched.
  • The trend continues for the next 24 hours: here I see 93% in the shard cache, 7% prefetched.
  • Running with a small shard cache, we only hit 62% in the shard cache and prefetch 34%. These numbers are very stable right from the start.
  • No prefetch failures.
  • No prefetches requested on shards 0, 1, and 2 when only the sweatcoin prefetcher is enabled.
  • Memory requirements of the prefetcher are very low (capped at 200 MB, but it never exceeds ~40 MB).
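For reference, the percentages above are shares of total trie-node reads, split by where the read was served. A small sketch of that arithmetic, with a hypothetical counter struct (nearcore exposes comparable numbers as Prometheus metrics, but these names are illustrative only):

```rust
// Hypothetical counters; the struct and field names are illustrative,
// not nearcore's actual metrics API.
struct TrieReadCounters {
    shard_cache_hits: u64, // served from the in-memory shard cache
    prefetch_hits: u64,    // served from prefetched staging data
    db_reads: u64,         // had to go to RocksDB on the main thread
}

impl TrieReadCounters {
    fn total(&self) -> u64 {
        self.shard_cache_hits + self.prefetch_hits + self.db_reads
    }
    /// Fraction of trie-node reads served by the shard cache.
    fn shard_cache_rate(&self) -> f64 {
        self.shard_cache_hits as f64 / self.total() as f64
    }
    /// Fraction of trie-node reads served by the prefetcher.
    fn prefetch_rate(&self) -> f64 {
        self.prefetch_hits as f64 / self.total() as f64
    }
}

fn main() {
    // Roughly the "first 24 hours" split reported above.
    let c = TrieReadCounters { shard_cache_hits: 86, prefetch_hits: 12, db_reads: 2 };
    println!("shard cache: {:.0}%", 100.0 * c.shard_cache_rate());
    println!("prefetched:  {:.0}%", 100.0 * c.prefetch_rate());
}
```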

Not quite as expected:

  • We see the main-thread retries counter going up on all shards, even when no prefetch requests are sent. I can only explain this with forks, which cause two chunks on the same shard to be processed simultaneously. The second thread accessing the same trie node then waits for the other to fetch the data first. The numbers in the "prefetch pending" counter confirm this theory. But it exposes an inefficiency in this case: we read from the DB twice, even though the second thread could read from the shard cache. (It's probably okay, as we are still reading from the RocksDB cache.)
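The wait-on-pending behavior described above can be modeled with a staging map guarded by a mutex and condvar. This is an illustrative sketch, not nearcore's actual implementation; in this simplified model the waiting thread reuses the fetched value, whereas the inefficiency observed above is that the real code ends up reading from the DB twice:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Illustrative model of a "prefetch pending" staging area; the names
// and structure are hypothetical, not nearcore's actual types.
#[derive(Clone)]
enum Slot {
    Pending,       // some thread is already fetching this trie node
    Done(Vec<u8>), // the fetched value
}

struct Staging {
    map: Mutex<HashMap<u64, Slot>>,
    cond: Condvar,
}

impl Staging {
    /// Fetch `key`, deduplicating concurrent requests: the first caller
    /// marks the slot Pending and fetches; later callers wait on the
    /// condvar until the slot is Done.
    fn get_or_fetch(&self, key: u64, fetch: impl Fn(u64) -> Vec<u8>) -> Vec<u8> {
        let mut map = self.map.lock().unwrap();
        loop {
            match map.get(&key).cloned() {
                Some(Slot::Done(v)) => return v,
                Some(Slot::Pending) => {
                    // Second chunk on the same shard: wait for the
                    // in-flight fetch instead of starting another.
                    map = self.cond.wait(map).unwrap();
                }
                None => {
                    map.insert(key, Slot::Pending);
                    drop(map); // release the lock during the slow fetch
                    let v = fetch(key); // simulated DB read
                    let mut map = self.map.lock().unwrap();
                    map.insert(key, Slot::Done(v.clone()));
                    self.cond.notify_all();
                    return v;
                }
            }
        }
    }
}

fn main() {
    let staging = Arc::new(Staging {
        map: Mutex::new(HashMap::new()),
        cond: Condvar::new(),
    });
    // Two threads racing for the same trie node, as with forked chunks.
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let s = Arc::clone(&staging);
            thread::spawn(move || s.get_or_fetch(7, |k| k.to_le_bytes().to_vec()))
        })
        .collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), 7u64.to_le_bytes().to_vec());
    }
}
```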
