
Simple IO cost estimation based on SSD specs #7060

Closed · Tracked by #5995
jakmeier opened this issue Jun 17, 2022 · 4 comments

Labels: A-params-estimator (Area: runtime params estimator), T-contract-runtime (Team: issues relevant to the contract runtime team)

jakmeier commented Jun 17, 2022

Getting correct and deterministic gas estimation of IO costs in place will take some time. (See #5995 and #7053 + #7058 + #7059 for current efforts in that direction.)

In the meantime, we want an alternative approach that serves as a sanity-checking tool. Results only have to be approximate; we would only use it to validate that a gas parameter is roughly in line with back-of-the-envelope calculations.

One approach I want to explore is the following:

  • For a gas parameter, consider how many reads and writes to persistent data storage the workload requires. This should be independent of implementation details.
  • Then define assumptions on SSD latency for reads and writes.
  • Compute the time to serve all requests sequentially and convert to gas using 1Pgas = 1s.

Here I would assume the implementation is "perfect" in the sense that no complicated data structures need to be traversed and the in-memory cache is large enough to fit all data for a few blocks. At the same time, also assume that no "pre-fetching" happens, whether speculative or otherwise.

In other words, reading N unique values from disk is assumed to take exactly N IOPs, regardless of data locality (as long as each value is smaller than the block size of 4kB; otherwise, every 4kB block requires at least one IOP).
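A minimal sketch of this cost model in Rust, for illustration only (the function names are mine, not estimator code; only the 4kB-block rule and the 1Pgas = 1s conversion come from the description above):

```rust
// Sketch of the IO cost model described above (illustrative only):
// one IOP per unique value, plus one IOP per additional 4 kB block,
// converted to gas via 1 Pgas = 1 s.

const BLOCK_SIZE: u64 = 4096; // assumed SSD block size in bytes
const GAS_PER_NANOSECOND: u64 = 1_000_000; // 1 Pgas = 1 s => 10^15 gas / 10^9 ns

/// IOPs needed to read or write one value of `size` bytes.
fn iops_for_value(size: u64) -> u64 {
    // At least one IOP per value, regardless of data locality.
    std::cmp::max(1, (size + BLOCK_SIZE - 1) / BLOCK_SIZE)
}

/// Gas for serving `iops` sequential IO operations, given an assumed
/// per-operation latency in nanoseconds.
fn gas_for_iops(iops: u64, latency_ns: u64) -> u64 {
    iops * latency_ns * GAS_PER_NANOSECOND
}

fn main() {
    // Example: ten unique 100 B values cost 10 IOPs; one 10 KiB value costs 3.
    let iops = 10 * iops_for_value(100) + iops_for_value(10 * 1024);
    // Assuming e.g. 143 µs per sequential IOP.
    println!("{} IOPs -> {} gas", iops, gas_for_iops(iops, 143_000));
}
```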

This should give us a rough estimate of what performance we can expect from a client implementation. Let's do that for all parameters directly related to storage and see if we get similar gas numbers to what we have today.

@jakmeier jakmeier changed the title Simple IO cost estimation Simple IO cost estimation based on SSD specs Jun 17, 2022
@jakmeier jakmeier self-assigned this Jun 17, 2022
@jakmeier jakmeier added T-contract-runtime Team: issues relevant to the contract runtime team A-params-estimator Area: runtime params estimator labels Jun 17, 2022
jakmeier commented:

My replayer tool currently parses the output described in #7053. I aim to merge that change first and then submit the replayer code.

The idea is that with such a replayer, it is very easy to extract these statistics and more. It already allows me to analyse the trie node (shard and chunk) caches. Below is the output of one of its analysis options, showing two receipts executed in the same block.

```
GET BlockInfo "E2zwRdCMngXBmKa7PLVc2tsdRqnoz6vzAV3MR3bTnbBp" size=221
  process_receipt predecessor=v1.token.sweat.testnet receiver=v1.token.sweat.testnet id=fWEKRcn8NeXscFUgYYq1gYfFZHSqyH29k97UJsQBqaZ chunk-cache-hit=13 chunk-cache-too-large=1 chunk-cache-miss=1
    DB GET          122 requests for a total of   214507 B
    DB SET            0 requests for a total of        0 B
    STORAGE READ    136 requests for a total of       57 B
    STORAGE WRITE   136 requests for a total of     2217 B
    TRIE NODES     7903 /  27 / 121  (chunk-cache/shard-cache/DB)
    SHARD CACHE       18.24% hit rate
    CHUNK CACHE       98.16% hit rate,  98.49% if removing 27 shard cache hits
  process_receipt predecessor=v1.token.sweat.testnet receiver=v1.token.sweat.testnet id=2eTPuWAJNm5KX89HQtG32cbDRdh7goBrHBGb6bJdKeEb chunk-cache-too-large=1 chunk-cache-miss=1 chunk-cache-hit=11
    DB GET          121 requests for a total of   214381 B
    DB SET            0 requests for a total of        0 B
    STORAGE READ    136 requests for a total of       57 B
    STORAGE WRITE   136 requests for a total of     2217 B
    TRIE NODES     7916 /   0 / 120  (chunk-cache/shard-cache/DB)
    SHARD CACHE        0.00% hit rate
    CHUNK CACHE       98.51% hit rate
```

We can see a bunch of interesting things here.

  • The chunk cache is very effective.
  • The shard cache is useful for storage operations outside of function calls (e.g. loading contract state).
  • Inside function calls, the shard cache was only useful for the first receipt; the second receipt was fully served by the chunk cache.

(This is just an example; actual analysis of this data should be done in the storage optimization context.)

jakmeier commented:

Here are some statistics on DB operations per storage operation, based on our current estimator runs and the new instrumentation I am working on.

First, the assumption is that 1 DB operation = 1 disk access. Then I assume an IO operation latency of 143µs, derived from a measured 7000 IOPS on gcloud with a full NVMe SSD and iodepth=1: inverting gives 1/7000 s ≈ 143µs per operation. Since iodepth=1 issues accesses sequentially, this should be roughly what we can expect for sequential calls to the disk.

| | Reads | Writes | TX / block | Inner repetitions | DB ops per host function call | Time [µs] with SSD latency = 143µs | Time [µs] with current gas price |
|---|---|---|---|---|---|---|---|
| StorageHasKeyBase | 1713 | 52 | 2 | 1000 | 0.8825 | 126.2 | 54.0 |
| StorageReadBase | 2721 | 52 | 2 | 1000 | 1.3865 | 198.3 | 56.4 |
| StorageWriteBase | 188 | 2633 | 2 | 1000 | 1.4105 | 201.7 | 64.2 |
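Spelled out, the arithmetic behind a table row looks like this (a quick sketch; the helper name is mine):

```rust
/// Baseline time per host function call, in µs: total DB ops observed in
/// the estimator workload, divided by the number of calls, times the
/// assumed per-IOP latency. (Sketch reproducing the table above.)
fn baseline_us(reads: u64, writes: u64, tx_per_block: u64, inner_reps: u64, latency_us: f64) -> f64 {
    let db_ops_per_call = (reads + writes) as f64 / (tx_per_block * inner_reps) as f64;
    db_ops_per_call * latency_us
}

fn main() {
    // StorageHasKeyBase: (1713 + 52) / (2 * 1000) = 0.8825 DB ops per call
    // -> 0.8825 * 143 µs ≈ 126.2 µs, matching the first row.
    println!("{:.1} µs", baseline_us(1713, 52, 2, 1000, 143.0));
}
```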

Conclusion

There are two ways of looking at this:

  1. We undercharge storage by around a factor of 4, compared to this baseline.
  2. In our estimations today, IO operations are around 4 times more efficient (12 times, if we factor in the safety multiplier of 3) than accessing the disk for every DB operation. This is even with all caches turned off, as is today's practice in the estimator. So the efficiency presumably comes from RocksDB's various features, such as fetching entire blocks at once and keeping them in memory.

Instrumentation

I collected these numbers by first running

```
cargo run --release -p runtime-params-estimator --features=required,io_trace -- --metric time --iters 1 --warmup-iters 0 --record-io-trace=some_log.file
```

and then

```
cargo run -p runtime-params-estimator -- replay some_log.file --mode estimator-gas
```

This shows the total DB operations measured in the estimator, grouped by block. It is not perfect yet, as it requires manual effort to find the important measurements and to divide by the appropriate numbers, as is also done in the estimation. But for now, I just wanted to get the numbers and see if they are useful before putting too much effort into automation.

jakmeier commented:

For other storage parameters, looking at actual DB calls in traces is not useful.

  • Per-byte costs are meant to cover larger request sizes, so the number of requests is not interesting. But we can use the rule mentioned in the initial issue description: every 4kB requires at least one IOP.
  • The cost of touching a trie node should be exactly one DB access for reads, by definition of what it tries to account for, and two if we also count the writes that happen at the end of the chunk, when trie changes are applied.

Looking at it from that perspective, and using the same SSD assumptions as above, we get a baseline like this:

Trie Nodes

| | Reads | Writes | Time [µs] with SSD | Time [µs] with current gas price |
|---|---|---|---|---|
| wasm_touching_trie_node | 1 | 1 | 286 | 16.101955926 |

Per-byte storage cost

| | Reads | Writes | Time [µs] per 4kB with SSD | Time [µs] per 4kB with current gas price |
|---|---|---|---|---|
| wasm_storage_read_value_byte | 1/4096 | | 143 | 22.98267648 |
| wasm_storage_write_value_byte | | 1/4096 | 143 | 127.051935744 |
| wasm_storage_write_evicted_byte | | 1/4096 | 143 | 131.552489472 |
| wasm_storage_remove_ret_value_byte | | 1/4096 | 143 | 47.233253376 |
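For intuition, the per-4kB baseline can be converted into per-byte gas (a sketch of the conversion; the helper name is mine):

```rust
/// Per-byte gas implied by the SSD baseline: one IOP per 4 kB of data,
/// with 1 Pgas = 1 s. (Sketch, not estimator code.)
fn per_byte_gas(latency_us: f64) -> f64 {
    let gas_per_iop = latency_us * 1e-6 * 1e15; // latency in seconds -> gas
    gas_per_iop / 4096.0
}

fn main() {
    // 143 µs per IOP -> ~1.43e11 gas per 4 kB -> ~35 Mgas per byte,
    // versus ~5.6 Mgas per byte implied by the 22.98 µs column above.
    println!("{:.2e} gas/byte", per_byte_gas(143.0));
}
```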

Again, the baseline shows higher numbers than current gas costs. And again, this might be okay, since the DB is more efficient than one IOP per DB operation, especially since we have not succeeded in creating benchmarks that are reproducibly slower to execute than what gas costs cover. But this should motivate us to try again to find such benchmarks.

jakmeier commented:

Finally, for all remaining parameters, I am now running estimations and checking how many IO operations they actually cause. The only remaining work here is to pull out the numbers; I will post here when I have them.
