
[Do not merge] Reduce cache delay by borsh #3847

Closed · wants to merge 8 commits

Conversation

@ailisp (Member) commented Jan 23, 2021

DO NOT MERGE: the disk cache is being reverted in the upcoming release.

This PR reduces deserialize_wasmer significantly, from 23ms to 10.5ms, on an e2-standard-n2 instance.
deserialize_wasmer is the major bottleneck in https://github.com/near/nearcore/security/advisories/GHSA-q39c-4p3r-qpv3.

Test Plan

Deploy a sufficiently large contract, such as rainbow-bridge's eth_client.wasm, and call the contract multiple times. Add a delay detector with the delay threshold set to 5ms instead of 50ms, and observe the time spent in deserialize_wasmer. (A minimal sketch of what such a detector does follows.)
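
For illustration, here is a minimal sketch of what such a delay detector can look like. nearcore's actual implementation lives in its delay_detector crate; this type, its API, and the message format are assumptions for illustration, not the real code:

```rust
use std::time::{Duration, Instant};

// Hypothetical stand-in for nearcore's delay detector: time a scope and
// log only when it exceeds the configured threshold (5ms per the test plan).
struct DelayDetector {
    label: &'static str,
    threshold: Duration,
    start: Instant,
}

impl DelayDetector {
    fn new(label: &'static str, threshold: Duration) -> Self {
        Self { label, threshold, start: Instant::now() }
    }
}

impl Drop for DelayDetector {
    fn drop(&mut self) {
        let elapsed = self.start.elapsed();
        if elapsed > self.threshold {
            eprintln!("INFO delay_detector: Took {:?} processing {}", elapsed, self.label);
        }
    }
}

fn main() {
    let _d = DelayDetector::new("deserialize_wasmer get cache", Duration::from_millis(5));
    // ... work being measured; the detector logs on drop if it was too slow ...
}
```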

@bowenwang1996 (Collaborator)

185ms still feels long to me. This means we cannot fit more than 5 calls to contracts of ~400 KB in one block, which is quite suboptimal. cc @olonho

@bowenwang1996 (Collaborator)

> This was ~466ms on my machine

Did you use --release? 466ms feels longer than the time it takes to compile the contract.

@ailisp (Member Author) commented Jan 25, 2021

> This was ~466ms on my machine
>
> Did you use --release? 466ms feels longer than the time it takes to compile the contract.

No. Trying now.

@ailisp (Member Author) commented Jan 25, 2021

@bowenwang1996 The difference in the debug build was significant (466ms vs 185ms); however, it is not a big difference in the release build. I did a clean build and ran each build a few times to make sure I didn't build and run the incorrect version.

before:
Jan 25 14:28:13.501 INFO delay_detector: Took 16.466911ms processing in deserialize_wasmer get cache

after:
Jan 25 13:27:17.814 INFO delay_detector: Took 12.179094ms processing in deserialize_wasmer get cache

The way I tested it: obtain eth_client.wasm from the rainbow-bridge repo, then

export NEAR_ENV=local
near deploy test.near ~/eth_client.wasm
near call test.near initialized --accountId test.near --args ''
near call test.near initialized --accountId test.near --args ''
near call test.near initialized --accountId test.near --args '' # a few times

@ailisp (Member Author) commented Jan 25, 2021

@bowenwang1996 which machine and contract did you use to get:

Jan 16 02:58:27.875  INFO delay_detector: Took 128.962978ms processing deserialize_wasmer

@ailisp (Member Author) commented Jan 25, 2021

Oh, I see you also have this one:

Jan 16 02:58:27.554  INFO delay_detector: Took 54.682015ms processing deserialize_wasmer

So that's very volatile; we should probably rely on #3841 to measure. So far my results show the debug build gains performance, but in the release build the increase is trivial and probably won't fix the long cache delay issue. Also, on my machine deserialize_wasmer is quite fast in the release build both before and after this PR. Maybe we should test on the cloud? Which machine can I play with?

@ailisp (Member Author) commented Jan 26, 2021

I tried on a cloud instance (probably faster than our prod nodes, but slower than my local machine); again, the performance increase is minor for the release build.

Before: 38ms, 23ms, 21ms
This PR: 20ms, 18ms, 18ms

@bowenwang1996 (Collaborator)

> Maybe we should test on the cloud? Which machine can I play with?

The machine I used is e2-standard-4 on gcloud.

@bowenwang1996 (Collaborator)

@ailisp you might be hitting the RocksDB cache. Can you try disabling the cache for that column? I think setting this line to 0 should do it:

let cache_size = 1024 * 1024 * 32;
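
A hedged sketch of what that change amounts to, using block-cache options with method names as in recent rust-rocksdb releases; the function and the zero-means-disabled convention are illustrative assumptions, not nearcore's actual store code:

```rust
use rocksdb::{BlockBasedOptions, Cache, Options};

// Illustrative only: build column-family options with the block cache either
// sized or fully disabled, so reads for that column always go to disk.
fn col_options(cache_size: usize) -> Options {
    let mut block_opts = BlockBasedOptions::default();
    if cache_size == 0 {
        block_opts.disable_cache(); // cache_size = 0: bypass the block cache
    } else {
        block_opts.set_block_cache(&Cache::new_lru_cache(cache_size));
    }
    let mut opts = Options::default();
    opts.set_block_based_table_factory(&block_opts);
    opts
}
```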

@ailisp (Member Author) commented Jan 26, 2021

@bowenwang1996 cache=0 doesn't help; the numbers are almost the same as with cache != 0: 16/14/14ms before, 13/11/12ms with this PR.

@ailisp (Member Author) commented Jan 26, 2021

Just synced with Max. He suggested that for a complex structure, borsh vs bincode won't save much time; a simpler structure does. We also identified that the BTreeMap may be the bottleneck of deserialization (to be confirmed): Rust's std BTreeMap has no "create from sorted vector" constructor, so entries are inserted one by one, which makes deserializing it slow (see the sketch after this list). Max said wasmer may not actually need a BTreeMap; a sorted vec/hashmap/heap may work. Max's approach would need more investigation and optimization on the wasmer side, so I plan to first:

  • identify the borsh deserialization bottleneck in wasmer's CacheImage
  • if it's really the BTreeMap, see whether it can be accelerated with a third-party btreemap that has a "create from sorted vector" method
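
To illustrate the suspected bottleneck, this is the shape of what a borsh-style map deserializer has to do today (illustrative code, not borsh's actual implementation): with no bulk constructor on std's BTreeMap, each entry costs a tree insertion even though the decoded input is already in key order.

```rust
use std::collections::BTreeMap;

// Illustrative: decode pairs, then insert one by one. Building the map this
// way is O(n log n); a bulk build from already-sorted input could be O(n).
fn deserialize_map(pairs: Vec<(u32, u64)>) -> BTreeMap<u32, u64> {
    let mut map = BTreeMap::new();
    for (k, v) in pairs {
        map.insert(k, v); // each insert re-walks the tree
    }
    map
}
```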

@ailisp (Member Author) commented Jan 27, 2021

The observations from Max and me are correct: the majority of the time is spent deserializing the HashMap and BTreeMap in wasmer's CacheImage. If we replace every HashMap and BTreeMap<X, Y> with a Vec<(X, Y)> (they are serialized the same way in borsh), a release build deserializes CacheImage in 3.36ms instead of 10ms. I would expect a similar percentage gain on our server. So borsh's deserialization of HashMap and BTreeMap is too slow: it first deserializes a vector and then inserts elements one by one. Given that Rust's std HashMap/BTreeMap has no batch insert/initialization, we should consider replacing them with a more performant alternative, or some operation on raw C pointers like hash_map.to_raw_bytes() / unsafe hash_map.from_raw_bytes().
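
A minimal sketch of why the Vec<(X, Y)> swap is byte-compatible, using the borsh 0.x API (try_to_vec / try_from_slice): borsh encodes a map as a u32 length followed by its (key, value) pairs in key order, which is exactly the encoding of a Vec of the same sorted pairs.

```rust
use borsh::{BorshDeserialize, BorshSerialize};
use std::collections::BTreeMap;

fn main() {
    let mut map = BTreeMap::new();
    map.insert(1u32, 10u64);
    map.insert(2u32, 20u64);

    // Serialize the map, then decode the same bytes as the cheaper type,
    // skipping the per-entry tree insertions entirely.
    let bytes = map.try_to_vec().unwrap();
    let pairs: Vec<(u32, u64)> = BorshDeserialize::try_from_slice(&bytes).unwrap();
    assert_eq!(pairs, vec![(1, 10), (2, 20)]);
}
```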

@MaksymZavershynskyi (Contributor)

Good job investigating. I don't think we can batch initialize hash_map with raw data, since it probably does not occupy a contiguous area in memory.

@ailisp (Member Author) commented Jan 28, 2021

> Good job investigating. I don't think we can batch initialize hash_map with raw data, since it probably does not occupy a contiguous area in memory.

I looked further into this: HashMap (see its Clone implementation) does use contiguous memory as long as K and V are Copy, which is the case here. BTreeMap doesn't, but batch insertion from a sorted array would still reduce the time from O(n log n) to O(n).

@evgenykuzyakov suggested a faster fix, both in execution time and in effort: don't serialize that hashmap at all. (This is okay because we don't need to know the kind of wasmer exception; we always return a WasmerUnknownError in that case.) It brings the entire deserialize_wasmer from ~15ms down to ~6ms, which is probably good enough, so this PR is done.
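
A hedged sketch of that fix, assuming borsh 0.x's #[borsh_skip] field attribute; the struct and field names are hypothetical, not wasmer's actual CacheImage layout:

```rust
use borsh::{BorshDeserialize, BorshSerialize};
use std::collections::HashMap;

// Hypothetical layout: the exception map is neither written nor read; on
// deserialization it falls back to Default (an empty map), and any miss
// is reported to the caller as a generic WasmerUnknownError.
#[derive(BorshSerialize, BorshDeserialize, Default)]
struct CacheImageSketch {
    compiled_code: Vec<u8>,
    #[borsh_skip]
    exception_table: HashMap<u32, u32>,
}
```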

@ailisp ailisp changed the title Reduce cache delay by borsh [Do not merge] Reduce cache delay by borsh Jan 28, 2021
@ailisp ailisp marked this pull request as ready for review January 28, 2021 22:55
stale bot commented Jul 1, 2021

This PR has been automatically marked as stale because it has not had recent activity in the last 2 weeks.
It will be closed in 3 days if no further activity occurs.
Thank you for your contributions.

@stale stale bot added the S-stale label Jul 1, 2021
@bowenwang1996 (Collaborator)

@ailisp what is the status of this PR?

@stale stale bot removed the S-stale label Jul 1, 2021
@ailisp (Member Author) commented Jul 5, 2021

@bowenwang1996 superseded by #4448, which adds backward-compatible changes.

@ailisp ailisp closed this Jul 5, 2021
@ailisp ailisp deleted the debug-cache-delay branch July 5, 2021 01:54
near-bulldozer bot pushed a commit that referenced this pull request Jul 26, 2021
Use borsh to deserialize the wasmer 0.x cache, which saves about 30% of the time on several of wasmer's test contract benchmarks and 80% on the near evm contract; other contract benchmarks show a 25%-50% improvement on a cloud instance: #3847. This uses a pre-release wasmer-near; it will be replaced with a formally released wasmer-near after it's released.

Test Plan
---------
- CI
- wasmer's bench code, which covers the code path that serializes/deserializes the cache with borsh
- vm-standalone test evm_slow_deserialize_repro, run with the no_cache feature (i.e. no in-memory cache), which does a full serialize to the disk cache, then a load, a borsh deserialize, and a contract call