[Do not merge] Reduce cache delay by borsh #3847
Conversation
185ms still feels long to me. With roughly one-second blocks, this means we cannot fit more than 5 calls to contracts of ~400kb in one block, which is quite suboptimal. cc @olonho
Did you use
No. Trying now
@bowenwang1996 The difference in the debug build was significant (466 vs 185ms); however, it is not a big difference in the release build. I did a clean build and ran each build a few times to make sure I didn't build and run the wrong version. before: after: The way I test it: obtain eth-client.wasm from the rainbow repo, then
@bowenwang1996 What machine and contract did you use to get:
Oh, I see you also have one time
so that's very volatile; we should probably rely on #3841 to measure. So far my results show that the debug build gets a performance gain, but in the release build the increase is trivial and probably won't fix the long cache-delay issue. Also, on my machine, both before and after this PR, release-build wasmer deserialization is quite fast. Maybe we should test it in the cloud? Which machine can I play with?
I tried on a cloud instance (probably faster than our prod nodes, but slower than my local one); again the performance increase is minor for the release build. this pr:
The machine I used is e2-standard-4 on gcloud
@bowenwang1996 cache=0 doesn't help; it's almost the same as cache != 0: 16, 14, 14ms before, 13, 11, 12ms with this PR
Just synced with Max. He suggested that for a complex structure, borsh vs bincode won't save much time; a simpler structure does. We also identified that the BTreeMap may be the bottleneck of deserialization (to be confirmed): Rust's std BTreeMap doesn't have a "create from sorted vector" method, so deserialization inserts entries one by one, which is slow (a sketch of that cost is below). Max said wasmer may not need a BTreeMap; a sorted vec/hashmap/heap may work. Max's approach would need more investigation and optimization on the wasmer side, so I plan to first:
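For illustration, a minimal self-contained sketch of that cost (made-up key space and sizes, not wasmer's actual CacheImage types): rebuilding a BTreeMap entry by entry is roughly what the current deserialization path does, while a sorted Vec of pairs needs no per-entry tree inserts:

```rust
use std::collections::BTreeMap;
use std::time::Instant;

fn main() {
    // Hypothetical entry count, just to make the cost visible.
    let entries: Vec<(u32, u64)> = (0..1_000_000u32).map(|k| (k, k as u64 * 2)).collect();

    // Rebuild the map by inserting one entry at a time (what deserialization does today).
    let t = Instant::now();
    let mut map = BTreeMap::new();
    for (k, v) in &entries {
        map.insert(*k, *v);
    }
    println!("BTreeMap insert-one-by-one: {:?}", t.elapsed());

    // Keeping the data as an already-sorted Vec skips the per-entry tree work entirely.
    let t = Instant::now();
    let kept: Vec<(u32, u64)> = entries.clone();
    println!("Vec copy: {:?}", t.elapsed());

    // A sorted Vec still supports the same lookups via binary search.
    let idx = kept.binary_search_by_key(&123u32, |(k, _)| *k).unwrap();
    assert_eq!(kept[idx].1, 246);
    assert_eq!(map.get(&123), Some(&246));
}
```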
The observations from Max and me are correct: the majority of the time is spent deserializing the HashMap and BTreeMap in wasmer's CacheImage. If every HashMap and BTreeMap<X,Y> is replaced with a Vec<(X,Y)> (they're serialized in the same way in borsh), a release build deserializes CacheImage in 3.36ms instead of 10ms. I would expect a similar percentage of performance gain on our server. So borsh's deserialization of HashMap and BTreeMap is too slow: it first deserializes the entries as a vector and then inserts them one by one. Given that Rust's std HashMap/BTreeMap don't have batch insert/initialization, we should consider replacing them with a more performant alternative, or some op on raw C pointers like
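A minimal sketch of that wire-format equivalence, assuming a borsh 0.x-style API (`try_to_vec` / `try_from_slice`): a `BTreeMap<K, V>` and a key-sorted `Vec<(K, V)>` produce identical bytes, so the cached map can be decoded as a plain `Vec` without per-entry inserts:

```rust
use std::collections::BTreeMap;

use borsh::{BorshDeserialize, BorshSerialize};

fn main() -> std::io::Result<()> {
    let map: BTreeMap<u32, String> = (0..3u32).map(|k| (k, format!("value-{}", k))).collect();

    // BTreeMap<K, V> and a key-sorted Vec<(K, V)> share the same borsh wire format:
    // a u32 length followed by the (key, value) pairs in key order.
    let map_bytes = map.try_to_vec()?;
    let vec_form: Vec<(u32, String)> = map.iter().map(|(k, v)| (*k, v.clone())).collect();
    assert_eq!(map_bytes, vec_form.try_to_vec()?);

    // So the cached map can be read back as a plain Vec, skipping per-entry inserts.
    let decoded = Vec::<(u32, String)>::try_from_slice(&map_bytes)?;
    assert_eq!(decoded, vec_form);
    Ok(())
}
```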
Good job investigating. I don't think we can batch-initialize a hash map from raw data, since it probably does not occupy a contiguous area in memory.
I looked further into this: HashMap (see its Clone implementation) does use contiguous memory as long as K, V are. @evgenykuzyakov suggested a faster fix, both in execution and in effort: don't serialize that hashmap at all (it's okay because we don't need to know the kind of wasmer exception; we always return a WasmerUnknownError in that case; a sketch of the idea is below). This brings the entire ~15ms wasmer deserialization down to ~6ms, which is probably good enough, so this PR is done.
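A rough sketch of that idea on hypothetical types (not wasmer's actual cache structures), assuming borsh's `#[borsh_skip]` derive attribute: the field is neither serialized nor deserialized and comes back as its `Default` value after a cache round trip:

```rust
use std::collections::HashMap;

use borsh::{BorshDeserialize, BorshSerialize};

// Hypothetical stand-in for the cached error metadata; not wasmer's real CacheImage.
#[derive(BorshSerialize, BorshDeserialize, Default)]
struct CachedErrorInfo {
    code_len: u64,
    // The expensive map is neither written to the cache nor read back;
    // deserialization fills it with Default::default().
    #[borsh_skip]
    exception_kinds: HashMap<u32, String>,
}

fn main() -> std::io::Result<()> {
    let mut info = CachedErrorInfo { code_len: 400_000, ..Default::default() };
    info.exception_kinds.insert(7, "some exception kind".to_string());

    let bytes = info.try_to_vec()?; // only code_len ends up in the cache bytes
    let restored = CachedErrorInfo::try_from_slice(&bytes)?;

    // After a cache round trip the detailed kinds are gone, so any such trap is
    // reported uniformly (the "always return WasmerUnknownError" idea).
    assert!(restored.exception_kinds.is_empty());
    assert_eq!(restored.code_len, 400_000);
    Ok(())
}
```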
This PR has been automatically marked as stale because it has not had recent activity in the last 2 weeks.
@ailisp what is the status of this PR?
@bowenwang1996 Superseded by #4448, with backward-compatible changes added.
Use borsh-deserialized wasmer 0.x, which provides about a 30% time saving in several of wasmer's test-contract benchmarks and 80% in the NEAR EVM contract; other contract benchmarks show a 25%-50% improvement on a cloud instance: #3847. This uses a pre-release wasmer-near and will be replaced with a formally released wasmer-near once it's released.
Test Plan
---------
- CI
- wasmer's bench code, which covers the code path to ser/de the cache with borsh
- vm standalone test: evm_slow_deserialize_repro, run with the no_cache feature (meaning no in-memory cache), which does a full serialize to the disk cache, then loads, deserializes with borsh, and calls the contract
DO NOT MERGE: the on-disk cache is being undone in the upcoming release.
This PR reduces deserialize_wasmer significantly, from 23ms to 10.5ms, on an e2-standard-n2 instance.
And deserialize_wasmer is the major bottleneck in https://github.com/near/nearcore/security/advisories/GHSA-q39c-4p3r-qpv3.
Test Plan
Deploy a sufficiently large contract, such as rainbow-bridge's eth_client.wasm, and call the contract multiple times. Add a delay detector with the threshold set to 5ms instead of 50ms, and observe the time spent in deserialize_wasmer. A minimal sketch of the detector idea is below.
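For reference, a minimal sketch of the delay-detector idea (not nearcore's actual DelayDetector API), using the 5ms threshold from this test plan:

```rust
use std::time::{Duration, Instant};

// A guard that reports whenever the wrapped section runs longer than the threshold.
struct DelayDetector<'a> {
    label: &'a str,
    threshold: Duration,
    started: Instant,
}

impl<'a> DelayDetector<'a> {
    fn new(label: &'a str, threshold: Duration) -> Self {
        Self { label, threshold, started: Instant::now() }
    }
}

impl Drop for DelayDetector<'_> {
    fn drop(&mut self) {
        let elapsed = self.started.elapsed();
        if elapsed > self.threshold {
            eprintln!("delay detected in {}: {:?} (threshold {:?})", self.label, elapsed, self.threshold);
        }
    }
}

fn main() {
    // The test plan drops the threshold from 50ms to 5ms so that deserialize_wasmer
    // still shows up after the borsh speedup.
    let _d = DelayDetector::new("deserialize_wasmer", Duration::from_millis(5));
    // ... call the contract (which deserializes the cached module) here ...
    std::thread::sleep(Duration::from_millis(10));
}
```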