[VM] 2nd Performance optimization #33

brew0722 · 2020-11-24T01:17:19Z

No description provided.

brew0722 · 2020-12-11T05:26:16Z

After an initial analysis of the singlepass compiler code, we confirmed that the size of the code generated by emit_memory_op can be reduced.

We also found that the static memory bound check patch applies to singlepass as well.

Throughput improved by 27% to 108% over the best result before the patch (418 txs → 532 ~ 870 txs).
JIT code cache size reduced by 286 KB.
CacheGen size reduced by 1405 KB (1.37 MB), related to the ExceptionTable.
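The improvement range can be checked with a one-line formula. This small helper computes the relative throughput gain over the 418 txs baseline. The function name is illustrative only.

```rust
/// Percentage throughput improvement of `new_txs` over `baseline_txs`.
fn improvement_pct(baseline_txs: f64, new_txs: f64) -> f64 {
    (new_txs - baseline_txs) / baseline_txs * 100.0
}
```

For example, improvement_pct(418.0, 532.0) is about 27.3 and improvement_pct(418.0, 870.0) is about 108.1.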

Assembly generation steps:

emit (mov reg_addr_base, mem_base)
#IF IS_DYNAMIC
- emit (mov reg_offset_bound, mem_bound)
- emit (add reg_offset_bound, reg_addr_base) == reg_offset_bound changed to reg_addr_bound
- emit (mov reg_offset_target, target_addr)
- emit (add reg_offset_target, memarg.offset)
- emit (add reg_offset_target, memarg.value_size) == reg_offset_target changed to reg_offset_target_bound
- emit (add reg_offset_target_bound, reg_addr_base) == reg_offset_target_bound changed to reg_addr_target_bound
- emit (cmp reg_addr_target_bound, reg_addr_bound)
- emit (ja label_trap)
emit (mov reg_offset_target, target_addr)
emit (add reg_offset_target, memarg.offset)
emit (add reg_offset_target, mem_base) == reg_offset_target changed to reg_addr_target
#IF CHECK_ALIGNMENT
- emit (mov ...)
- emit (and ...)
- emit (jne label_trap)
call post generation callback
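The following Rust sketch models the runtime meaning of the emitted sequence above. It is not wasmer's actual code. The names `check_access` and `Trap`, and the parameter layout, are hypothetical.

The `#IF IS_DYNAMIC` branch traps when `base + target + offset + value_size > base + bound`. Because `base` appears on both sides, the comparison reduces to checking the offsets alone. The `#IF CHECK_ALIGNMENT` branch traps when the low bits of the address are set. When the static bound check applies, no comparison is emitted at all.

```rust
/// Hypothetical trap reasons, mirroring `label_trap` in the emitted code.
#[derive(Debug, PartialEq, Eq)]
enum Trap {
    OutOfBounds,
    Misaligned,
}

/// Models the checks emitted around one memory access.
/// - `target_addr`: the dynamic i32 address operand
/// - `offset`: memarg.offset
/// - `value_size`: access width in bytes
/// - `mem_bound`: current memory size in bytes
/// - `is_dynamic`: whether the dynamic bound check is emitted
/// - `check_alignment`: required alignment (a power of two), if checked
/// Returns the offset from the memory base that the access would use.
fn check_access(
    target_addr: u32,
    offset: u32,
    value_size: u32,
    mem_bound: u64,
    is_dynamic: bool,
    check_alignment: Option<u64>,
) -> Result<u64, Trap> {
    // reg_offset_target = target_addr + memarg.offset (widened, so no overflow)
    let effective = u64::from(target_addr) + u64::from(offset);
    if is_dynamic {
        // cmp (effective + value_size), bound ; ja label_trap
        if effective + u64::from(value_size) > mem_bound {
            return Err(Trap::OutOfBounds);
        }
    }
    if let Some(align) = check_alignment {
        // and with (align - 1) ; jne label_trap
        if (effective & (align - 1)) != 0 {
            return Err(Trap::Misaligned);
        }
    }
    Ok(effective)
}
```

With `is_dynamic == false`, an out-of-range access is not rejected by this function. This corresponds to the static bound check case, in which the inline comparison is not emitted.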

The analysis showed that 7333 dynamic bound-check emits were generated for erc-20.
For erc-20, the inline bound-check assembly therefore occupies 7333 × 40 bytes = 293,320 bytes (286 KB).
By not emitting and executing this bound check, less code is generated and execution becomes faster.
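The size estimate above is a single multiplication. It is written out here using the 40-bytes-per-check figure from the analysis. The constant and function names are illustrative.

```rust
/// Size of one inline dynamic bound check, as measured in the analysis above.
const BOUND_CHECK_BYTES: u64 = 40;

/// Total inline bound-check code for a module with `dynamic_emits` checks.
fn bound_check_code_size(dynamic_emits: u64) -> u64 {
    dynamic_emits * BOUND_CHECK_BYTES
}

/// Converts bytes to KiB (1 KiB = 1024 bytes).
fn to_kib(bytes: u64) -> f64 {
    bytes as f64 / 1024.0
}
```

bound_check_code_size(7333) gives 293,320 bytes, which is about 286.4 KiB.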

brew0722 · 2020-12-11T05:30:55Z

The first analysis is complete. Further analysis and an actual PoC will be done later.

brew0722 · 2020-12-21T02:08:42Z

Contrary to what was discussed at the previous meeting, applying the VM changes is not urgent, so I will resume this work.

brew0722 · 2020-12-21T02:34:47Z

init_local_remove_redurrent_repeat_mov.patch.zip

2nd patch. The assembly code that initializes the local stack is inefficient.
It generates one mov instruction per stack slot in use.
For example, if the local stack spans ebp-8 to ebp-336, 41 slots are initialized. Each mov instruction is 12 bytes, so this takes 492 bytes in total.
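The per-slot scheme can be sketched as follows. It produces one textual instruction per slot and totals the size using the 12-byte figure quoted above. That figure comes from the comment and is not derived from the instruction encoding.

Several details are assumptions: 64-bit register naming (`rbp`), zero as the stored value (wasm locals are zero-initialized), and the function name.

```rust
/// Per-mov size quoted in the comment above (not computed from the encoding).
const MOV_SLOT_BYTES: usize = 12;

/// Hypothetical model of the 1:1 scheme: one `mov` per local stack slot.
/// Returns the textual instructions and their total code size in bytes.
fn per_slot_init(slot_offsets: &[i32]) -> (Vec<String>, usize) {
    let insns: Vec<String> = slot_offsets
        .iter()
        .map(|off| format!("mov qword ptr [rbp{off}], 0"))
        .collect();
    let size = insns.len() * MOV_SLOT_BYTES;
    (insns, size)
}
```

Code size grows linearly with the number of slots: 41 slots give 41 × 12 = 492 bytes.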

Using the rep stosq instruction, the code that initializes the whole local stack can be reduced to a fixed 24 bytes, regardless of stack size.
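`rep stosq` (with the direction flag clear) stores RAX into RCX consecutive qwords starting at RDI. This sketch emulates that behaviour on a qword slice and compares code sizes. A typical setup sequence might be `lea rdi, [rbp - N]; mov ecx, count; xor eax, eax; rep stosq`. That sequence is an assumption; the patch's exact instructions may differ.

```rust
/// Emulates `rep stosq` with DF = 0. Memory is modelled as a slice of
/// qwords and `rdi` as an index into it (a simplification: real hardware
/// advances RDI by 8 bytes per store).
fn rep_stosq(mem: &mut [u64], mut rdi: usize, mut rcx: usize, rax: u64) {
    while rcx != 0 {
        mem[rdi] = rax;
        rdi += 1;
        rcx -= 1;
    }
}

/// Per-slot mov size and fixed rep stosq sequence size, as quoted in this issue.
const MOV_SLOT_BYTES: usize = 12;
const REP_STOSQ_SEQUENCE_BYTES: usize = 24;

/// Code bytes saved by replacing per-slot movs with the fixed sequence.
fn bytes_saved(slots: usize) -> isize {
    (slots * MOV_SLOT_BYTES) as isize - REP_STOSQ_SEQUENCE_BYTES as isize
}
```

With these figures, the fixed sequence breaks even at 2 slots. For 41 slots it saves 492 - 24 = 468 bytes.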

As a result, the generated code for erc20 was reduced by 8 KB.

brew0722 · 2020-12-23T06:14:54Z

#48

3rd patch. We analyzed the inefficiency of the ExceptionTable structure and changed it to store offset ranges.
As a result, the module cache file for erc20 was reduced by 612 KB.
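The following is a minimal sketch of the "offset range" idea. It assumes the old table held one entry per code offset. Consecutive offsets that share an exception code are merged into a single `[start, end)` range, and lookups use a binary search.

This is not the actual ExceptionTable layout. `ExceptionCode` and `RangeTable` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Hypothetical exception kinds.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ExceptionCode {
    OutOfBounds,
    Unreachable,
}

/// Range-based table: sorted, non-overlapping `[start, end)` ranges.
struct RangeTable {
    ranges: Vec<(usize, usize, ExceptionCode)>,
}

impl RangeTable {
    /// Builds ranges from a per-offset map by merging adjacent offsets
    /// that share the same exception code.
    fn from_offsets(offsets: &BTreeMap<usize, ExceptionCode>) -> Self {
        let mut ranges: Vec<(usize, usize, ExceptionCode)> = Vec::new();
        for (&off, &code) in offsets {
            if let Some(last) = ranges.last_mut() {
                if last.1 == off && last.2 == code {
                    last.1 = off + 1; // extend the current range
                    continue;
                }
            }
            ranges.push((off, off + 1, code));
        }
        RangeTable { ranges }
    }

    /// Finds the exception code covering `off`, if any (binary search).
    fn lookup(&self, off: usize) -> Option<ExceptionCode> {
        // Index of the first range whose start is greater than `off`.
        let i = self.ranges.partition_point(|&(start, _, _)| start <= off);
        if i == 0 {
            return None;
        }
        let (start, end, code) = self.ranges[i - 1];
        if off >= start && off < end {
            Some(code)
        } else {
            None
        }
    }
}
```

The table's size now grows with the number of distinct runs rather than the number of offsets. Lookup stays O(log n).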

brew0722 · 2021-01-05T01:00:45Z

#51

4th patch. When copying wasm memory data in cosmwasm_vm::read_region, the copy was implemented as an element-by-element assignment loop over 0..n, which is very costly.

After patching the copy to use std::copy on raw pointers, the throughput range narrowed and peak performance improved (minimum +64%, maximum +7%).
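The two copy strategies are contrasted below. The comment mentions std::copy. As a stand-in, this sketch uses Rust's raw-pointer bulk copy, `std::ptr::copy_nonoverlapping`; whether the patch uses this exact call is an assumption. Both function names are hypothetical and the signatures are not cosmwasm_vm's real API.

```rust
/// Slow variant: an indexed assignment per byte over 0..len.
fn read_region_loop(memory: &[u8], offset: usize, len: usize) -> Vec<u8> {
    let mut out = vec![0u8; len];
    for i in 0..len {
        out[i] = memory[offset + i];
    }
    out
}

/// Bulk variant: one bounds check on the source slice, then a single
/// raw-pointer copy into uninitialized capacity.
fn read_region_bulk(memory: &[u8], offset: usize, len: usize) -> Vec<u8> {
    let src = &memory[offset..offset + len]; // panics if out of range
    let mut out: Vec<u8> = Vec::with_capacity(len);
    // SAFETY: `src` is valid for `len` reads, `out` has capacity `len`,
    // the buffers do not overlap, and all `len` bytes are initialized
    // before `set_len`.
    unsafe {
        std::ptr::copy_nonoverlapping(src.as_ptr(), out.as_mut_ptr(), len);
        out.set_len(len);
    }
    out
}
```

A safe alternative with the same result is `memory[offset..offset + len].to_vec()`, which avoids the unsafe block entirely.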

First measurement: 876 ~ 913 txs
Second measurement: 868 ~ 963 txs

brew0722 · 2021-07-15T04:46:40Z

Closes:
wasmerio/wasmer#2012
wasmerio/wasmer#2030
CosmWasm/cosmwasm#730

brew0722 self-assigned this Nov 24, 2020

brew0722 added the VM label Nov 24, 2020

brew0722 closed this as completed Dec 11, 2020

brew0722 reopened this Dec 21, 2020

brew0722 changed the title ~~[VM] Listing suspicious performance overhead point~~ [VM] 2nd Performance optimization Jan 5, 2021

brew0722 added this to the Performance Improvements milestone Jan 5, 2021

brew0722 closed this as completed Jan 7, 2021

brew0722 mentioned this issue Jan 7, 2021

Apply the performance improvement patches #53

Closed
