[VM] 2nd Performance optimization #33

Closed
brew0722 opened this issue Nov 24, 2020 · 7 comments

@brew0722
Contributor

No description provided.

@brew0722 brew0722 self-assigned this Nov 24, 2020
@brew0722 brew0722 added the VM label Nov 24, 2020
@brew0722
Contributor Author

brew0722 commented Dec 11, 2020

After a first analysis of the singlepass code, we confirmed that the code size generated by emit_memory_op can be optimized.

We also found that the static memory bound check patch is valid for singlepass as well.

  • 27% to 108% performance improvement (1.27x to 2.08x) over the best result before the patch (418txs → 532~870txs)
  • JIT code cache size reduced by 286KB.
  • CacheGen size reduced by 1405KB (1.37MB), related to the ExceptionTable.
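
The throughput figures above can be checked with a few lines of arithmetic. This is a minimal sketch; the function names are illustrative and not taken from the codebase. Note that 870 against a baseline of 418 is about 2.08 times the baseline, which corresponds to a gain of about 108%.

```rust
/// Ratio of the new throughput to the baseline (1.0 means no change).
fn speedup_ratio(before: f64, after: f64) -> f64 {
    after / before
}

/// Relative gain over the baseline, in percent.
fn improvement_percent(before: f64, after: f64) -> f64 {
    (speedup_ratio(before, after) - 1.0) * 100.0
}
```

Calling `improvement_percent(418.0, 532.0)` and `improvement_percent(418.0, 870.0)` gives the lower and upper ends of the reported range.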

Assembly generation steps:

  • emit (mov reg_addr_base, mem_base)
    #IF IS_DYNAMIC
    • emit (mov reg_offset_bound, mem_bound)
    • emit (add reg_offset_bound, reg_addr_base) == reg_offset_bound changed to reg_addr_bound
    • emit (mov reg_offset_target, target_addr)
    • emit (add reg_offset_target, memarg.offset)
    • emit (add reg_offset_target, memarg.value_size) == reg_offset_target changed to reg_offset_target_bound
    • emit (add reg_offset_target_bound, reg_addr_base) == reg_offset_target_bound changed to reg_addr_target_bound
    • emit (cmp reg_addr_target_bound, reg_addr_bound)
    • emit (ja label_trap)
  • emit (mov reg_offset_target, target_addr)
  • emit (add reg_offset_target, memarg.offset)
  • emit (add reg_offset_target, mem_base) == reg_offset_target changed to reg_addr_target
    #IF CHECK_ALIGNMENT
    • emit (mov ...)
    • emit (and ...)
    • emit (jne label_trap)
  • call post generation callback
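
The steps above can be modeled roughly as follows. This is only a sketch of the control flow, not the singlepass implementation: the function signature is hypothetical and the instructions are plain strings copied from the list, so that the extra code emitted in the dynamic case is easy to count.

```rust
/// Hypothetical model of the emit sequence listed above.
/// The instruction strings are for illustration only; no real assembler is used.
fn emit_memory_op(is_dynamic: bool, check_alignment: bool) -> Vec<&'static str> {
    let mut code = vec!["mov reg_addr_base, mem_base"];

    if is_dynamic {
        // Inline bound check, emitted once per memory access.
        code.extend_from_slice(&[
            "mov reg_offset_bound, mem_bound",
            "add reg_offset_bound, reg_addr_base",
            "mov reg_offset_target, target_addr",
            "add reg_offset_target, memarg.offset",
            "add reg_offset_target, memarg.value_size",
            "add reg_offset_target_bound, reg_addr_base",
            "cmp reg_addr_target_bound, reg_addr_bound",
            "ja label_trap",
        ]);
    }

    // Effective address computation, always emitted.
    code.extend_from_slice(&[
        "mov reg_offset_target, target_addr",
        "add reg_offset_target, memarg.offset",
        "add reg_offset_target, mem_base",
    ]);

    if check_alignment {
        code.extend_from_slice(&["mov ...", "and ...", "jne label_trap"]);
    }

    code
}
```

In this model the dynamic path adds eight instructions per memory access on top of the four that are always emitted, which is where the code size saving of the static bound check comes from.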

The analysis showed that 7333 dynamic emits are performed for erc-20.
So for erc-20, the inline bound check assembly occupies 7333 * 40 bytes = 293,320 bytes (286KB).
By not emitting the bound check, the amount of generated code is reduced and execution becomes faster.
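
The size estimate can be reproduced as below. This is a minimal sketch: the 40 bytes per bound check and the emit count are the figures reported in this comment, while the constant and function names are made up for the example.

```rust
/// Bytes of inline bound check code per dynamic emit, as reported in this comment.
const BOUND_CHECK_BYTES: u64 = 40;

/// Total bytes occupied by inline bound checks for a given number of dynamic emits.
fn bound_check_code_size(dynamic_emits: u64) -> u64 {
    dynamic_emits * BOUND_CHECK_BYTES
}

/// Convert bytes to whole KB (1 KB = 1024 bytes), truncating the remainder.
fn to_kb(bytes: u64) -> u64 {
    bytes / 1024
}
```

For erc-20, `to_kb(bound_check_code_size(7333))` yields the 286KB quoted above.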


@brew0722
Contributor Author

The first analysis is complete. Further analysis and an actual PoC will be done later.

@brew0722
Contributor Author

Contrary to what was discussed at the previous meeting, the plans for applying the VM changes are not urgent, so I will resume this work.

@brew0722 brew0722 reopened this Dec 21, 2020
@brew0722
Contributor Author

brew0722 commented Dec 21, 2020

init_local_remove_redurrent_repeat_mov.patch.zip

2nd patch. The assembly code that initializes the local stack is inefficient.
It generates one mov instruction per stack slot used.
For example, if the local stack range is ebp-8 to ebp-336, 41 slots are initialized. Since a single mov instruction takes 12 bytes, this takes a total of 492 bytes.

Using the rep stosq instruction, the initialization code can be reduced to a fixed 24 bytes regardless of the stack size.

Result: for erc20, the generated code was reduced by 8KB.
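
The size comparison for the 2nd patch can be sketched as follows. The byte counts are the ones reported in this comment; the function names are illustrative. The two `zero_*` functions are only a Rust analogy for the two strategies (one store per slot versus one bulk fill, which is what rep stosq does with rax as the value, rcx as the count, and rdi as the destination) and are not the generated assembly.

```rust
/// Size of one per-slot mov, as reported in this comment.
const MOV_BYTES: usize = 12;
/// Size of the fixed rep stosq sequence, as reported in this comment.
const REP_STOSQ_BYTES: usize = 24;

/// Code size when one mov is emitted per local slot.
fn per_slot_init_size(slots: usize) -> usize {
    slots * MOV_BYTES
}

/// Code size when a single rep stosq sequence is emitted; independent of slot count.
fn rep_stosq_init_size(_slots: usize) -> usize {
    REP_STOSQ_BYTES
}

/// One store per slot, analogous to emitting one mov per slot.
fn zero_per_slot(locals: &mut [u64]) {
    for slot in locals.iter_mut() {
        *slot = 0;
    }
}

/// A single bulk fill, analogous to rep stosq over the whole range.
fn zero_bulk(locals: &mut [u64]) {
    locals.fill(0);
}
```

For the 41-slot example, the per-slot variant costs 492 bytes of code while the bulk variant stays at 24 bytes, and both leave the locals in the same zeroed state.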

@brew0722
Contributor Author

#48

3rd patch. We analyzed the inefficiency of the ExceptionTable structure and changed it to an Offset Range representation.
Result: for erc20, the module cache file was reduced by 612KB.
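
The issue does not show the actual data structures, so the following is only a guess at the general idea: instead of storing one table entry per code offset, consecutive entries with the same exception code are merged into a single offset range, and lookups use binary search. All type and function names here (`ExceptionCode`, `OffsetRange`, `compress`, `lookup`) are hypothetical.

```rust
use std::cmp::Ordering;

/// Hypothetical exception kinds; the real table may use different codes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExceptionCode {
    MemoryOutOfBounds,
    Unreachable,
}

/// A run of code offsets (inclusive on both ends) that share one exception code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct OffsetRange {
    start: usize,
    end: usize,
    code: ExceptionCode,
}

/// Merge entries sorted by offset into ranges: consecutive entries with the
/// same code extend the current range instead of adding a new element.
fn compress(entries: &[(usize, ExceptionCode)]) -> Vec<OffsetRange> {
    let mut ranges: Vec<OffsetRange> = Vec::new();
    for &(offset, code) in entries {
        match ranges.last_mut() {
            Some(last) if last.code == code => last.end = offset,
            _ => ranges.push(OffsetRange { start: offset, end: offset, code }),
        }
    }
    ranges
}

/// Find the exception code for an offset by binary search over the ranges.
fn lookup(ranges: &[OffsetRange], offset: usize) -> Option<ExceptionCode> {
    ranges
        .binary_search_by(|r| {
            if r.start > offset {
                Ordering::Greater
            } else if r.end < offset {
                Ordering::Less
            } else {
                Ordering::Equal
            }
        })
        .ok()
        .map(|i| ranges[i].code)
}
```

One consequence of this design is that any offset inside a range resolves to the range's code, even if the original table had no entry for that exact offset. Whether that trade-off matches the actual patch is not stated in this thread.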

@brew0722
Contributor Author

brew0722 commented Jan 5, 2021

#51

4th patch. When copying wasm memory data in cosmwasm_vm::read_region, the copy was implemented as an element-by-element assignment over 0..n, which makes copying very expensive.

After patching to use std::copy for raw pointers, the performance range narrowed and the maximum performance improved (minimum +64%, maximum +7%).
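
The difference between the two copy strategies can be sketched as below. This is not the code from the patch: it assumes the wasm memory view is exposed as a slice of `Cell<u8>`, uses `std::ptr::copy_nonoverlapping` from the Rust standard library as the raw pointer copy, and the function names are made up for the example.

```rust
use std::cell::Cell;

/// Element-by-element copy, the "assign for 0..n" pattern described above.
fn copy_by_index(view: &[Cell<u8>], len: usize) -> Vec<u8> {
    let mut out = vec![0u8; len];
    for i in 0..len {
        out[i] = view[i].get();
    }
    out
}

/// Bulk copy through raw pointers.
fn copy_bulk(view: &[Cell<u8>], len: usize) -> Vec<u8> {
    assert!(len <= view.len(), "requested length exceeds the memory view");
    let mut out = Vec::<u8>::with_capacity(len);
    // SAFETY: Cell<u8> has the same in-memory representation as u8, `view`
    // holds at least `len` elements, `out` has capacity for `len` bytes, and
    // the freshly allocated `out` cannot overlap `view`.
    unsafe {
        std::ptr::copy_nonoverlapping(view.as_ptr() as *const u8, out.as_mut_ptr(), len);
        out.set_len(len);
    }
    out
}
```

Both functions return the same bytes; the bulk version avoids a bounds check and a `Cell::get` call per byte and can be lowered to a single memcpy.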

1st performance measurement result: 876 ~ 913txs
2nd performance measurement result: 868 ~ 963txs

@brew0722 brew0722 changed the title from "[VM] Listing suspicious performance overhead point" to "[VM] 2nd Performance optimization" Jan 5, 2021
@brew0722 brew0722 added this to the Performance Improvements milestone Jan 5, 2021
@brew0722 brew0722 closed this as completed Jan 7, 2021