forked from ruby/ruby

The Ruby Programming Language

License: see COPYING and COPYING.ja

vnmakarov/ruby

 
 


The project is dormant because, IMHO, YJIT is moving in the right direction and competing with YJIT is no longer reasonable. Please feel free to use any ideas and code from this project if you need them.

Update May 2023: the sir-mirjit-base and sir-mirjit branches were merged with the May 2023 ruby trunk.

What's the branch about

  • The branch sir-mirjit is used to develop specialized VM insns (specialized IR, or SIR for short), a faster CRuby interpreter, and a MIR-based JIT (MIRJIT for short) built on SIR

  • The last merge point of the branch with the trunk is always the head of the branch sir-mirjit-base

    • The branch sir-mirjit will be merged with the trunk from time to time, and the head of sir-mirjit-base will be moved to that merge point accordingly

Specialized VM insns

  • Specialized VM insns are generated dynamically and lazily, and only at the level of VM insn basic blocks (BBs). They consist of
    • hybrid stack-RTL insns:
        sir_{plus,minus,mult,div,mod,or,and,eq,neq,lt,gt,le,ge,aref,aset}s{s,v,i}{s,v,i}
      where suffix `s` means a value on the stack, `v` means a local variable value, and `i` means an immediate value.
      Some suffix combinations (like `vii`, `sii`, `sss`) are not permitted.
    • type-specialized insns:
        sir_i{plus,minus,mult,div,mod,or,and,eq,neq,lt,gt,le,ge,aref,aset}{s,v}{s,v,i}{s,v,i}
        sir_f{plus,minus,mult,div,mod,eq,neq,lt,gt,le,ge}s{s,v,i}{s,v,i}
        sir_ib{eq,neq,lt,gt,le,ge}{s,v,i}{s,v,i}
      where `i` after prefix `sir_` means fixnum insns, and `f` means flonum insns
    • speculatively type-specialized insns (with guards):
        sir_si{plus,minus,mult,div,mod,or,and,eq,neq,lt,gt,le,ge,aref,aset}s{s,v,i}{s,v,i}
        sir_sf{plus,minus,mult,div,mod,eq,neq,lt,gt,le,ge}s{s,v,i}{s,v,i}
      where `i` after prefix `sir_s` means fixnum insns, and `f` means flonum insns
    • insns for profiling (e.g. sir_inspect_type or sir_inspect_fixnum)
    • different specialized insns for calls (e.g. sir_iseq_send_without_block or sir_cfunc_send)
    • attribute manipulation (e.g. sir_send_ivget or sir_send_ivset)
    • iterators (sir_iter_{start,body,cont})
    • different rarely executed insns that start generation of the specialized IR (e.g. sir_make_bbv) or jitting (e.g. sir_bbv_jit_call)
  • More details about specialized insns can be found at the end of file insns.def
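
The insn name patterns above use shell-style brace notation. As a minimal sketch, the Ruby below expands such a pattern into concrete insn names. The helper `expand_braces` is hypothetical and not part of this repository. It drops only the excluded operand combinations that the list names explicitly and that fit the pattern (`sss` and `sii`); the actual set of non-permitted combinations is defined in insns.def and may be larger.

```ruby
# Hypothetical helper: expand a brace pattern like "a{b,c}d" into ["abd", "acd"].
def expand_braces(pattern)
  m = pattern.match(/\{([^{}]*)\}/)
  return [pattern] unless m

  # Substitute each alternative of the leftmost group, then expand the rest.
  m[1].split(",").flat_map do |alt|
    expand_braces(m.pre_match + alt + m.post_match)
  end
end

# Pattern of the hybrid stack-RTL insns, copied from the list above.
hybrid_pattern =
  "sir_{plus,minus,mult,div,mod,or,and,eq,neq,lt,gt,le,ge,aref,aset}s{s,v,i}{s,v,i}"

names = expand_braces(hybrid_pattern)

# Assumption: only the combinations named in the text are excluded here;
# the real exclusion list may contain more entries.
permitted = names.reject { |name| name.end_with?("sss", "sii") }

puts "pattern expands to #{names.size} names, #{permitted.size} kept"
puts permitted.first(3).inspect
```

The same helper can be applied to the type-specialized and speculative patterns to get a rough idea of how many insn variants each family covers.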

Project related files

  • SIR generation related code can be found in file sir_gen.c
  • SIR execution related code can be found in file sir_exec.c and insns.def
  • MIRJIT code generation is in file mirjit.c
    • to build MIRJIT you need to build and install MIR-library from branch bbv of MIR project (https://github.com/vnmakarov/mir)

SIR flow

[Figure: IR flow]

  • Normal IR execution flow:
    • Execution of a BB starts with a stub
    • Executing the stub generates hybrid stack-based RTL insns and type-specialized insns, together with profiling insns
      • The type-specialized insns here are generated by basic block versioning (see the article by Maxime Chevalier-Boisvert)
    • Several executions of the type-specialized insns result in type- and profile-specialized insns
    • Type- and profile-specialized insns are the input for MIRJIT
  • Exception IR execution flow:
    • Switch to non-type-specialized stack-based RTL
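
The flow above can be illustrated with a toy model in Ruby. This is only a sketch of the general idea (a stub that lazily creates one version of a block per operand type context), not the code in sir_gen.c; the class `ToyBlock` and the labels `:fixnum`, `:flonum`, and `:generic` are hypothetical, and Ruby's `Integer` and `Float` classes stand in for fixnum and flonum values.

```ruby
# Toy model of a lazily specialized basic block (all names are hypothetical).
class ToyBlock
  attr_reader :versions

  def initialize(&generic)
    @generic = generic   # stands in for the non-type-specialized code
    @versions = {}       # operand type context => [label, code]
  end

  # Plays the role of the stub: the first call with a new type context
  # generates a version; later calls with the same context reuse it.
  def call(a, b)
    context = [a.class, b.class]
    @versions[context] ||= specialize(context)
    @versions[context].last.call(a, b)
  end

  private

  def specialize(context)
    case context
    when [Integer, Integer] then [:fixnum, ->(a, b) { a + b }]
    when [Float, Float]     then [:flonum, ->(a, b) { a + b }]
    else                         [:generic, @generic]
    end
  end
end

block = ToyBlock.new { |a, b| a + b }
block.call(1, 2)      # creates the :fixnum version
block.call(3, 4)      # reuses it
block.call(1.5, 2.0)  # creates the :flonum version
block.call("a", "b")  # falls back to the generic code
p block.versions.values.map(&:first)
```

In this toy model the specialized bodies are identical Ruby lambdas; in a real VM the point of a version is that it can skip type checks and method dispatch for the types it was created for.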

Current state of the project

  • The code is made public only to introduce CRuby developers to the project
  • The code is only good enough to run optcarrot and the micro-benchmarks from directory sir-bench
    • To run benchmarks:
      • optional (for x86-64 or aarch64 mirjit): build and install the MIR bbv branch with the default install prefix (/usr/local)
        • branch `bbv` is in development and therefore can be unstable. Please use commit 0abe8498defc99f8a257588bd42b6e59f168cff7, which was used for the benchmarking results below
        • if you have already installed MIR in /usr/local and want to build ruby w/o mirjit, use option --without-mir for ruby configure
      • build ruby from this branch
      • run microbenchmarks:
          cd sir-bench
          taskset --cpu-list <a cpu-number> ruby compare.rb "<list ruby benchmarks from sir-bench>" base:../miniruby sir:'../miniruby --sir' yjit:'../miniruby --yjit' ...
        • use `taskset` as modern CPUs can have cores of different speed
      • clone optcarrot (https://github.com/mame/optcarrot) from github and run it:
          cd sir-bench
          ../miniruby [--sir|--yjit|...]  -v -Ilib -r<path-to-optcarrot>/tools/shim <path-to-optcarrot>/bin/optcarrot --benchmark [--opt] -f=3000 <path-to-optcarrot>/examples/Lan_Master.nes
  • The project development has been stopped because competition with YJIT is not reasonable anymore

Major differences between MIRJIT and YJIT

  • MIRJIT generates code for any frequently executed BBV, while YJIT starts generating BBVs for a frequently executed method

  • MIRJIT generates C code, which is translated into MIR code; the MIR code is then optimized and compiled into machine code. YJIT generates machine code directly (although YJIT is moving to generating IR and then machine code)

    • As a consequence YJIT has faster compilation speed than MIRJIT

Ideas to further improve MIRJIT generated code

  • Implement code generation for more than one BBV at a time for better code locality, removing some branches, and avoiding indirect branches:
    • it can be code generation for all reachable BBVs of a Ruby method
    • it can be code generation for a trace of BBVs, i.e. the most frequently executed BBVs
  • Implement polymorphic caches (caching more than one called method or instance variable access). YJIT has already implemented this
    • this can improve performance of red-black, trees, optcarrot, and some other benchmarks
  • Implement code generation for more VM insns
    • currently MIRJIT stops code generation of a BBV at the 1st unimplemented VM insn in the BBV
  • Keep and use Ruby local variables in MIR vars in addition to stack slots. Most probably MIR vars will be kept in machine registers
  • Avoid the double switch (generated code -> SIR -> safe VM insn code) when the speculation assumptions do not hold. Switch to the safe VM insn code directly from the generated code
  • Avoid writing values to the VM stack when we can guarantee that there will be no switch to the interpreter. Writing can be quite expensive as it requires informing Ruby's generational GC
  • Implement FP boxing/unboxing optimization
  • Generate MIR code directly instead of generating C code and using c2m
    • it could speed up code generation
    • it could permit generating more efficient MIR code
  • Improve code generated by the MIR compiler itself (although it is more a todo for the MIR project)
  • Decrease the number of compilations by MIRJIT
    • currently MIRJIT might generate code for several parts of a BBV even if they will not be used
    • decrease the number of rejittings
  • Better memory allocation for BBVs (with reallocation when more memory is necessary)
    • currently sir_gen.c allocates a very large (mostly unused) memory area for each Ruby method
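
To make the polymorphic cache idea above concrete, here is a toy per-call-site cache in Ruby. It is a sketch under stated assumptions, not how MIRJIT or YJIT implement caching: the class `ToyCallSiteCache`, its `limit` parameter, and the hit/miss counters are hypothetical, and the lookup uses Ruby's own `Module#instance_method` as a stand-in for the VM's method search.

```ruby
# Toy polymorphic inline cache for one call site (all names are hypothetical).
# It remembers up to `limit` receiver classes instead of only the last one.
class ToyCallSiteCache
  attr_reader :hits, :misses

  def initialize(method_name, limit: 4)
    @method_name = method_name
    @limit = limit
    @entries = {}   # receiver class => UnboundMethod
    @hits = 0
    @misses = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    meth = @entries[klass]
    if meth
      @hits += 1
    else
      @misses += 1
      meth = klass.instance_method(@method_name)        # slow path: full lookup
      @entries[klass] = meth if @entries.size < @limit  # cache while there is room
    end
    meth.bind(receiver).call(*args)
  end
end

site = ToyCallSiteCache.new(:to_s)
[1, 2.5, 3, 4.5, 5].each { |value| site.call(value) }
puts "hits=#{site.hits} misses=#{site.misses}"
```

A monomorphic cache (limit 1) would miss on every call in this alternating Integer/Float sequence, which is the situation a polymorphic cache is meant to avoid.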

The current performance of the SIR interpreter and MIRJIT

  • All measurements were done on an Intel i5-13600K with 64GB memory under x86-64 Fedora Core 36

  • I compared the following:

    • base - the base interpreter (miniruby)
    • sir - the interpreter with specialized insns (miniruby --sir)
    • yjit - YJIT (miniruby --yjit)
    • mir - MIR-based JIT (miniruby --mirjit)
  • I used the following micro-benchmarks (see sir-bench directory):

    • aread - reading an instance variable through attr_reader
    • aref - reading an array element
    • aset - assignment to an array element
    • awrite - assignment to an instance variable through attr_writer
    • bench - rendering
    • call - empty method calls
    • complex-mandelbrot - complex mandelbrot
    • const2 - reading Class::Const
    • const - reading Const
    • fannk - fannkuch
    • fib - fibonacci
    • ivread - reading an instance variable (@var)
    • ivwrite - assignment to an instance variable
    • mandelbrot - (non-complex) mandelbrot as CRuby v2 does not support complex numbers
    • meteor - meteor puzzle
    • nbody - modeling planet orbits
    • nest-ntimes - nested ntimes loops (6 levels)
    • nest-while - nested while loops (6 levels)
    • norm - spectral norm
    • pent - pentomino puzzle
    • red-black - Red Black trees
    • sieve - Eratosthenes sieve
    • trees - binary trees
    • while - while loop
  • Each benchmark was run 3 times and the minimum time (or the smallest maximum resident memory) was chosen

  • I also used optcarrot for more serious program performance comparison

    • I used 3000 frames to run optcarrot
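
The "run 3 times, keep the minimum" policy described above can be sketched in a few lines of Ruby. This is an illustration of the policy only, not the code of compare.rb; the helper name `min_wall_time` is hypothetical.

```ruby
# Hypothetical helper: run the given block `runs` times and return the
# smallest elapsed wall time in seconds.
def min_wall_time(runs = 3)
  Array.new(runs) do
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  end.min
end

elapsed = min_wall_time { 100_000.times { |i| i * i } }
puts format("best of 3: %.6f s", elapsed)
```

Taking the minimum rather than the mean reduces the influence of one-off disturbances such as other processes being scheduled on the same core.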

Micro-benchmark results (May 11, 2023)

  • Wall time (elapsed time) speedup relative to base:

    Benchmark               base   sir    yjit   mir
    aread.rb                1.0    3.74   7.21   8.78
    aref.rb                 1.0    3.76   5.06   9.24
    aset.rb                 1.0    3.41   3.12   8.91
    awrite.rb               1.0    4.44   3.2    10.09
    bench.rb                1.0    1.19   1.63   1.17
    call.rb                 1.0    2.1    4.82   4.87
    complex-mandelbrot.rb   1.0    1.16   1.48   1.16
    const2.rb               1.0    2.47   2.66   6.66
    const.rb                1.0    2.46   2.66   6.7
    fannk.rb                1.0    1.15   1.0    1.22
    fib.rb                  1.0    1.94   5.63   3.86
    ivread.rb               1.0    2.19   6.22   3.54
    ivwrite.rb              1.0    2.86   5.57   5.44
    mandelbrot.rb           1.0    1.43   1.91   1.78
    meteor.rb               1.0    1.29   1.35   1.25
    nbody.rb                1.0    1.3    1.8    1.69
    nest-ntimes.rb          1.0    2.06   1.3    1.96
    nest-while.rb           1.0    3.53   0.99   10.24
    norm.rb                 1.0    1.67   2.13   2.2
    pent.rb                 1.0    1.11   1.31   0.9
    red-black.rb            1.0    1.39   3.84   1.77
    sieve.rb                1.0    2.41   1.25   3.15
    trees.rb                1.0    1.38   2.23   1.48
    while.rb                1.0    2.23   5.66   9.85
    GeoMean                 1.0    2.0    2.55   3.28

  • CPU time improvements are approximately the same, except for MJIT, which has a lower CPU time improvement

  • Geomean max resident memory increase relative to the base interpreter:

                base   sir    yjit   mir
    GeoMean     1.0    1.21   1.03   2.41
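
The GeoMean rows above summarize per-benchmark ratios with a geometric mean. As a minimal sketch of that calculation (the helper `geomean` is hypothetical and not taken from compare.rb), the mean can be computed through logarithms to avoid overflow on long lists:

```ruby
# Hypothetical helper: geometric mean of positive ratios (e.g. speedups).
def geomean(values)
  if values.empty? || values.any? { |v| v <= 0 }
    raise ArgumentError, "values must be positive and non-empty"
  end

  Math.exp(values.sum { |v| Math.log(v) } / values.size)
end

# A few mir speedups from the wall time results (not the full column).
sample = [8.78, 9.24, 8.91]
puts geomean(sample).round(2)
```

The geometric mean is the usual choice for ratios because it treats a 2x speedup and a 2x slowdown symmetrically, which an arithmetic mean does not.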

Optcarrot results

  • Frames per second (more is better):

                       base    sir     yjit    mir
    optcarrot          82.4    105.0   262.1   123.1
    optcarrot --opt    211.5   356.4   232.9   416.4
