The project became dormant as IMHO YJIT is moving in the right direction and competition with YJIT is not reasonable anymore. Please feel free to use any ideas and code if you need them.
Update May 2023: `sir-mirjit-base` and `sir-mirjit` branches were merged with May 2023 ruby trunk.
- The branch `sir-mirjit` is used for development of specialized VM insns (specialized IR or SIR in brief), faster CRuby interpreter and MIR-based JIT (MIRJIT in brief) based on SIR
  - The last branch merge point with the trunk is always the head of the branch `sir-mirjit-base`
  - The branch `sir-mirjit` will be merged with the trunk from time to time and correspondingly the head of the branch `sir-mirjit-base` will be the last merge point with the trunk
- Specialized VM insns are generated dynamically in a lazy way and on VM insns BB level only. They consist of
  - hybrid stack-RTL insns:
    `sir_{plus,minus,mult,div,mod,or,and,eq,neq,lt,gt,le,ge,aref,aset}s{s,v,i}{s,v,i}`
    where suffix `s` means value on stack, `v` means local variable value, `i` means immediate value.
    Some suffix combinations (like `vii`, `sii`, `sss`) are not permitted.
  - type-specialized insns:
    `sir_i{plus,minus,mult,div,mod,or,and,eq,neq,lt,gt,le,ge,aref,aset}{s,v}{s,v,i}{s,v,i}`,
    `sir_f{plus,minus,mult,div,mod,eq,neq,lt,gt,le,ge}s{s,v,i}{s,v,i}`,
    `sir_ib{eq,neq,lt,gt,le,ge}{s,v,i}{s,v,i}`
    where `i` after prefix `sir_` means fixnum insns, and `f` means flonum insns
  - speculatively type-specialized insns (with guards):
    `sir_si{plus,minus,mult,div,mod,or,and,eq,neq,lt,gt,le,ge,aref,aset}s{s,v,i}{s,v,i}`,
    `sir_sf{plus,minus,mult,div,mod,eq,neq,lt,gt,le,ge}s{s,v,i}{s,v,i}`
    where `i` after prefix `sir_` means fixnum insns, and `f` means flonum insns
  - insns for profiling (e.g. `sir_inspect_type` or `sir_inspect_fixnum`)
  - different specialized insns for calls (e.g. `sir_iseq_send_without_block` or `sir_cfunc_send`)
  - attribute manipulation (e.g. `sir_send_ivget` or `sir_send_ivset`)
  - iterators (`sir_iter_{start,body,cont}`)
  - different rarely executed insns to start generation of the specialized IR (e.g. `sir_make_bbv`) or jitting (e.g. `sir_bbv_jit_call`)
  - More details about specialized insns can be found at the end of file `insns.def`
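The brace notation used for the insn names above can be read as a shell-style pattern. The following is a minimal Ruby sketch, not code from this repository, that expands such a pattern into concrete names. The helper `expand_braces` and the constants are hypothetical, the pattern is shortened to two operations, and the filter only drops the three suffix combinations named as examples above, so the real set of permitted insns (defined in `insns.def`) may differ.

```ruby
# Expand a shell-style brace pattern into every name it denotes.
# Hypothetical helper for illustration; not part of the sir-mirjit sources.
def expand_braces(pattern)
  m = pattern.match(/\{([^{}]*)\}/)
  return [pattern] unless m

  # Substitute each alternative of the first brace group, then recurse
  # to expand the remaining groups.
  m[1].split(",").flat_map do |alt|
    expand_braces(m.pre_match + alt + m.post_match)
  end
end

# Pattern taken from the list above, shortened to two operations.
PATTERN = "sir_{plus,minus}s{s,v,i}{s,v,i}"

# Assumption: only the combinations given as examples are filtered out;
# the actual rule may exclude more.
NOT_PERMITTED = %w[vii sii sss].freeze

all_names = expand_braces(PATTERN)
permitted = all_names.reject { |name| NOT_PERMITTED.include?(name[-3, 3]) }

puts permitted
```

Each printed name ends in three operand-kind letters (`s` for stack, `v` for local variable, `i` for immediate), which is how names such as `sir_plussvi` arise.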
- SIR generation related code can be found in file `sir_gen.c`
- SIR execution related code can be found in files `sir_exec.c` and `insns.def`
- MIRJIT code generation is in file `mirjit.c`
  - to build MIRJIT you need to build and install MIR-library from branch `bbv` of MIR project (https://github.com/vnmakarov/mir)
- Normal IR execution flow:
  - Start execution of BB with a stub
  - Stub execution generates hybrid stack-based RTL insns and type-specialized insns with profiling insns
    - Type-specialized insns here are generated by basic block versioning (see article by Maxime Chevalier-Boisvert)
  - Several executions of type-specialized insns result in type- and profile-specialized insns
  - Type- and profile-specialized insns are a source of MIRJIT
- Exception IR execution flow:
  - Switching to non-type-specialized stack-based RTL
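The two execution flows above can be pictured as a small state machine per basic block. This is a toy Ruby model only, not the logic of `sir_gen.c` or `mirjit.c`: the class name, the state names and the `PROFILE_RUNS` threshold are assumptions made for illustration (the text only says "several executions"), and the point at which jitting is triggered is likewise assumed.

```ruby
# Toy model of one basic block's life cycle. All names and the threshold
# are hypothetical; the real transitions live in sir_gen.c / mirjit.c.
class BasicBlockModel
  PROFILE_RUNS = 3 # assumed; the README only says "several executions"

  attr_reader :state

  def initialize
    @state = :stub
    @runs = 0
  end

  # Normal flow: an execution may promote the block to the next tier.
  def execute
    case @state
    when :stub
      # The stub generates hybrid stack-RTL and type-specialized insns
      # together with profiling insns.
      @state = :type_specialized
    when :type_specialized
      @runs += 1
      @state = :profile_specialized if @runs >= PROFILE_RUNS
    when :profile_specialized
      # Type- and profile-specialized insns are what MIRJIT consumes.
      @state = :jitted
    end
    @state
  end

  # Exception flow: fall back to non-type-specialized stack-based RTL.
  def exception!
    @state = :generic_rtl
  end
end

bb = BasicBlockModel.new
history = 6.times.map { bb.execute }
p history
bb.exception!
p bb.state
```

In this model a block in the `:generic_rtl` state no longer changes tier, which mirrors the idea that the exception flow leaves the specialized path.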
- The code is made public only for introducing CRuby developers to the project
- The code is only good enough to run optcarrot and micro-benchmarks from directory `sir-bench`
- To run benchmarks:
  - optional (for x86-64 or aarch64 mirjit): build and install MIR `bbv` branch with default install prefix (`/usr/local`)
    - branch `bbv` is in development and therefore can be unstable. Please use commit 0abe8498defc99f8a257588bd42b6e59f168cff7 which was used for benchmarking results below
    - if you have already installed MIR in `/usr/local` and want to build ruby w/o mirjit, use option `--without-mir` for ruby `configure`
  - build ruby from this branch
  - run microbenchmarks:

        cd sir-bench
        taskset --cpu-list <a cpu-number> ruby compare.rb "<list ruby benchmarks from sir-bench>" base:../miniruby sir:'../miniruby --sir' yjit:'../miniruby --yjit' ...

    - use `taskset` as modern CPUs can have cores of different speed
  - clone [optcarrot](https://github.com/mame/optcarrot) from github and run it:

        cd sir-bench
        ../miniruby [--sir|--yjit|...] -v -Ilib -r<path-to-optcarrot>/tools/shim <path-to-optcarrot>/bin/optcarrot --benchmark [--opt] -f=3000 <path-to-optcarrot>/examples/Lan_Master.nes
- The project development has been stopped because competition with YJIT is not reasonable anymore
- MIRJIT generates code for any frequently executed BBV while YJIT starts generation of BBVs of a frequently executed method
- MIRJIT generates C code which is translated into MIR code; after that the MIR is optimized and compiled into machine code
  - YJIT generates machine code directly (although YJIT is moving to generating IR and then machine code)
  - As a consequence YJIT has faster compilation speed than MIRJIT
- Implement code generation for more than one BBV at a time for better code locality, removing some branches, and avoiding indirect branches:
  - it can be code generation for all reachable BBVs of a Ruby method
  - it can be code generation for a trace of BBVs, i.e. the most frequently executed BBVs
- Implement polymorphic caches (caching more than one called method or instance variable access). YJIT has already implemented this
  - this can improve performance of `red-black`, `trees`, `optcarrot` and some other benchmarks
- Implement more VM insn generation
  - currently MIRJIT stops code generation of BBV on the 1st unimplemented VM insn in the BBV
- Keep and use Ruby local variables in MIR vars besides stack slots. Most probably MIR vars will be kept in machine registers
- Avoid double switching `generated code -> SIR -> safe VM insn code` when the speculation assumptions do not hold. Switch to the safe VM insn code directly from the generated code
- Avoid writing values to VM-stack when we can guarantee that there will be no switch to the interpreter. Writing can be quite expensive as it requires informing the Ruby generational GC
- Implement FP boxing/unboxing optimization
- Generate MIR code directly instead of generating C code and using `c2m`
  - it could speed up code generation
  - it could permit generating more efficient MIR code
- Improve code generated by MIR compiler itself (although it is more a todo for MIR-project)
- Decrease number of compilations by MIRJIT
  - currently MIRJIT might generate code for several parts of BBV even if they will not be used
  - decrease number of rejitting
- Better memory allocation for BBVs (with reallocation when more memory is necessary)
  - currently `sir_gen.c` allocates a very large memory area (mostly unused) for each Ruby method
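To illustrate the polymorphic cache item in the list above: a monomorphic cache remembers one receiver class per call site, while a polymorphic one remembers several. The following Ruby sketch shows the general idea only. It is not how MIRJIT or YJIT implement their caches; the class `PolymorphicCache` and the `MAX_ENTRIES` limit are hypothetical.

```ruby
# Toy polymorphic inline cache for a single call site.
# Hypothetical illustration; real VM caches work on internal method entries.
class PolymorphicCache
  MAX_ENTRIES = 4 # assumed limit on cached receiver classes

  attr_reader :misses

  def initialize(method_name)
    @method_name = method_name
    @entries = {} # receiver class => UnboundMethod
    @misses = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    meth = @entries[klass]
    unless meth
      # Slow path: full method lookup, then remember the result
      # if the cache still has room.
      @misses += 1
      meth = klass.instance_method(@method_name)
      @entries[klass] = meth if @entries.size < MAX_ENTRIES
    end
    meth.bind(receiver).call(*args)
  end

  def size
    @entries.size
  end
end

cache = PolymorphicCache.new(:to_s)
results = [1, :a, 2, :b].map { |obj| cache.call(obj) }
p results
p cache.size
p cache.misses
```

With a monomorphic cache the alternating `Integer` and `Symbol` receivers would miss on every call; here only the first call for each class takes the slow path.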
- All measurements are done on Intel i5-13600K with 64GB memory under x86-64 Fedora Core 36
- I compared the following:
  - base - the base interpreter (`miniruby`)
  - sir - the interpreter with specialized insns (`miniruby --sir`)
  - yjit - YJIT (`miniruby --yjit`)
  - mir - MIR-based JIT (`miniruby --mirjit`)
- I used the following micro-benchmarks (see `sir-bench` directory):
- aread - reading an instance variable through attr_reader
- aref - reading an array element
- aset - assignment to an array element
- awrite - assignment to an instance variable through attr_writer
- bench - rendering
- call - empty method calls
- complex-mandelbrot - complex mandelbrot
- const2 - reading Class::Const
- const - reading Const
- fannk - fannkuch
- fib - fibonacci
- ivread - reading an instance variable (@var)
- ivwrite - assignment to an instance variable
- mandelbrot - (non-complex) mandelbrot as CRuby v2 does not support complex numbers
- meteor - meteor puzzle
- nbody - modeling planet orbits
- nest-ntimes - nested ntimes loops (6 levels)
- nest-while - nested while loops (6 levels)
- norm - spectral norm
- pent - pentamino puzzle
- red-black - Red Black trees
- sieve - Eratosthenes sieve
- trees - binary trees
- while - while loop
- Each benchmark was run 3 times and the minimal time (or smallest maximum resident memory) was chosen
- I also used optcarrot for more serious program performance comparison
  - I used 3000 frames to run optcarrot
- Wall time speedup:
Elapsed time:
| | base | sir | yjit | mir |
|---|---|---|---|---|
aread.rb | 1.0 | 3.74 | 7.21 | 8.78 |
aref.rb | 1.0 | 3.76 | 5.06 | 9.24 |
aset.rb | 1.0 | 3.41 | 3.12 | 8.91 |
awrite.rb | 1.0 | 4.44 | 3.2 | 10.09 |
bench.rb | 1.0 | 1.19 | 1.63 | 1.17 |
call.rb | 1.0 | 2.1 | 4.82 | 4.87 |
complex-mandelbrot.rb | 1.0 | 1.16 | 1.48 | 1.16 |
const2.rb | 1.0 | 2.47 | 2.66 | 6.66 |
const.rb | 1.0 | 2.46 | 2.66 | 6.7 |
fannk.rb | 1.0 | 1.15 | 1.0 | 1.22 |
fib.rb | 1.0 | 1.94 | 5.63 | 3.86 |
ivread.rb | 1.0 | 2.19 | 6.22 | 3.54 |
ivwrite.rb | 1.0 | 2.86 | 5.57 | 5.44 |
mandelbrot.rb | 1.0 | 1.43 | 1.91 | 1.78 |
meteor.rb | 1.0 | 1.29 | 1.35 | 1.25 |
nbody.rb | 1.0 | 1.3 | 1.8 | 1.69 |
nest-ntimes.rb | 1.0 | 2.06 | 1.3 | 1.96 |
nest-while.rb | 1.0 | 3.53 | 0.99 | 10.24 |
norm.rb | 1.0 | 1.67 | 2.13 | 2.2 |
pent.rb | 1.0 | 1.11 | 1.31 | 0.9 |
red-black.rb | 1.0 | 1.39 | 3.84 | 1.77 |
sieve.rb | 1.0 | 2.41 | 1.25 | 3.15 |
trees.rb | 1.0 | 1.38 | 2.23 | 1.48 |
while.rb | 1.0 | 2.23 | 5.66 | 9.85 |
GeoMean. | 1.0 | 2.0 | 2.55 | 3.28 |
- CPU time improvements are approximately the same except MJIT which has lower CPU time improvement
- Geomean max resident memory increase relative to the base interpreter:
| | base | sir | yjit | mir |
|---|---|---|---|---|
GeoMean. | 1.0 | 1.21 | 1.03 | 2.41 |
- Frames per second (more is better):
| | base | sir | yjit | mir |
|---|---|---|---|---|
optcarrot | 82.4 | 105.0 | 262.1 | 123.1 |
optcarrot --opt | 211.5 | 356.4 | 232.9 | 416.4 |
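As a cross-check of the `GeoMean.` row in the wall-time table above, the geometric mean can be recomputed from a column of speedups. The Ruby sketch below does this for the `sir` column. The helpers `geomean` and `speedup` are hypothetical names, the way a single speedup is derived from the best of several runs is an assumption about the methodology, and the timings passed to `speedup` are made-up numbers for illustration.

```ruby
# Geometric mean of positive ratios, computed via logarithms.
def geomean(values)
  Math.exp(values.sum { |v| Math.log(v) } / values.size)
end

# Assumed methodology: best (minimal) base time over best candidate time.
def speedup(base_runs, other_runs)
  base_runs.min / other_runs.min
end

# The "sir" column of the wall-time speedup table, top to bottom.
SIR_SPEEDUPS = [
  3.74, 3.76, 3.41, 4.44, 1.19, 2.1, 1.16, 2.47, 2.46, 1.15, 1.94, 2.19,
  2.86, 1.43, 1.29, 1.3, 2.06, 3.53, 1.67, 1.11, 1.39, 2.41, 1.38, 2.23
].freeze

sir_geomean = geomean(SIR_SPEEDUPS)
puts sir_geomean.round(2) # agrees with the 2.0 reported in the GeoMean row

# Made-up timings in seconds for three runs each, for illustration only.
example = speedup([2.0, 2.1, 2.2], [1.0, 1.1, 1.3])
puts example
```

The geometric mean is the usual choice for averaging ratios such as speedups, since a single benchmark with a very large ratio influences it less than it would an arithmetic mean.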