benchmarks: add Taylor series for pi #482
test/benchmarks/taylor.c
sum = 4.0 * sum;

// Display all 16 digits of double precision as a 64-bit integer
return sum * 10000000000000000ULL;
At first I returned f64. Supporting that in wasm_engine/fizzy_engine is trivial, but changing the fizzy-bench parser seemed like a larger task, so I decided on this "workaround".
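The workaround described above can be sketched in isolation. This is only an illustration, not the PR's code: the helper names `encode_f64_digits` and `decode_f64_digits` are hypothetical, and the scale factor of 10^16 is taken from the snippet under review.

```c
#include <stdint.h>

/* Scale factor from the reviewed snippet: 16 decimal digits. */
#define DIGITS_SCALE 10000000000000000ULL

/* Hypothetical helper: pack a small non-negative double into a 64-bit
 * integer by moving 16 decimal digits to the left of the decimal point.
 * The caller must keep value * 1e16 below 2^64, because an out-of-range
 * double-to-integer conversion is undefined behaviour in C. */
static uint64_t encode_f64_digits(double value)
{
    return (uint64_t)(value * DIGITS_SCALE);
}

/* Hypothetical inverse, e.g. for printing the result on the host side. */
static double decode_f64_digits(uint64_t encoded)
{
    return (double)encoded / DIGITS_SCALE;
}
```

Read this way, an integer result such as 31415916535897744 stands for roughly 3.1415916535897744.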
test/benchmarks/taylor.c
WASM_EXPORT unsigned long long taylor(unsigned n)
{
    double sum = 1.0;
    int sign = -1;
Could make this double too, to have a sign-changing op, but this way there's an int-to-float conversion.
Alternatively, declare it as float sign so there's a promote instruction in the loop.
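The two options from this comment can be compared side by side. The sketch below is an assumption-laden reconstruction: the PR page shows only fragments of `taylor.c`, so the loop body is inferred from the Leibniz series, and both function names are hypothetical. Which WebAssembly instruction the compiler actually emits (an int-to-f64 conversion or an f32-to-f64 promotion) depends on the toolchain and optimization level.

```c
#include <stdint.h>

/* Variant with an integer sign: every iteration converts int -> double
 * before the division. */
static uint64_t taylor_pi_int_sign(unsigned n)
{
    double sum = 1.0;
    int sign = -1;
    for (unsigned i = 1; i < n; ++i)
    {
        sum += sign / (2.0 * i + 1.0);
        sign = -sign;
    }
    /* Same scaling as in the reviewed snippet: 16 digits as an integer. */
    return (uint64_t)(4.0 * sum * 1e16);
}

/* Variant with a float sign: every iteration promotes float -> double
 * before the division, and the sign flip is a floating-point negation. */
static uint64_t taylor_pi_float_sign(unsigned n)
{
    double sum = 1.0;
    float sign = -1.0f;
    for (unsigned i = 1; i < n; ++i)
    {
        sum += sign / (2.0 * i + 1.0);
        sign = -sign;
    }
    return (uint64_t)(4.0 * sum * 1e16);
}
```

Both variants compute the same partial sum, since the sign is exactly 1 or -1 in either type; they differ only in the instruction mix inside the loop.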
return sum;
}

WASM_EXPORT unsigned long long taylor_pi(unsigned n)
Doing this to avoid needing f32/f64 support in fizzy-bench.
a468409 to 020bb75
Codecov Report
@@ Coverage Diff @@
## master #482 +/- ##
=======================================
Coverage 99.67% 99.67%
=======================================
Files 54 54
Lines 17180 17180
=======================================
Hits 17125 17125
Misses 55 55
test/benchmarks/taylor_pi.inputs
31415916535897744

pi_3000000_runs
Please keep a single execution case for a start. Something that runs in the milliseconds range.
This is on my machine:
fizzy/execute/taylor_pi/pi_1000000_runs 52600 us 51521 us 14
wabt/execute/taylor_pi/pi_1000000_runs 80834 us 79290 us 9
wasm3/execute/taylor_pi/pi_1000000_runs 12373 us 12234 us 58
fizzy/execute/taylor_pi/pi_3000000_runs 152631 us 150737 us 4
wabt/execute/taylor_pi/pi_3000000_runs 235717 us 233095 us 3
wasm3/execute/taylor_pi/pi_3000000_runs 36830 us 36516 us 19
What is your upper bound?
Leave the first one. The second runs over 2 seconds with sanitizers on CI.
(module
(type $t0 (func))
(type $t1 (func (param i32) (result i64)))
(func $__wasm_call_ctors (type $t0))
(func $taylor_pi (export "taylor_pi") (type $t1) (param $p0 i32) (result i64)
(local $l0 i64) (local $l1 i32) (local $l2 f64) (local $l3 f64)
i64.const 40000000000000000
set_local $l0
block $B0
block $B1
get_local $p0
i32.const 2
i32.lt_u
br_if $B1
get_local $p0
i32.const -1
i32.add
set_local $l1
f64.const 0x1p+0 (;=1;)
set_local $l2
i32.const -1
set_local $p0
f64.const 0x1p+0 (;=1;)
set_local $l3
loop $L2
get_local $l3
get_local $p0
f64.convert_s/i32
get_local $l2
get_local $l2
f64.add
f64.const 0x1p+0 (;=1;)
f64.add
f64.div
f64.add
set_local $l3
get_local $l2
f64.const 0x1p+0 (;=1;)
f64.add
set_local $l2
i32.const 0
get_local $p0
i32.sub
set_local $p0
get_local $l1
i32.const -1
i32.add
tee_local $l1
br_if $L2
end
end
get_local $l3
f64.const 0x1p+2 (;=4;)
f64.mul
f64.const 0x1.1c37937e08p+53 (;=1e+16;)
f64.mul
tee_local $l2
f64.const 0x1p+64 (;=1.84467e+19;)
f64.lt
get_local $l2
f64.const 0x0p+0 (;=0;)
f64.ge
i32.and
br_if $B0
i64.const 0
set_local $l0
end
get_local $l0
return
end
get_local $l2
i64.trunc_u/f64)
(table $T0 1 1 anyfunc)
(memory $memory (export "memory") 2)
(global $g0 (mut i32) (i32.const 66560))
(global $__heap_base (export "__heap_base") i32 (i32.const 66560))
(global $__data_end (export "__data_end") i32 (i32.const 1024)))
Agreed to make this use single precision.
Here's the single-precision version:
(module
(type $t0 (func))
(type $t1 (func (param i32) (result i64)))
(func $__wasm_call_ctors (type $t0))
(func $taylor_pi (export "taylor_pi") (type $t1) (param $p0 i32) (result i64)
(local $l0 i64) (local $l1 i32) (local $l2 i32) (local $l3 f32) (local $l4 f32)
i64.const 40000001090256896
set_local $l0
block $B0
block $B1
get_local $p0
i32.const 2
i32.lt_u
br_if $B1
i32.const -1
set_local $l1
i32.const 1
set_local $l2
f32.const 0x1p+0 (;=1;)
set_local $l3
loop $L2
get_local $l3
get_local $l1
f32.convert_s/i32
get_local $l2
f32.convert_u/i32
tee_local $l4
get_local $l4
f32.add
f32.const 0x1p+0 (;=1;)
f32.add
f32.div
f32.add
set_local $l3
i32.const 0
get_local $l1
i32.sub
set_local $l1
get_local $p0
get_local $l2
i32.const 1
i32.add
tee_local $l2
i32.ne
br_if $L2
end
get_local $l3
f32.const 0x1p+2 (;=4;)
f32.mul
f32.const 0x1.1c3794p+53 (;=1e+16;)
f32.mul
tee_local $l3
f32.const 0x1p+64 (;=1.84467e+19;)
f32.lt
get_local $l3
f32.const 0x0p+0 (;=0;)
f32.ge
i32.and
br_if $B0
i64.const 0
set_local $l0
end
get_local $l0
return
end
get_local $l3
i64.trunc_u/f32)
(table $T0 1 1 anyfunc)
(memory $memory (export "memory") 2)
(global $g0 (mut i32) (i32.const 66560))
(global $__heap_base (export "__heap_base") i32 (i32.const 66560))
(global $__data_end (export "__data_end") i32 (i32.const 1024)))
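For readers who prefer C to WAT, the single-precision module above corresponds roughly to the following source. This is a reconstruction read off the instruction sequence, not the file from the PR; the name `taylor_pi_f32` is hypothetical and the exact expressions in the real `taylor.c` may differ.

```c
#include <stdint.h>

/* Hypothetical single-precision counterpart of taylor_pi. */
static uint64_t taylor_pi_f32(unsigned n)
{
    float sum = 1.0f;
    int sign = -1;
    for (unsigned i = 1; i < n; ++i)
    {
        /* int -> f32 conversion for the sign, unsigned -> f32 for i. */
        sum += sign / (2.0f * i + 1.0f);
        sign = -sign;
    }
    /* 1e16 is not exactly representable in single precision, so the
     * scaled result carries that rounding error as well. */
    return (uint64_t)(4.0f * sum * 1e16f);
}
```

With `n < 2` the loop is skipped and the function returns 40000001090256896 rather than 4 * 10^16, which matches the `i64.const` at the top of the module: the single-precision constant nearest to 10^16 is 10000000272564224.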
Depends on #474.
This benchmark uses only f64 instructions: const, add, div, mul, lt, ge. And two conversions: f64.convert_s/i32 and i64.trunc_u/f64.