
vectorized activation vm #1008

Draft: wants to merge 25 commits into main

Conversation


@kali commented Mar 21, 2023

This is a side-project I'm working on during live-coding sessions on Twitch. The primary time slot for the streams is Monday evenings (around 8:30 CET) on https://www.twitch.tv/kalicoding . The unedited videos are also available at https://www.youtube.com/playlist?list=PL2L_7bZXXxQuNpcxjg8gipegP32pmwVZF .

The landing page for the Twitch streams, with some explanations, is here: https://github.com/kali/coding .

  • Activation function inventory and Rust POC (parts 1 and 2)
  • Early benchmarks (part 3)
  • Bringing up assembly integration in tract (parts 4, 5, 6)
  • Rework the protocol so constants are in the flow of instructions (Rust, assembly) (part 7)
  • Jump table generation (part 8)
  • Support for "bloc-1" activations (ReLU to HardSwish) (part 8, part 9)
  • Tests should be one-liners (Rust) (part 8)
  • bench (part 10)
  • integrate in tract-core
  • ARMv8.2 f16 support
  • other activations: Sigmoid (part 10), Tanh, Exp2f
  • implement on ARMv7
  • implement on intel (primary target AVX2+FMA)
  • consider compiling instead of interpreting

@kali marked this pull request as draft March 22, 2023 06:49

@kali commented Mar 22, 2023

  • We are interested in element-wise operations:

    • the result is independent of tensor geometry (scalar operands only, no broadcasting of tensors)
    • output = f(x) for x in input
  • "simple ops + if/then/else" (scalar reference sketch after this list)

    • Relu(x) = max(0, x)
    • Affine = alpha * x + beta
    • LeakyRelu = if x >= 0 { x } else { alpha * x }
    • ThresholdRelu = if x >= alpha { x } else { 0 }
    • HardSigmoid = max(0, min(1, alpha * x + beta))
    • Softsign = x / (1 + abs(x))
    • HardSwish = x * max(0, min(1, alpha * x + beta)) (alpha = 1/6 and beta = 1/2)
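
For reference, here is a scalar transcription of these definitions in Rust. This is just the math spelled out per element, not tract code; `alpha`/`beta` stand for the per-operator parameters.

```rust
// Scalar reference semantics for the "simple" activations (f32).
// Not tract code: just the definitions above, spelled out per element.
fn relu(x: f32) -> f32 {
    x.max(0.0)
}

fn affine(x: f32, alpha: f32, beta: f32) -> f32 {
    alpha * x + beta
}

fn leaky_relu(x: f32, alpha: f32) -> f32 {
    if x >= 0.0 { x } else { alpha * x }
}

fn threshold_relu(x: f32, alpha: f32) -> f32 {
    if x >= alpha { x } else { 0.0 }
}

fn hard_sigmoid(x: f32, alpha: f32, beta: f32) -> f32 {
    (alpha * x + beta).min(1.0).max(0.0)
}

fn softsign(x: f32) -> f32 {
    x / (1.0 + x.abs())
}

fn hard_swish(x: f32, alpha: f32, beta: f32) -> f32 {
    // alpha = 1/6 and beta = 1/2 in practice
    x * (alpha * x + beta).min(1.0).max(0.0)
}
```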

Proposed virtual machine def

  • 4 (?) big registers (mapped to one or several hardware vector registers)
    • on armv8, one big register could be 4 (or 5?) NEON registers (4×4 f32 values or 4×8 f16 values)
    • if 16 NEON registers are used, it leaves 16 for housekeeping (pre-fetch caching + operators)
  • focusing on the "simple" activation functions
    • BigRegs are a,b,c,d
    • min, max, abs, +, -, *, /, ite, ifpos
    • constants
  • calling convention: the framework preloads x into a and expects y back in a (see the sketch after this list)
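
As a rough illustration of that layout (names and sizing are placeholders, not actual tract code), the interpreter state could be modeled like this, assuming one big register = 4 NEON registers:

```rust
// Illustrative sizing, assuming one big register = 4 NEON registers.
const NEON_REGS_PER_BIG_REG: usize = 4;
const F32_LANES: usize = NEON_REGS_PER_BIG_REG * 4; // 4×4 = 16 f32 values
const F16_LANES: usize = NEON_REGS_PER_BIG_REG * 8; // 4×8 = 32 f16 values

/// One big register, modeled here as a plain array of f32 lanes.
type BigReg = [f32; F32_LANES];

/// The four big registers a, b, c, d. With 4 big registers × 4 NEON registers,
/// 16 of the 32 ARMv8 NEON registers are used, leaving 16 for housekeeping.
#[derive(Default, Clone, Copy)]
struct Regs {
    a: BigReg,
    b: BigReg,
    c: BigReg,
    d: BigReg,
}

// Calling convention: the framework preloads x into `a` and reads y back from `a`.
```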

Moves and const loads to any reg; unary, binary, ternary ops on a, b, c only

  • we want to limit the combinations of op × operands (an interpreter sketch follows this list)

  • moves: 12 move ops (4 possible source registers × 3 destinations)

  • all other ops have fixed registers: unary ops a <- a, binary a <- a#b, ternary a <- a#b#c

  • Relu: load(b, 0) | max

  • Affine: load(b, alpha) | mul | load(b, beta) | add

  • LeakyRelu: b <- a | load(a, alpha) | mul | c <- a | ifpos

  • ThresholdRelu: c <- a | load(b, alpha) | sub | b <- c | load(c, 0) | ifpos

  • Softsign: c <- a | abs | load(b, 1) | add | recip | b <- c | mul

  • HardSwish: c <- a | load(b, alpha) | mul | load(b, beta) | add | load(b, 1) | min | load(b, 0) | max | b <- c | mul
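
A sketch of what this fixed-register instruction set and its interpreter could look like in plain Rust (illustrative only; the real target is hand-written assembly dispatched through a jump table). `ifpos` is read here as `a <- if a >= 0 { b } else { c }`, an assumption consistent with the ThresholdRelu sequence above; the register-file types repeat the previous sketch so the block stands alone.

```rust
const F32_LANES: usize = 16; // one big register = 4 NEON registers × 4 f32 lanes
type BigReg = [f32; F32_LANES];

#[derive(Default, Clone, Copy)]
struct Regs { a: BigReg, b: BigReg, c: BigReg, d: BigReg }

#[derive(Clone, Copy)]
enum Reg { A, B, C, D }

/// Ops with fixed operands: unary a <- f(a), binary a <- a # b, ternary a <- a # b # c.
#[derive(Clone, Copy)]
enum Op {
    Move(Reg, Reg), // dst <- src (the real encoding allows 4 sources × 3 destinations)
    Load(Reg, f32), // dst <- broadcast constant
    Abs, Recip,     // unary
    Min, Max, Add, Sub, Mul, Div, // binary
    IfPos,          // ternary: a <- if a >= 0 { b } else { c } (assumed semantics)
}

impl Regs {
    fn get(&self, r: Reg) -> BigReg {
        match r { Reg::A => self.a, Reg::B => self.b, Reg::C => self.c, Reg::D => self.d }
    }
    fn set(&mut self, r: Reg, v: BigReg) {
        match r { Reg::A => self.a = v, Reg::B => self.b = v, Reg::C => self.c = v, Reg::D => self.d = v }
    }
}

/// Calling convention: x is preloaded into `a`, the result is read back from `a`.
fn run(program: &[Op], x: BigReg) -> BigReg {
    let mut r = Regs { a: x, ..Regs::default() };
    for &op in program {
        match op {
            Op::Move(dst, src) => { let v = r.get(src); r.set(dst, v) }
            Op::Load(dst, k) => r.set(dst, [k; F32_LANES]),
            _ => {
                for i in 0..F32_LANES {
                    let (a, b, c) = (r.a[i], r.b[i], r.c[i]);
                    r.a[i] = match op {
                        Op::Abs => a.abs(),
                        Op::Recip => 1.0 / a,
                        Op::Min => a.min(b),
                        Op::Max => a.max(b),
                        Op::Add => a + b,
                        Op::Sub => a - b,
                        Op::Mul => a * b,
                        Op::Div => a / b,
                        Op::IfPos => if a >= 0.0 { b } else { c },
                        Op::Move(..) | Op::Load(..) => unreachable!(),
                    };
                }
            }
        }
    }
    r.a
}

fn main() {
    // Relu: load(b, 0) | max
    let relu = [Op::Load(Reg::B, 0.0), Op::Max];
    assert_eq!(run(&relu, [-1.0; F32_LANES]), [0.0; F32_LANES]);
}
```

With a fixed destination and fixed operands, each op carries at most one register index or one constant, which keeps the op × operand combinations small and makes a jump-table dispatch in assembly straightforward.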

Better spec:

  • what if we add: a <- a#K (K constant)? (encoding sketch below)

  • Relu: max(0)

  • Affine: mul(alpha) | add(beta)

  • LeakyRelu: b <- a | mul(alpha) | c <- a | ifpos

  • ThresholdRelu: b <- a | sub(alpha) | load(c, 0) | ifpos

  • Softsign: b <- a | abs | add(1) | recip | mul

  • HardSwish: b <- a | mul(alpha) | add(beta) | min(1) | max(0) | mul

Much better :)
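
Written down with the a <- a # K immediates, the same programs could be encoded roughly like this (op names, the ALPHA/BETA values and the encoding itself are placeholders for illustration):

```rust
/// Ops extended with an immediate constant operand (a <- a # K).
#[derive(Clone, Copy)]
enum Op {
    MoveBA, MoveCA,            // b <- a, c <- a (only the moves used below)
    LoadC(f32),                // c <- K
    AddK(f32), SubK(f32), MulK(f32), MinK(f32), MaxK(f32), // a <- a # K
    Abs, Recip,                // a <- f(a)
    Mul,                       // a <- a * b
    IfPos,                     // a <- if a >= 0 { b } else { c }
}
use self::Op::*;

// Placeholder parameters; in practice alpha/beta come from the operator.
const ALPHA: f32 = 0.1;
const BETA: f32 = 0.5;

const RELU: &[Op] = &[MaxK(0.0)];
const AFFINE: &[Op] = &[MulK(ALPHA), AddK(BETA)];
const LEAKY_RELU: &[Op] = &[MoveBA, MulK(ALPHA), MoveCA, IfPos];
const THRESHOLD_RELU: &[Op] = &[MoveBA, SubK(ALPHA), LoadC(0.0), IfPos];
const SOFTSIGN: &[Op] = &[MoveBA, Abs, AddK(1.0), Recip, Mul];
const HARD_SWISH: &[Op] = &[MoveBA, MulK(1.0 / 6.0), AddK(0.5), MinK(1.0), MaxK(0.0), Mul];

fn main() {
    // The longest program (HardSwish) is only 6 ops with immediate constants.
    println!("HardSwish program length: {} ops", HARD_SWISH.len());
}
```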

What about other activation functions?

  • Some are implemented in tract with rational approximations

    • P(x)/Q(x) where P and Q are polys of x
    • Some can be expressed from e^x:
      • Tanh(x) = (1 - e^(-2x)) / (1 + e^(-2x))
      • Sigmoid(x) = 1 / (1 + e^(-x))
    • But it is faster to use rational approx. Tanh and Sigmoid have handcrafted vectorized assembly impls.
    • Q: Can we extend our VM to support rational approximations? (See the sketch after this list.)
    • Erf can only be computed from a rational approximation (as it is less used than tanh and sigmoid, it just has a Rust implementation).
  • More activation functions computed from exp (and also log, tanh)

    • currently implemented as separate expansions into several tract operators
    • should we look for an approximation? rational or polynomial?
    • can we / should we add e^x as a primitive?
      • can we do as well as / better than the default impl?
    • ScaledTanh = alpha * tanh(beta * x) (should be easily derived from tanh)
    • Softplus = log(1 + e^x)
    • Celu = max(0, x) + min(0, alpha * (exp(x / alpha) - 1))
    • Elu = if x < 0 { alpha * (exp(x) - 1) } else { x }
    • Selu = if x < 0 { gamma * (alpha * e^x - alpha) } else { gamma * x }
    • Mish(x) = x * tanh(softplus(x)) = x * tanh(ln(1 + e^x))
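
On the rational-approximation question: since the VM already has mul, add-with-constant and div, P(x)/Q(x) boils down to two Horner evaluations plus one divide. A scalar sketch of that shape (the coefficients below are placeholders, not tract's actual tanh/sigmoid tables):

```rust
/// Horner evaluation of c[0] + c[1]*x + c[2]*x^2 + ...
fn horner(coeffs: &[f32], x: f32) -> f32 {
    coeffs.iter().rev().fold(0.0, |acc, &c| acc * x + c)
}

/// Rational approximation f(x) ≈ P(x) / Q(x), applied element-wise.
fn rational_approx(x: f32, p: &[f32], q: &[f32]) -> f32 {
    horner(p, x) / horner(q, x)
}

fn main() {
    // Placeholder coefficients, for illustration only.
    let p = [0.0, 1.0, 0.0, -0.3];
    let q = [1.0, 0.0, 0.1];
    for i in -3..=3 {
        let x = i as f32;
        println!("f({x}) ≈ {}", rational_approx(x, &p, &q));
    }
}
```

Mapping this onto the VM would mostly be a register-pressure question: x has to be preserved while the two polynomial accumulators are built up, which still fits within a/b/c/d.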

@kali changed the title from "beginning activation vm" to "vectorized activation vm" on Apr 17, 2023