Multi-Armed Bandit Algorithm

Bandit algorithm balances exploration and exploitation, and is part of *reinforcement learning.

Reinforcement Learning

In Reinforcement Learning, the system makes a decision based on current situation. Unlike supervised learning, there is no training data available providing the correct decision to the system. We just get a reward back from the environment, indicating the quality of the decison that was made.

Reinforcement learning problems involve the following artiofacts.

State
Action or Decision
Reward

Implementations

Random greedy
Upper confidence bound one
Upper confidence bound two
Softmax
Interval estimate
Thompson sampling
Reward comparison
Action pursuit
Exponential weight

TODO

implement test for the different algorithms
plots the different algorithm
pros/cons for each of them
use cases and examples

References

https://www.quora.com/What-is-Thompson-sampling-in-laymans-terms
https://www.linkedin.com/pulse/dynamic-price-optimization-multi-arm-bandit-pranab-ghosh
https://pkghosh.wordpress.com/2013/08/25/bandits-know-the-best-product-price/
https://github.com/pranab/avenir
https://www.gsb.stanford.edu/sites/gsb/files/mkt_10_17_misra.pdf
http://alekhagarwal.net/bandits_and_rl/intro.pdf

Usage

package main

import (
	"log"
	"math/rand"
	"sync"
	"time"

	bandit "github.com/alextanhongpin/go-bandit"
)

func init() {
	rand.Seed(time.Now().UnixNano())
}

func main() {
	b, err := bandit.NewEpsilonGreedy(0.1, nil, nil)
	if err != nil {
		log.Println(err)
	}

	b.Init(5)

  N := 1000

	var wg sync.WaitGroup
	wg.Add(N)
	for i := 0; i < N; i++ {
		go func() {
			defer wg.Done()
			chosenArm := b.SelectArm(rand.Float64())
			reward := float64(rand.Intn(2))
			b.Update(chosenArm, reward)
		}()
	}

	wg.Wait()
	log.Printf("bandit: %+v", b)
	log.Println("done")
}

Test for data race:

$ go run -race main.go

Output:

2018/06/04 23:43:27 bandit: &{RWMutex:{w:{state:0 sema:0} writerSem:0 readerSem:0 readerCount:0 readerWait:0} Epsilon:0.1 Counts:[233 220 512 19 16] Rewards:[0.4592274678111587 0.48181818181818176 0.5097656249999998 0.3684210526315789 0.25]}
2018/06/04 23:43:27 done

Stats

$ brew install tokei
$ tokei

Output:

-------------------------------------------------------------------------------
 Language            Files        Lines         Code     Comments       Blanks
-------------------------------------------------------------------------------
 Go                     12         1120          879           74          167
 Markdown                1          103          103            0            0
 Python                  1           18           11            2            5
-------------------------------------------------------------------------------
 Total                  15         1241          993           76          172
-------------------------------------------------------------------------------

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Multi-Armed Bandit Algorithm

Reinforcement Learning

Implementations

TODO

References

Usage

Stats

Files

README.md

Latest commit

History

README.md

File metadata and controls

Multi-Armed Bandit Algorithm

Reinforcement Learning

Implementations

TODO

References

Usage

Stats