<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title>@brainjs/rl: About</title>
<meta name="description" content="">
<meta name="author" content="">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<script src="index.js"></script>
</head>
<body>
<a href="https://github.com/karpathy/reinforcejs"><img style="position: absolute; top: 0; right: 0; border: 0;" src="https://camo.githubusercontent.com/38ef81f8aca64bb9a64448d0d70f1308ef5341ab/68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f6769746875622f726962626f6e732f666f726b6d655f72696768745f6461726b626c75655f3132313632312e706e67" alt="Fork me on GitHub" data-canonical-src="https://s3.amazonaws.com/github/ribbons/forkme_right_darkblue_121621.png"></a>
<div id="wrap">
<div id="mynav" style="border-bottom:1px solid #999; padding-bottom: 10px; margin-bottom:50px;">
<div>
<img src="loop.svg" style="width:50px;height:50px;float:left;">
<h1 style="font-size:50px;">@brainjs/rl</h1>
</div>
<nav class="nav nav-pills nav-fill">
<a class="nav-item nav-link active" href="index.html">About</a>
<a class="nav-item nav-link" href="gridworld_dp/index.html">GridWorld: DP</a>
<a class="nav-item nav-link" href="gridworld_td/index.html">GridWorld: TD</a>
<a class="nav-item nav-link" href="puckworld/index.html">PuckWorld: DQN</a>
<a class="nav-item nav-link" href="waterworld/index.html">WaterWorld: DQN</a>
</nav>
</div>
<div id="exp" class="md">
# About
**NOTE: A TypeScript fork of the world demonstrations from https://github.com/karpathy/reinforcejs that uses https://github.com/brainjs/reinforcejs.
An effort to show off the original demonstrations, and a place for more.**
**@brainjs/rl** is a Reinforcement Learning library that implements several common RL algorithms, each supported by a fun web demo, and is currently maintained by [@karpathy](https://twitter.com/karpathy). In particular, the library currently includes:
### Dynamic Programming
For solving finite (and not too large), deterministic MDPs. The solver uses standard tabular methods with no bells and whistles, and the environment must provide the dynamics.
<div style="text-align:justify; margin: 20px; float:right; max-width:300px;">
<img src="img/dpsolved.jpeg" style="max-width:300px;"><br>
Right: A simple Gridworld solved with Dynamic Programming. Very exciting. Head over to the <a href="gridworld_dp/index.html">GridWorld: DP demo</a> to play with the GridWorld environment and policy iteration.
</div>
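To make the DP setting concrete, here is a minimal value-iteration sketch for a finite, deterministic MDP. It is not the library's API; `nextState`, `reward` and `isTerminal` are hypothetical stand-ins for the dynamics the environment must provide.
```typescript
// Minimal value iteration over a finite, deterministic MDP (illustrative only).
// The environment supplies its dynamics via the callbacks below.
function valueIteration(
  numStates: number,
  actions: number[],
  nextState: (s: number, a: number) => number,   // deterministic transition s' = f(s, a)
  reward: (s: number, a: number) => number,      // immediate reward r(s, a)
  isTerminal: (s: number) => boolean,
  gamma = 0.9,
  tolerance = 1e-6
): number[] {
  const V = new Array<number>(numStates).fill(0);
  let delta = Infinity;
  while (delta > tolerance) {
    delta = 0;
    for (let s = 0; s < numStates; s++) {
      if (isTerminal(s)) continue;
      // Bellman optimality backup: V(s) <- max_a [ r(s,a) + gamma * V(s') ]
      const best = Math.max(...actions.map(a => reward(s, a) + gamma * V[nextState(s, a)]));
      delta = Math.max(delta, Math.abs(best - V[s]));
      V[s] = best;
    }
  }
  return V;
}
```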
### Tabular Temporal Difference Learning
Both SARSA and Q-Learning are included. The agent still maintains tabular value functions but does not require an environment model and learns from experience. Support for many bells and whistles is also included, such as Eligibility Traces and Planning (with prioritized sweeps).
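The core update behind these agents is easy to state. Below is a rough sketch of a single tabular Q-learning step; the names are illustrative, not the library's actual interface. A SARSA step would bootstrap from the Q value of the action actually chosen next rather than the max.
```typescript
// One tabular Q-learning update (illustrative, not the library's API).
// Q is a flat table indexed by state * numActions + action.
function qLearningUpdate(
  Q: Float64Array,
  numActions: number,
  s: number, a: number, r: number, sNext: number,
  alpha = 0.1,   // learning rate
  gamma = 0.9    // discount factor
): void {
  // Greedy value of the next state: max over a' of Q(s', a')
  let maxNext = -Infinity;
  for (let ap = 0; ap < numActions; ap++) {
    maxNext = Math.max(maxNext, Q[sNext * numActions + ap]);
  }
  // TD error against the bootstrapped target r + gamma * max_a' Q(s', a')
  const tdError = r + gamma * maxNext - Q[s * numActions + a];
  Q[s * numActions + a] += alpha * tdError;
}
```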
### Deep Q Learning
A reimplementation of the [Mnih et al.](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html) Atari game-playing model. The approach models the action value function Q(s,a) with a neural network and hence allows continuous input spaces, though with a fixed number of discrete actions. The implementation includes most of the bells and whistles (e.g. experience replay, TD error clamping for robustness).
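Roughly, the learning target for each replayed transition looks like the sketch below; the `qValues` callback stands in for a forward pass of the Q-network and is not the library's internal code.
```typescript
// Sketch of the DQN target with TD-error clamping (illustrative only).
interface Transition { s: number[]; a: number; r: number; sNext: number[]; done: boolean; }

function dqnTarget(
  qValues: (s: number[]) => number[],  // hypothetical network forward pass returning Q(s, .)
  t: Transition,
  gamma = 0.99,
  errorClamp = 1.0
): { target: number; clampedTdError: number } {
  // Bootstrapped target r + gamma * max_a' Q(s', a'), unless the episode ended
  const target = t.done ? t.r : t.r + gamma * Math.max(...qValues(t.sNext));
  // Clamp the TD error for robustness before it is backpropagated
  const tdError = target - qValues(t.s)[t.a];
  const clampedTdError = Math.max(-errorClamp, Math.min(errorClamp, tdError));
  return { target, clampedTdError };
}
```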
### Policy Gradients
The implementation includes a stochastic policy gradient agent that uses REINFORCE and LSTMs to learn both the actor policy and the value function baseline, as well as an implementation of the more recent Deterministic Policy Gradients by [Silver et al](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Publications_files/deterministic-policy-gradients.pdf). To make much of this happen (LSTMs in particular), the library includes a fork of my previous project [recurrentjs](https://github.com/karpathy/recurrentjs), which allows one to set up graphs of computations over matrices and perform automatic backprop.
I do not include a demo for the policy gradient methods because the current implementations are unfortunately finicky and unstable (both stochastic and deterministic). I still include the code in the library in case someone wants to poke around. I suspect that either there are bugs (it's proving difficult to know for sure), or that I'm missing some tips/tricks needed to get them to work reliably.
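For the curious, the stochastic (REINFORCE) side boils down to weighting each step's log-probability gradient by a baseline-corrected return, roughly as sketched below; the names are illustrative and this is not the library's code.
```typescript
// Rough sketch of REINFORCE weights with a value-function baseline (illustrative only).
// Returns the advantage estimates that scale grad log pi(a_t | s_t) at each step.
function reinforceWeights(rewards: number[], baselines: number[], gamma = 0.99): number[] {
  const weights = new Array<number>(rewards.length).fill(0);
  let ret = 0;
  // Walk backwards to accumulate the discounted return G_t, then subtract the baseline V(s_t)
  for (let t = rewards.length - 1; t >= 0; t--) {
    ret = rewards[t] + gamma * ret;
    weights[t] = ret - baselines[t];
  }
  return weights;
}
```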
# Example Library Usage
Including the library (currently there is no Node.js support out of the box):
```html
<script src="@brainjs/rl.js"></script>
```
For most applications (e.g. simple games), the DQN algorithm is a safe bet to use. If your project has a finite state space that is not too large, the DP or tabular TD methods are more appropriate. As an example, the DQN Agent satisfies a very simple API:
```typescript
// import the DQNAgent base class
import { DQNAgent } from "rl";

// create a MyAgent class with the sizes of the state and action spaces
class MyAgent extends DQNAgent {
  get inputSize() { return 8; }   // the state is an array of 8 numbers
  get outputSize() { return 4; }  // the agent chooses among 4 discrete actions
}

// instantiate the agent
const opt = { alpha: 0.01 }; // see full options on the DQN page
const agent = new MyAgent(opt);

setInterval(() => { // start the learning loop
  const action = agent.act(s); // s is an array of length 8
  // ... execute action in the environment and observe the reward
  agent.learn(reward); // the agent improves its Q function, policy, model, etc.; reward is a float
}, 0);
```
In other words, you pass the agent some vector and it gives you an action. Then you reward or punish its behavior with the `reward` signal. The agent will over time tune its parameters to maximize the rewards it obtains.
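As a toy illustration of that loop (continuing from the snippet above with a hand-made environment; none of this is part of the library), the agent below is simply rewarded for picking action 0:
```typescript
// Hypothetical toy environment wired to the agent from the previous snippet.
setInterval(() => {
  const s = Array.from({ length: 8 }, () => Math.random()); // 8-dimensional observation
  const action = agent.act(s);
  const reward = action === 0 ? 1.0 : -0.1; // hand-crafted reward: prefer action 0
  agent.learn(reward);
}, 0);
```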
The full source code is on <a href="https://github.com/karpathy/reinforcejs">GitHub</a> under the MIT license.
</div>
</div>
</body>
</html>