Add content to documentation #141

Open
wants to merge 22 commits into
base: main
5 changes: 5 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -217,6 +217,11 @@ Because JAX installation is different depending on your CUDA version, Haiku does

First, follow these instructions to install JAX with the relevant accelerator support.

```
pip install -r requirements.txt
```


## General Information
The project entrypoint is `pax/experiment.py`. The simplest command to run a game would be:

91 changes: 87 additions & 4 deletions docs/getting-started/agents.md
@@ -1,9 +1,92 @@
# Agents

## Overview

Pax provides a number of fixed opponents and learning agents to train and train against.

## Specifying an Agent
<!-- TODO: This isn't how Pax works atm. However, taken from Github README, so
assuming it is on the TODOs to add later. -->
Pax comes installed with an `Agent` class and several predefined agents. To specify an agent, import the `Agent` class and specify the agent parameters.

```
import jax
import jax.numpy as jnp
import Agent  # placeholder import path

args = {"hidden": 16, "observation_spec": 5}
rng = jax.random.PRNGKey(0)
bs = 1
init_hidden = jnp.zeros((bs, args["hidden"]))
obs = jnp.ones((bs, 5))

agent = Agent(args)
state, mem = agent.make_initial_state(rng, init_hidden)
action, state, mem = agent.policy(rng, obs, mem)

# traj_batch: a batch of trajectories collected during rollouts (not shown)
state, mem, stats = agent.update(
    traj_batch, obs, state, mem
)

mem = agent.reset_memory(mem, False)
```

To run an experiment with a specific agent, use a pre-made `.yaml` file located in `conf/...` or create your own, and specify the agent. In the example below, `agent1` is a learning agent that learns via PPO and `agent2` is a fixed agent that only chooses the Cooperate action.

```
# Agents
agent1: 'PPO'
agent2: 'Altruistic'

...
```

## List of Agents

```{note}
Fixed agents are game-specific, while learning agents like PPO can be used in both games.
```

### agent1, agent2

#### Fixed

Matrix games

| Agent | Description |
| ----------- | ----------- |
| **`Altruistic`** | Always chooses the Cooperate (C) action. |
| **`Defect`** | Always chooses the Defect (D) action. |
| **`GrimTrigger`** | Chooses the C action on the first turn and keeps choosing C until the opponent chooses D, after which it only chooses D. |
| **`HyperAltruistic`** | Infinite matrix game variant of `Altruistic`. Always chooses the Cooperate (C) action.|
| **`HyperDefect`** | Infinite matrix game variant of `Defect`. Always chooses the Defect (D) action.|
| **`HyperTFT`** | Infinite matrix game variant of `TitForTat`. Chooses the C action on the first turn and reciprocates the opponent's last action.|
| **`Random`** | Randomly chooses the C or D action. |
| **`TitForTat`** | Chooses the C action on the first turn and reciprocates the opponent's last action.|
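
The fixed strategies in the table above are simple enough to sketch in a few lines. The block below is a minimal, standalone illustration in plain Python and not Pax's implementation: the function names, the `"C"`/`"D"` encoding, and the `play` helper are all assumptions made for this example.

```python
# Illustrative sketch only: these helpers are hypothetical and not part of Pax.
C, D = "C", "D"


def altruistic(opponent_history):
    """Always cooperate."""
    return C


def defect(opponent_history):
    """Always defect."""
    return D


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's last action."""
    return C if not opponent_history else opponent_history[-1]


def grim_trigger(opponent_history):
    """Cooperate until the opponent has defected once, then defect forever."""
    return D if D in opponent_history else C


def play(strategy1, strategy2, num_steps):
    """Play two strategies against each other and return both action histories."""
    history1, history2 = [], []
    for _ in range(num_steps):
        # Each strategy only sees the other player's past actions.
        action1 = strategy1(history2)
        action2 = strategy2(history1)
        history1.append(action1)
        history2.append(action2)
    return history1, history2
```

For example, `play(tit_for_tat, defect, 3)` returns `(['C', 'D', 'D'], ['D', 'D', 'D'])`: `TitForTat` cooperates once and then mirrors the defection.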


Coin Game

| Agent | Description|
| ----------- | ----------- |
| **`EvilGreedy`** | Attempts to pick up the closest coin. If equidistant to two colored coins, then it chooses its opponent's color coin. |
| **`GoodGreedy`** | Attempts to pick up the closest coin. If equidistant to two colored coins, then it chooses its own color coin. |
| **`RandomGreedy`** | Attempts to pick up the closest coin. If equidistant to two colored coins, then it randomly chooses a color coin. |
| **`Stay`** | Agent does not move.|
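
The three greedy agents differ only in how they break ties between equidistant coins. The sketch below illustrates that rule in plain Python under stated assumptions: positions are `(row, col)` tuples, distance is Manhattan distance without wrap-around, and `choose_target` is a hypothetical helper, not a Pax function.

```python
import random

# Illustrative sketch only: positions, distance metric, and names are assumptions.


def manhattan(a, b):
    """Manhattan distance between two (row, col) grid positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def choose_target(agent_pos, own_coin_pos, other_coin_pos, tie_break, rng=None):
    """Return 'own' or 'other': which coin a greedy agent heads for.

    tie_break is 'good' (prefer own color), 'evil' (prefer the opponent's
    color) or 'random'; it only matters when both coins are equidistant.
    """
    d_own = manhattan(agent_pos, own_coin_pos)
    d_other = manhattan(agent_pos, other_coin_pos)
    if d_own < d_other:
        return "own"
    if d_other < d_own:
        return "other"
    if tie_break == "good":
        return "own"
    if tie_break == "evil":
        return "other"
    if tie_break == "random":
        rng = rng or random.Random()
        return rng.choice(["own", "other"])
    raise ValueError(f"unknown tie_break: {tie_break!r}")
```

With the agent at `(1, 1)` and coins at `(0, 1)` and `(1, 2)`, both coins are one step away, so `"good"` picks `"own"` and `"evil"` picks `"other"`.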

#### Learning

| Agent | Description |
| ----------- | ----------- |
| **`Naive`** | Simple learning agent that learns via REINFORCE. |
| **`NaiveEx`** | Infinite matrix game variant of `Naive`. Simple learning agent that learns via REINFORCE. |
| **`MFOS`** | Meta-learning algorithm for opponent shaping. |
| **`PPO`** | Learning agent parameterised by a multilayer perceptron that learns via PPO. |
| **`PPO_memory`** | Learning agent parameterised by a multilayer perceptron with a memory component that learns via PPO. |
| **`Tabular`** | Learning agent parameterised by a single layer perceptron that learns via PPO. |

```{note}
`PPO_memory` serves as the core learning algorithm for both **Good Shepherd (GS)** and **Context and History Aware Other Shaping (CHAOS)** when training with meta-learning.
```

93 changes: 89 additions & 4 deletions docs/getting-started/environments.md
@@ -1,9 +1,94 @@
# Environments

## Overview
Pax supports two environments for learning agents to train within: matrix games and grid-world games.

## Specifying the Environment

Pax environments are similar to gymnax. To specify an environment, import the environment and specify the environment parameters.

```
import jax
import jax.numpy as jnp

from pax.envs.iterated_matrix_game import (
    IteratedMatrixGame,
    EnvParams,
)

rng = jax.random.PRNGKey(0)
# example matrix-game payoff (same values as the `payoff` example below)
payoff = [[-1, -1], [-3, 0], [0, -3], [-2, -2]]

env = IteratedMatrixGame(num_inner_steps=5)
env_params = EnvParams(payoff_matrix=payoff)

# 0 = Defect, 1 = Cooperate
actions = (jnp.ones(()), jnp.ones(()))
obs, env_state = env.reset(rng, env_params)
done = False

while not done:
    obs, env_state, rewards, done, info = env.step(
        rng, env_state, actions, env_params
    )
```

To set the environment parameters for an experiment, specify them in the experiment `.yaml` file:

```
...
# Environment
env_id: coin_game
env_type: meta
egocentric: True
env_discount: 0.96
payoff: [[1, 1, -2], [1, 1, -2]]
...
```

## List of Environment Parameters

### env_id
| Name | Description |
| :----------- | :----------- |
|`iterated_matrix_game`| Classic normal form game with a 2x2 payoff matrix repeatedly played over `n` steps. |
|`infinite_matrix_game` | Special case of the classic normal form game that calculates an exact value, simulating an infinite game. |
|`coin_game` | Classic grid-world social dilemma environment. |

### env_type

| Name | Description |
| :----------- | :----------- |
|`sequential`| Classic normal form game with a 2x2 payoff matrix repeatedly played over `n` steps. |
|`meta`| Meta-learning regime, where an agent learns via meta-learning. |

### egocentric
| Name | Description |
| :----------- | :----------- |
|*bool*| If `True`, gives agents in the Coin Game environment an egocentric view, which was empirically found to be more appropriate for other shaping. Otherwise, agents use a non-egocentric view, in line with the original version. |

### env_discount
<!-- TODO: Possibly deprecate. -->
| Name | Description |
| :----------- | :----------- |
|*Numeric*| Meta-learning discount factor. Between 0 and 1. |
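
A discount factor weights a reward received `t` steps in the future by `discount ** t`. The sketch below is a generic illustration of that arithmetic in plain Python; how Pax applies `env_discount` internally is not shown here, and `discounted_return` is a hypothetical helper.

```python
def discounted_return(rewards, discount):
    """Sum of rewards, with the reward at step t weighted by discount ** t."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    total = 0.0
    # Work backwards so each step is one multiply-add:
    # G_t = r_t + discount * G_{t+1}
    for reward in reversed(rewards):
        total = reward + discount * total
    return total
```

For instance, `discounted_return([1, 1, 1], 0.5)` gives `1 + 0.5 + 0.25 = 1.75`. A value close to 1, such as `0.96`, keeps later rewards almost as important as early ones.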

### payoff
| Name | Description |
| :----------- | :----------- |
|*Array*| Custom payoff for game. |

Example:

```
# if playing Coin Game
payoff: [[1, 1, -2], [1, 1, -2]]
```

```
# if playing Matrix Games
payoff: [[-1, -1], [-3, 0], [0, -3], [-2, -2]]
```
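
One way to read the matrix-game payoff above is as one `[player 1, player 2]` reward pair per joint action. The sketch below assumes the rows are ordered CC, CD, DC, DD with player 1's action first. That matches the usual prisoner's dilemma layout, but it is an assumption here rather than something this page states, and `rewards_for` is a hypothetical helper.

```python
# Illustrative sketch only: the row ordering (CC, CD, DC, DD) is an assumption.
PAYOFF = [[-1, -1], [-3, 0], [0, -3], [-2, -2]]
JOINT_ACTIONS = ["CC", "CD", "DC", "DD"]


def rewards_for(action1, action2, payoff=PAYOFF):
    """Return (reward for player 1, reward for player 2) for one step."""
    row = JOINT_ACTIONS.index(action1 + action2)
    reward1, reward2 = payoff[row]
    return reward1, reward2
```

Under this reading, `rewards_for("C", "D")` returns `(-3, 0)`: the cooperating player is exploited and the defecting player gets the best individual outcome.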

```{note}
Docstrings are under construction. Please check back later.
```



81 changes: 81 additions & 0 deletions docs/getting-started/evaluation.md
@@ -0,0 +1,81 @@
# Saving & Loading

Pax provides an easy way to save and load your models.

## Overview

Users can save and load models either locally or from Weights and Biases. The save and load paths are configured in the experiment `.yaml` file.

## List of Saving Parameters

### save
| Name | Description |
| :----------- | :----------- |
|*bool* | If `True`, the model is saved to the filepath specified by `save_dir`. |


### save_dir
| Name | Description |
| :----------- | :----------- |
|*String* | Filepath used to save a model. |

### save_interval

| Name | Description |
| :----------- | :----------- |
|*Int* | Number of iterations between saving a model. |

Example
```
# config.yaml
save: True
save_interval: 10
save_dir: "./exp/${wandb.group}/${wandb.name}"
```
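
The three saving parameters interact roughly as sketched below. This is a minimal illustration, assuming a checkpoint is written whenever the iteration index is a multiple of `save_interval`; the exact condition in Pax may differ. The `generation_<n>` file name mirrors the `model_path` example on this page, and `checkpoint_paths` is a hypothetical helper.

```python
import os


def checkpoint_paths(save, save_dir, save_interval, num_iterations):
    """List the file paths a run would write checkpoints to."""
    if not save:
        return []
    if save_interval <= 0:
        raise ValueError("save_interval must be a positive integer")
    return [
        os.path.join(save_dir, f"generation_{i}")
        for i in range(num_iterations)
        if i % save_interval == 0
    ]
```

With `save_interval=10` and 35 iterations, this yields checkpoints for iterations 0, 10, 20 and 30; with `save=False` it yields none.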

## List of Loading Parameters

### model_path
| Name | Description |
| :----------- | :----------- |
|*String* | Filepath to load the model. |

### run_path
| Name | Description |
| :----------- | :----------- |
|*String* | If using Weights and Biases (i.e. `wandb.log=True`), this is the run path of the model used to load the model. |

Example
```
# config.yaml
run_path: ucl-dark/cg/3mpgbfm2
model_path: exp/coin_game-EARL-PPO_memory-vs-Random/run-seed-0/2022-09-08_20.41.03.643377/generation_30
```

### wandb

```{note}
The following parameters are used for Weights and Biases specific features.
```

```
wandb:
  entity: "ucl-dark"
  project: cg
  group: 'EARL-${agent1}-vs-${agent2}'
  name: run-seed-${seed}
  log: False
```
| Name | Description |
| :----------- | :----------- |
|`entity` | Weights and Biases entity. |
|`project` | Weights and Biases project name. |
|`group` | Weights and Biases group name. |
|`name` | Weights and Biases run name. |
|`log` | If `True`, logs the run to Weights and Biases. |
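
The `${agent1}`, `${agent2}` and `${seed}` placeholders are filled in from other config values when the config is resolved. The sketch below imitates that substitution with Python's standard `string.Template`, purely to show what the resulting names look like. It is not how Pax resolves its config, and the values used are examples.

```python
from string import Template

# Example values; in a real run these come from the experiment config.
config = {"agent1": "PPO", "agent2": "Altruistic", "seed": 0}

group = Template("EARL-${agent1}-vs-${agent2}").substitute(config)
name = Template("run-seed-${seed}").substitute(config)

print(group)  # EARL-PPO-vs-Altruistic
print(name)   # run-seed-0
```

Grouping runs by agent pairing and naming them by seed keeps repeated seeds of the same match-up together in the Weights and Biases dashboard.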






6 changes: 1 addition & 5 deletions docs/getting-started/installation.md
@@ -1,7 +1,3 @@
# Installation

Pax is written in pure Python, but depends on C++ code via JAX.

Because JAX installation is different depending on your CUDA version, Haiku does not list JAX as a dependency in requirements.txt.

First, follow these instructions to install JAX with the relevant accelerator support.
Pax will soon be available to install via the [Python Package Index](https://github.com/akbir/pax). For full installation instructions, please refer to the [Install Guide](https://github.com/akbir/pax) in the project README.
94 changes: 90 additions & 4 deletions docs/getting-started/runners.md
@@ -1,9 +1,95 @@
# Runner

## Overview

Pax provides a number of experiment runners useful for different use cases of training and evaluating reinforcement learning agents.

## Specifying a Runner

Pax centers around its runners, pieces of custom experiment logic that leverage the speed of JAX. After specifying the environment and agents, a runner carries out the experiment. The code below shows a portion of a runner that carries out a rollout and updates the agent:

```
# Agent, Sample, env, and the initial rng, obs, env_state, env_params,
# and init_hidden are defined elsewhere in the runner.
def _rollout(carry, unused):
    """Runner for inner episode"""
    (
        rngs,
        obs,
        a1_state,
        a1_mem,
        env_state,
        env_params,
    ) = carry

    # split the rngs: one key is carried forward, one is used for the env step
    rngs, step_rng = jax.random.split(rngs)
    action, a1_state, new_a1_mem = agent1.batch_policy(
        a1_state,
        obs[0],
        a1_mem,
    )

    # the same action is passed for both players
    next_obs, env_state, rewards, done, info = env.step(
        step_rng,
        env_state,
        (action, action),
        env_params,
    )

    # record this step from agent 1's point of view
    traj = Sample(
        obs[0],
        action,
        rewards[0],
        new_a1_mem.extras["log_probs"],
        new_a1_mem.extras["values"],
        done,
        a1_mem.hidden,
    )

    return (
        rngs,
        next_obs,
        a1_state,
        new_a1_mem,
        env_state,
        env_params,
    ), traj


agent1 = Agent(args)
a1_state, a1_mem = agent1.make_initial_state(rng, init_hidden)

for _ in range(num_updates):
    # run rollout_length steps; the per-step Samples are stacked into a batch
    final_timestep, batch_trajectory = jax.lax.scan(
        _rollout,
        (rng, obs, a1_state, a1_mem, env_state, env_params),
        None,
        length=rollout_length,
    )

    rng, obs, a1_state, a1_mem, env_state, env_params = final_timestep

    a1_state, a1_mem, stats = agent1.update(
        batch_trajectory, obs[0], a1_state, a1_mem
    )
```
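
`jax.lax.scan` threads a carry through repeated calls of a step function and collects the per-step outputs. The plain-Python sketch below mirrors that looping behaviour with a toy rollout, without JAX's compilation and with a list in place of stacked arrays. The `scan` reference function, `toy_step`, and the counter "environment" are hypothetical stand-ins for illustration, not Pax code.

```python
def scan(step_fn, init_carry, xs=None, length=None):
    """Pure-Python reference for the looping behaviour of jax.lax.scan."""
    if xs is None:
        xs = [None] * length
    carry = init_carry
    outputs = []
    for x in xs:
        carry, out = step_fn(carry, x)
        outputs.append(out)
    return carry, outputs


def toy_step(carry, unused):
    """One rollout step: advance a toy counter 'environment' by one."""
    obs, total_reward = carry
    action = obs % 2  # toy policy: act on the parity of the observation
    reward = 1 if action == 0 else 0
    next_obs = obs + 1
    # The first value feeds the next step; the second is recorded per step.
    return (next_obs, total_reward + reward), (obs, action, reward)


final_carry, trajectory = scan(toy_step, (0, 0), length=4)
print(final_carry)  # (4, 2)
print(trajectory)   # [(0, 0, 1), (1, 1, 0), (2, 0, 1), (3, 1, 0)]
```

The carry plays the role of the `(rngs, obs, a1_state, ...)` tuple in the runner, and the recorded tuples play the role of the `Sample` trajectory passed to the agent's update.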

To specify the runner in an experiment, use a pre-made `.yaml` file located in `conf/...` or create your own, and specify the runner with `runner`. In the example below, the `evo` flag selects the `EvoRunner`.

```
...
# Runner
runner: evo
...
```

## List of Runners

### runner
| Runner | Description|
| ----------- | ----------- |
| **`eval`** | Evaluation runner, where a single, pre-trained agent is evaluated. |
| **`evo`** | Evolution runner, where two independent agents are trained via Evolutionary Strategies (ES). |
| **`rl`** | Multi-agent runner, where two independent agents are trained via reinforcement learning. |
| **`sarl`** | Single-agent runner, where a single agent is trained via reinforcement learning. |