Proximal Policy Optimization (PPO) algorithm using PyTorch to train an agent for a rocket landing task in a custom environment

Proximal Policy Optimization (PPO) for Rocket Landing

The goal is to train a reinforcement learning agent to control a rocket to either hover or land safely using the PPO algorithm. The environment simulates physics for the rocket, and the agent learns to make decisions based on the state observations to achieve the task.

Demo video: Rocket-Landing.mp4

Rewards chart: moving average reward (with variability shading) over training.

Actions

The rocket can take different actions, each defined by a thrust force and nozzle angular velocity. The available actions are:

  1. Thrust Levels: Controls the rocket’s thrust, with options:

    • 0.2 × g
    • 1.0 × g
    • 2.0 × g
  2. Nozzle Angular Velocities: Controls the rotation of the rocket nozzle, with options:

    • 0 (No rotation)
    • +30°/s
    • -30°/s

These combinations result in a set of 9 actions:

Action   Thrust   Nozzle Angular Velocity
0        0.2 g    0
1        0.2 g    +30°/s
2        0.2 g    -30°/s
3        1.0 g    0
4        1.0 g    +30°/s
5        1.0 g    -30°/s
6        2.0 g    0
7        2.0 g    +30°/s
8        2.0 g    -30°/s
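
To make the mapping concrete, here is a small sketch of how the 9 discrete actions can be enumerated as (thrust, nozzle angular velocity) pairs. The gravitational constant and the exact ordering are assumptions; the authoritative definitions live in rocket.py.

    # Enumerate the 3 x 3 action table shown above.
    import itertools

    G = 9.8  # gravitational acceleration (m/s^2); assumed value
    THRUSTS = [0.2 * G, 1.0 * G, 2.0 * G]   # thrust accelerations
    NOZZLE_RATES_DEG = [0.0, 30.0, -30.0]   # nozzle angular velocity (deg/s)

    ACTION_TABLE = list(itertools.product(THRUSTS, NOZZLE_RATES_DEG))

    for idx, (thrust, rate) in enumerate(ACTION_TABLE):
        print(f"action {idx}: thrust = {thrust:4.1f} m/s^2, nozzle rate = {rate:+4.0f} deg/s")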

States

The state of the rocket environment is represented by an 8-dimensional vector, capturing essential details for controlling the rocket. Each component of the state vector is normalized:

  • x: Horizontal position (m)
  • y: Vertical position (m)
  • vx: Horizontal velocity (m/s)
  • vy: Vertical velocity (m/s)
  • θ (theta): Rocket’s angle relative to the vertical (radians)
  • vθ (vtheta): Angular velocity of the rocket (radians/s)
  • t: Simulation time (steps)
  • φ (phi): Nozzle angle (radians)

These states provide the necessary information for the agent to understand the rocket's position, orientation, and dynamics, which are essential for executing successful hovering or landing maneuvers.
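
For illustration, a hypothetical helper that packs these quantities in the order listed above (the real construction, including the per-component normalization, is done inside rocket.py):

    import numpy as np

    def pack_state(x, y, vx, vy, theta, vtheta, t, phi):
        # Order matches the list above; normalization is handled by the environment.
        return np.array([x, y, vx, vy, theta, vtheta, t, phi], dtype=np.float32)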


Features

  • Custom Rocket Environment: Simulates rocket physics for hovering and landing tasks.
  • PPO Algorithm Implementation: Utilizes both actor and critic neural networks for policy optimization.
  • Continuous and Discrete Actions: Supports environments with continuous or discrete action spaces.
  • Real-time Plotting: Visualizes training progress with moving average and variability shading.
  • Logging and Checkpointing: Logs training metrics and saves model checkpoints for later use.
  • Testing Script: Includes a test.py script for evaluating the trained agent.

Requirements

  • Python 3.6 or higher
  • PyTorch
  • NumPy
  • Matplotlib

Installation

  1. Clone the Repository

    git clone https://github.com/taherfattahi/ppo-rocket-landing.git
    cd ppo-rocket-landing
  2. Create a Virtual Environment (Optional)

    python -m venv venv
    source venv/bin/activate  # On Windows use venv\Scripts\activate
  3. Install Dependencies

    pip install torch numpy matplotlib
  4. Ensure CUDA Availability (Optional)

    If you have a CUDA-compatible GPU and want to utilize it:

    • Install the appropriate CUDA toolkit version compatible with your PyTorch installation.

    • Verify CUDA availability in PyTorch:

      import torch
      print(torch.cuda.is_available())  # prints True if a compatible GPU is detected

Usage

Training

  1. Run Training

    python train.py
  2. Monitor Training

    • Training progress will be displayed in the console.
    • A real-time plot will show the moving average reward and variability (see the plotting sketch after this list).
    • Logs will be saved in the PPO_logs directory.
    • Model checkpoints will be saved in the PPO_preTrained directory.
  3. Adjust Hyperparameters

    • Modify hyperparameters in train.py to experiment with different settings.
    • Key hyperparameters include learning rates, discount factor, and PPO-specific parameters.
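
The moving-average plot mentioned in step 2 can be produced roughly as follows; the window size and styling are assumptions, and train.py may implement it differently.

    import matplotlib.pyplot as plt
    import numpy as np

    def plot_rewards(episode_rewards, window=50):
        rewards = np.asarray(episode_rewards, dtype=np.float64)
        # Rolling mean and standard deviation over the last `window` episodes.
        means = np.array([rewards[max(0, i - window + 1):i + 1].mean() for i in range(len(rewards))])
        stds = np.array([rewards[max(0, i - window + 1):i + 1].std() for i in range(len(rewards))])
        episodes = np.arange(len(rewards))

        plt.plot(episodes, means, label=f"moving average (window={window})")
        plt.fill_between(episodes, means - stds, means + stds, alpha=0.3, label="variability")
        plt.xlabel("episode")
        plt.ylabel("reward")
        plt.legend()
        plt.show()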

Testing

  1. Ensure a Trained Model is Available

    Make sure you have a trained model saved in the PPO_preTrained/RocketLanding/ directory.

  2. Run Testing

    python test.py
  3. Observe Agent Behavior

    • The agent will interact with the environment using the trained policy.
    • The environment will render the rocket's behavior in real-time.
    • Testing results will be printed to the console.

Project Structure

  • PPO.py: Contains the implementation of the PPO algorithm.
  • train.py: Script to train the PPO agent in the Rocket environment.
  • test.py: Script to test the trained PPO agent.
  • rocket.py: The custom Rocket environment class (physics simulation and rendering).
  • utils.py: Utility functions used by the environment and agent.
  • PPO_logs/: Directory where training logs are stored.
  • PPO_preTrained/: Directory where model checkpoints are saved.

Mathematical Background

Proximal Policy Optimization (PPO)

PPO is a policy gradient method for reinforcement learning, which alternates between sampling data through interaction with the environment and optimizing a "surrogate" objective function using stochastic gradient ascent.

The probability ratio between the new and old policies is

\[ r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_\text{old}}(a_t \mid s_t)} \]

and PPO maximizes the clipped surrogate objective

\[ L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[ \min\!\left( r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right) \hat{A}_t \right) \right] \]

where \( \hat{A}_t \) is the advantage estimate and \( \epsilon \) is the clipping parameter (eps_clip).
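
In code, the clipped objective can be sketched as below (negated so it can be minimized with gradient descent); variable names are assumptions, and the actual update is implemented in PPO.py.

    import torch

    def ppo_clip_loss(logprobs, old_logprobs, advantages, eps_clip=0.2):
        # r_t(theta): probability ratio between the new and old policies.
        ratios = torch.exp(logprobs - old_logprobs.detach())
        surr1 = ratios * advantages
        surr2 = torch.clamp(ratios, 1.0 - eps_clip, 1.0 + eps_clip) * advantages
        # Minimizing the negative clipped surrogate maximizes the objective.
        return -torch.min(surr1, surr2).mean()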

Neural Network Architecture

Actor Network

  • Takes state ( s_t ) as input.
  • Outputs action probabilities (discrete actions) or action means (continuous actions).
  • For discrete action spaces, applies a Softmax activation to output probabilities.

Critic Network

  • Takes state ( s_t ) as input.
  • Outputs a single value estimate ( V(s_t) ).
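
A minimal sketch of the two networks for the discrete-action case; the hidden sizes and activations are assumptions, not necessarily those used in PPO.py.

    import torch.nn as nn

    def make_actor(state_dim, action_dim, hidden=64):
        return nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim), nn.Softmax(dim=-1),  # action probabilities
        )

    def make_critic(state_dim, hidden=64):
        return nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),  # value estimate V(s_t)
        )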

Action Selection

  • Discrete Actions: Sampled from a Categorical distribution based on action probabilities.
  • Continuous Actions: Sampled from a Multivariate Normal distribution with mean and variance from the actor network.
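
Both sampling paths can be sketched as follows; the concrete versions are part of the policy class in PPO.py.

    import torch
    from torch.distributions import Categorical, MultivariateNormal

    def select_discrete(actor, state):
        probs = actor(state)                 # action probabilities from the Softmax head
        dist = Categorical(probs)
        action = dist.sample()
        return action, dist.log_prob(action)

    def select_continuous(actor, state, action_var):
        mean = actor(state)                  # action means from the actor network
        cov = torch.diag(action_var)         # diagonal covariance from per-dimension variances
        dist = MultivariateNormal(mean, covariance_matrix=cov)
        action = dist.sample()
        return action, dist.log_prob(action)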

Gradient Computation

  • Gradients are computed with respect to the loss function.
  • The optimizer updates the network parameters to minimize the loss.
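
A typical PPO update combines the clipped policy loss with a value-function loss and an entropy bonus before taking the gradient step; the coefficients below are common defaults, not necessarily those used in PPO.py.

    def update_step(optimizer, policy_loss, value_loss, entropy,
                    value_coef=0.5, entropy_coef=0.01):
        # Combined loss: policy term + weighted value error - weighted entropy bonus.
        loss = policy_loss + value_coef * value_loss - entropy_coef * entropy
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()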

Hyperparameters

The default hyperparameters are set in train.py and test.py. Key parameters include:

  • Environment Parameters:

    • env_name: Name of the environment.
    • task: Task type ('hover' or 'landing').
    • max_ep_len: Maximum timesteps per episode.
    • max_training_timesteps: Total training timesteps.
  • PPO Parameters:

    • update_timestep: Timesteps between policy updates.
    • K_epochs: Number of epochs for each PPO update.
    • eps_clip: Clipping parameter for PPO.
    • gamma: Discount factor.
    • lr_actor: Learning rate for the actor network.
    • lr_critic: Learning rate for the critic network.
  • Logging Parameters:

    • print_freq: Frequency of printing average reward.
    • log_freq: Frequency of logging average reward.
    • save_model_freq: Frequency of saving the model.
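
As a point of reference, a hypothetical configuration block mirroring these parameter names might look like the following; the values are illustrative, and the authoritative defaults are the ones set in train.py.

    env_name = "RocketLanding"
    task = "landing"                    # or "hover"
    max_ep_len = 800                    # assumed episode length
    max_training_timesteps = 3_000_000  # assumed training budget

    update_timestep = max_ep_len * 4    # collect several episodes per policy update
    K_epochs = 80
    eps_clip = 0.2
    gamma = 0.99
    lr_actor = 0.0003
    lr_critic = 0.001

    print_freq = max_ep_len * 4
    log_freq = max_ep_len * 2
    save_model_freq = 100_000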

Results

During training, the agent learns to control the rocket to achieve the specified task. The real-time plot shows the agent's performance improving over time, with the moving average reward increasing and variability decreasing.


Contact

For questions or feedback, please contact:

Email: taherfattahi11@gmail.com
