Example CEM implementation with ReLAx
This repository contains an implementation of the cross-entropy method (CEM) with ReLAx.
The CEM actor was trained on the HalfCheetah-v2 MuJoCo Gym environment for 50k environment steps.
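For readers unfamiliar with CEM, here is a minimal NumPy sketch of the core loop: sample candidates from a Gaussian, keep the top-scoring "elite" fraction, and refit the Gaussian to those elites. It does not use the ReLAx API and is not the code from this repository. `cem_maximize`, `toy_objective`, and all parameter names and default values are hypothetical, chosen only for illustration. In a model-based setting, the objective would presumably be the return of an action sequence predicted by the environment model, and a simple quadratic stands in for it here.

```python
import numpy as np


def cem_maximize(objective, dim, n_iters=20, pop_size=200,
                 elite_frac=0.1, init_std=1.0, seed=0):
    """Maximize `objective` over R^dim with a diagonal-Gaussian CEM (illustrative)."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    std = np.full(dim, init_std)
    n_elite = max(1, int(pop_size * elite_frac))

    for _ in range(n_iters):
        # Sample a population of candidates from the current Gaussian.
        samples = rng.normal(mean, std, size=(pop_size, dim))
        scores = np.array([objective(x) for x in samples])
        # Keep the highest-scoring candidates (the elites).
        elite = samples[np.argsort(scores)[-n_elite:]]
        # Refit the Gaussian to the elites; a small floor keeps std positive.
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6

    return mean


# Hypothetical stand-in for a model-predicted return (maximum at `target`).
target = np.array([0.5, -0.3, 0.8])


def toy_objective(x):
    return -np.sum((x - target) ** 2)


best = cem_maximize(toy_objective, dim=3)
print(best)
```

When CEM is used for planning, a loop like this is typically rerun at every environment step, with only the first action of the best sequence being executed. Whether this repository does exactly that is an assumption, not something stated here.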
The graph of average return vs. training step is shown below (batch_size=5000):
The graph below compares actual rewards with the rewards predicted by the fitted environment model:
Resulting Policy: