A Closer Look at Offline RL Agents

In the experiments, we re-implement the following baseline agents in JAX (a minimal sketch of the IQL value objective is shown after the list):

  • TD3BC
  • CQL
  • COMBO
  • IQL
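
As a reference point, the sketch below shows the expectile regression used for IQL's value function. This is the standard IQL objective; variable names are illustrative and may not match this repo's implementation.

```python
# Minimal sketch of the IQL expectile value loss (the standard IQL objective);
# names are illustrative and may differ from the repo's implementation.
import jax
import jax.numpy as jnp


def expectile_loss(diff: jnp.ndarray, tau: float = 0.7) -> jnp.ndarray:
    """Asymmetric L2 loss: |tau - 1(diff < 0)| * diff^2."""
    weight = jnp.where(diff > 0, tau, 1.0 - tau)
    return weight * diff ** 2


def value_loss(q_dataset_actions: jnp.ndarray, v_pred: jnp.ndarray,
               tau: float = 0.7) -> jnp.ndarray:
    """Train V(s) towards the tau-expectile of Q(s, a) over dataset actions."""
    return expectile_loss(q_dataset_actions - v_pred, tau).mean()


# Toy usage with random placeholder predictions.
key = jax.random.PRNGKey(0)
q = jax.random.normal(key, (256,))
print(value_loss(q, jnp.zeros(256)))
```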

Evaluation Experiments

In this work, we take a closer look at the behaviors of current SOTA offline RL agents, focusing on their learned representations, value functions, and policies. Surprisingly, we find that the most performant offline RL agent sometimes has relatively low-quality representations and an inaccurate value function. Specifically, a performant offline RL policy is usually able to select better sub-optimal actions while avoiding bad ones.

Representation Experiments

In this experiment, we run linear representation probing experiments and evaluate some recently proposed representation metrics.

python exp_representation/run_exp.py
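
For intuition, a linear probe fits a simple regressor on frozen features and measures how well it predicts a target signal. The sketch below uses a closed-form ridge probe; the feature and target names are placeholders, not the script's actual interface.

```python
# A minimal linear-probing sketch (assumption: the probe is a ridge regressor on
# frozen representations; the actual probing targets are defined in run_exp.py).
import jax.numpy as jnp


def linear_probe_mse(features: jnp.ndarray, targets: jnp.ndarray,
                     reg: float = 1e-3) -> jnp.ndarray:
    """Fit a ridge-regression probe in closed form and return its MSE."""
    d = features.shape[1]
    gram = features.T @ features + reg * jnp.eye(d)
    weights = jnp.linalg.solve(gram, features.T @ targets)
    preds = features @ weights
    return jnp.mean((preds - targets) ** 2)
```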

Value Function Experiments

In this experiment, we evaluate the ability of the learned $Q$-function to rank different actions.

python exp_value_function/value_ranking_exp.py
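
As an illustration, ranking quality can be summarized by the fraction of action pairs that the learned $Q$-function orders consistently with a reference value (e.g. a Monte-Carlo return). The helper below is a hypothetical sketch, not the script's API.

```python
# A minimal action-ranking sketch: count the action pairs ordered the same way
# by the learned Q-values and by a reference value (names are illustrative).
import jax.numpy as jnp


def pairwise_ranking_accuracy(q_values: jnp.ndarray,
                              reference_values: jnp.ndarray) -> jnp.ndarray:
    """Fraction of concordant action pairs between Q and the reference ordering."""
    dq = q_values[:, None] - q_values[None, :]
    dr = reference_values[:, None] - reference_values[None, :]
    pairs = jnp.triu(jnp.ones_like(dq, dtype=bool), k=1)
    concordant = jnp.sign(dq) == jnp.sign(dr)
    return jnp.where(pairs, concordant, False).sum() / pairs.sum()
```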

Policy Experiments

In this experiment, we directly evaluate the learned policy of each agent.

# policy ranking experiment
python exp_policy/policy_ranking_exp.py

# ood action experiment
python exp_policy/ood_action_pi_exp.py
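
Direct evaluation here means rolling out each agent's policy and averaging episode returns. The sketch below assumes the old-style Gym API used by D4RL environments; `policy_fn` stands in for any agent's learned actor.

```python
# A minimal policy-evaluation sketch, assuming the old-style Gym API used by
# D4RL environments; `policy_fn` is a placeholder for an agent's learned actor.
import numpy as np
import gym


def evaluate_policy(policy_fn, env_name: str, num_episodes: int = 10,
                    seed: int = 0) -> float:
    env = gym.make(env_name)
    env.seed(seed)
    returns = []
    for _ in range(num_episodes):
        obs, done, episode_return = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy_fn(obs))
            episode_return += reward
        returns.append(episode_return)
    return float(np.mean(returns))
```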

Relaxed In-sample Q-Learning

We present a variant of IQL that relaxes the in-sample constraint in the policy improvement step.

./relaxed_iql/run_exp.sh
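
For context, standard IQL extracts the policy by advantage-weighted regression on dataset actions only. The sketch below shows that in-sample step, which is the part the relaxed variant loosens; the exact relaxation is defined in relaxed_iql/, not here.

```python
# Standard IQL-style advantage-weighted policy extraction (the in-sample step
# that the relaxed variant modifies); this sketch is not the relaxed objective.
import jax.numpy as jnp


def awr_policy_loss(log_prob_dataset_actions: jnp.ndarray,
                    q_values: jnp.ndarray,
                    v_values: jnp.ndarray,
                    temperature: float = 3.0) -> jnp.ndarray:
    """Weight the log-likelihood of dataset actions by exponentiated advantages."""
    advantages = q_values - v_values
    weights = jnp.minimum(jnp.exp(temperature * advantages), 100.0)
    return -(weights * log_prob_dataset_actions).mean()
```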

Uncertainty-based Sample Selection

We further investigate using a learned dynamics model to select training samples for model-free offline RL agents.

# run miql
python miql/main.py

# run mtd3bc
python mtd3bc/main.py
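
One natural instantiation, sketched below, scores each transition by the disagreement of a dynamics-model ensemble and keeps only the low-uncertainty samples for the model-free update; the actual selection rule used by miql/mtd3bc may differ.

```python
# A hedged sketch of uncertainty-based sample selection: rank transitions by the
# disagreement of an ensemble of learned dynamics models and keep the most
# reliable fraction. The rule used by miql/mtd3bc may differ from this.
import jax.numpy as jnp


def low_uncertainty_mask(ensemble_next_obs: jnp.ndarray,
                         keep_fraction: float = 0.9) -> jnp.ndarray:
    """ensemble_next_obs: (num_models, batch, obs_dim) predicted next states.

    Returns a boolean mask over the batch keeping the least uncertain samples.
    """
    uncertainty = jnp.std(ensemble_next_obs, axis=0).mean(axis=-1)  # (batch,)
    threshold = jnp.quantile(uncertainty, keep_fraction)
    return uncertainty <= threshold
```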
