Beyond Reward: Offline Preference-guided Policy Optimization (OPPO)

This is the source code for the paper "Beyond Reward: Offline Preference-guided Policy Optimization" (OPPO).

Important Notes:

This repo was deleted and recreated on Sep. 9, 2024 to remove the large data files, which were costing us $5/month.
The new repo corresponds to commit bf2efeabacf0ad6da9512836a25ed20ae1cfe2cc of the old repo; there are no differences in scripts or methods.

If you have used the old version, you can either stick with it or switch to this new one.
If this is your first time using this repo, you need to manually download the required datasets, following the procedures of Decision Transformer, Preference Transformer, and Robomimic.
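
For the D4RL portion, for example, datasets can be fetched through the standard D4RL API. The snippet below is a minimal sketch, assuming the d4rl package is installed; the task name is illustrative and not specific to this repo:

```python
import gym
import d4rl  # importing d4rl registers its environments with gym

# Any D4RL task name works here; 'hopper-medium-v2' is just an example.
env = gym.make('hopper-medium-v2')

# get_dataset() downloads and caches the dataset on first use,
# returning a dict of numpy arrays.
dataset = env.get_dataset()
print(dataset['observations'].shape, dataset['rewards'].shape)
```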

The main code is in the oppo folder. It contains two parts:

scripted contains code to reproduce results using preferences generated by a "scripted teacher" (see the sketch after this list).

human contains code to train/evaluate OPPO using human-labeled preferences, which come from Preference Transformer; please refer to their codebase for further details and consider citing their paper if needed.
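
In the preference-based RL literature, a "scripted teacher" typically labels the trajectory segment with the higher ground-truth return as preferred. The sketch below illustrates that general idea; it is our assumption of the usual technique, not this repo's exact implementation:

```python
import numpy as np

def scripted_teacher(rewards_a: np.ndarray, rewards_b: np.ndarray) -> int:
    """Label which of two trajectory segments is preferred.

    Returns 0 if segment A has the higher (ground-truth) return, else 1.
    """
    return 0 if rewards_a.sum() >= rewards_b.sum() else 1

# Example: segment B accumulates more reward, so it is preferred.
seg_a = np.array([1.0, 0.5, 0.2])  # return = 1.7
seg_b = np.array([0.1, 0.9, 1.5])  # return = 2.5
print(scripted_teacher(seg_a, seg_b))  # -> 1
```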

Citation

@misc{kang2023reward,
      title={Beyond Reward: Offline Preference-guided Policy Optimization}, 
      author={Yachen Kang and Diyuan Shi and Jinxin Liu and Li He and Donglin Wang},
      year={2023},
      eprint={2305.16217},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Acknowledgements

Our code is largely based on Decision Transformer.

Human labels were obtained thanks to Preference Transformer.

Our experiments largely used the D4RL dataset.

The Lift and Can environments come from the Robomimic and Robosuite projects.
