This release contains code for two projects:
- The MAXQ-based hierarchical online planning algorithm: MAXQ-OP
- The HAMQ-based hierarchical reinforcement learning algorithm: HAMQ-INT
Figures: the Taxi domain; overall results, averaged over 200 runs.
The idea behind HAMQ-INT is to identify and exploit internal transitions within a HAM, which is represented as a partial program, for efficient hierarchical reinforcement learning. Details can be found in:
- Efficient Reinforcement Learning with Hierarchies of Machines by Leveraging Internal Transitions, Aijun Bai and Stuart Russell, Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI), Melbourne, Australia, August 19 - 25, 2017.
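
To make the notion of an internal transition concrete, here is a minimal sketch, not the code in `HierarchicalFSMAgent.{h, cpp}`. It assumes that an internal transition is one where the machine moves from one choice point to the next without executing any primitive action, so the environment state is unchanged and no reward is received. Under that assumption the Q-value of such a choice can be read off the value of the next choice point instead of being learned. All names here (`ChoicePoint`, `InternalTransitionQ`, `recordInternal`, `demoInternalShortcut`) are hypothetical, and the sketch also assumes that internal transitions never form a cycle.

```cpp
#include <algorithm>
#include <map>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not the repository's API.
using EnvState = int;
using MachineState = int;
using Choice = int;

// A choice point: the environment state paired with the machine state.
struct ChoicePoint {
    EnvState s;
    MachineState m;
    bool operator<(const ChoicePoint& o) const {
        return s < o.s || (s == o.s && m < o.m);
    }
};

class InternalTransitionQ {
public:
    // choices maps each machine state to the choices available there.
    explicit InternalTransitionQ(std::map<MachineState, std::vector<Choice>> choices)
        : choices_(std::move(choices)) {}

    // Remember that choosing c at `from` led to `to` with no primitive action.
    void recordInternal(const ChoicePoint& from, Choice c, const ChoicePoint& to) {
        internal_[{from, c}] = to;
    }

    // Ordinary Q-learning update for a choice that did touch the environment.
    // `discount` is the discount accumulated between the two choice points.
    void update(const ChoicePoint& from, Choice c, double reward,
                const ChoicePoint& to, double alpha, double discount) {
        double& entry = table_[{from, c}];
        entry += alpha * (reward + discount * value(to) - entry);
    }

    // Q-value: internal transitions are shortcut to the successor's value
    // (no reward, no discount); everything else is a table lookup.
    double q(const ChoicePoint& cp, Choice c) const {
        auto it = internal_.find({cp, c});
        if (it != internal_.end()) return value(it->second);
        auto jt = table_.find({cp, c});
        return jt == table_.end() ? 0.0 : jt->second;
    }

    // Value of a choice point: best Q over its choices, 0 if it has none.
    double value(const ChoicePoint& cp) const {
        auto it = choices_.find(cp.m);
        if (it == choices_.end() || it->second.empty()) return 0.0;
        double best = q(cp, it->second.front());
        for (Choice c : it->second) best = std::max(best, q(cp, c));
        return best;
    }

private:
    std::map<MachineState, std::vector<Choice>> choices_;
    std::map<std::pair<ChoicePoint, Choice>, ChoicePoint> internal_;
    std::map<std::pair<ChoicePoint, Choice>, double> table_;
};

// Usage: choice 0 at machine state 0 is internal and leads to machine state 1.
double demoInternalShortcut() {
    std::map<MachineState, std::vector<Choice>> choices;
    choices[0] = {0, 1};
    choices[1] = {0};
    InternalTransitionQ learner(choices);

    learner.recordInternal(ChoicePoint{5, 0}, 0, ChoicePoint{5, 1});
    // Only the non-internal choice is learned from experience.
    learner.update(ChoicePoint{5, 1}, 0, 10.0, ChoicePoint{6, 2}, 1.0, 1.0);
    // The internal choice inherits the value without a single update of its own.
    return learner.q(ChoicePoint{5, 0}, 0);
}
```

In this sketch the table never stores an entry for the internal choice, which is the intended saving: only choices that interact with the environment need to be learned.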
This is the code release of the MAXQ-OP algorithm on the Taxi domain, as described in the following papers:
- Online planning for large Markov decision processes with hierarchical decomposition, Aijun Bai, Feng Wu, and Xiaoping Chen, ACM Transactions on Intelligent Systems and Technology (ACM TIST), 6(4):45:1-45:28, July 2015.
- Online Planning for Large MDPs with MAXQ Decomposition (Extended Abstract), Aijun Bai, Feng Wu, and Xiaoping Chen, Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Valencia, Spain, June 2012.
Source files:
- `maxqop.{h, cpp}`: the MAXQ-OP algorithm
- `HierarchicalFSMAgent.{h, cpp}`: the HAMQ-INT algorithm
- `MaxQ0Agent.{h, cpp}`: the MAXQ-0 algorithm
- `MaxQQAgent.{h, cpp}`: the MAXQ-Q algorithm
- `agent.h`: abstract `Agent` class
- `state.{h, cpp}`: abstract `State` class
- `policy.{h, cpp}`: `Policy` classes
- `taxi.{h, cpp}`: the Taxi domain
- `system.{h, cpp}`: agent-environment driver code
- `table.h`: tabular V/Q functions
- `dot_graph.{h, cpp}`: tools to generate graphviz `dot` files
Dependencies:
- libboost-dev
- libboost-program-options-dev
- gnuplot
Related projects:
- MAXQ-OP on RoboCup Soccer Simulation 2D Challenge: https://github.com/wrighteagle2d/wrighteaglebase
- Concurrent HAMQ on RoboCup Soccer Simulation 2D Keepaway Challenge: https://github.com/aijunbai/keepaway