Implements templates for using the Java-based reinforcement learning algorithms provided in the BURLAP library from Brown University.
- This project uses Java and Gradle. Make sure you have a recent Java JDK installed (JDK 15 or higher is recommended).
- Install Gradle
  - Option 1: follow the install instructions recommended at gradle.org
  - Option 2: use asdf
- Run the Gradle build
./gradlew build
./gradlew helloGridWorld
Opens the BURLAP GridWorld hello world explorer. Keys: A - West, D - East, W - Up, S - Down
./gradlew blockDudeViewer
Runs BURLAP's BlockDude. Keys: a - West, d - East, w - jump up, s - pick up, x - put down
./gradlew demoExperiment
Runs the complete demo experiments in RunExperiments.java
- IntelliJ - import as a new Gradle project and select the root directory of this project
- Eclipse - (not tested)
A sample experiment is provided in RunExperiments.java. Edit this file to set up various experiment sizes. Current examples:
- Set up the Large & Small GridWorld experiments
- Set up the Level1 & Level2 BlockDude experiments
Three MDP solver algorithms are also provided:
- Value Iteration experiments (use the VISettings class to set hyperparameters)
- Policy Iteration experiments (use the PISettings class to set hyperparameters)
- Q-Learning experiments (use the QSettings class to set hyperparameters)
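To make the role of such hyperparameters concrete, here is a minimal plain-Java sketch of value iteration on a tiny chain world. It does not use BURLAP or the project's VISettings class; the class name `TinyValueIteration` and the parameters `gamma`, `maxDelta`, and `maxIterations` are assumptions chosen for illustration, and the actual fields of VISettings may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not the project's VISettings or BURLAP's solver.
class TinyValueIteration {

    /**
     * Runs value iteration on a deterministic chain of n states.
     * From each state the agent can step left or right (clamped at the left end),
     * every step has reward -1, and the right-most state is terminal with value 0.
     * Returns the largest value change ("delta") seen in each sweep.
     */
    static List<Double> run(int n, double gamma, double maxDelta, int maxIterations) {
        double[] v = new double[n];
        List<Double> deltas = new ArrayList<>();
        for (int iter = 0; iter < maxIterations; iter++) {
            double delta = 0.0;
            for (int s = 0; s < n - 1; s++) { // skip the terminal state
                int left = Math.max(0, s - 1);
                int right = s + 1;
                double best = Math.max(-1.0 + gamma * v[left], -1.0 + gamma * v[right]);
                delta = Math.max(delta, Math.abs(best - v[s]));
                v[s] = best; // in-place update
            }
            deltas.add(delta);
            if (delta < maxDelta) {
                break; // converged: no state value changed by maxDelta or more
            }
        }
        return deltas;
    }

    public static void main(String[] args) {
        List<Double> deltas = run(5, 0.99, 0.001, 100);
        for (int i = 0; i < deltas.size(); i++) {
            System.out.println("iter=" + i + " delta=" + deltas.get(i));
        }
    }
}
```

In this sketch a smaller `maxDelta` or a `gamma` closer to 1 generally means more sweeps before the loop stops, which is the kind of trade-off a settings class would let you explore.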
To run your experiments, execute the main() method of the RunExperiments class from your IDE.
A CSV writer is attached to each experiment. The output filename of each experiment is controlled by a "shortName", which is configured as part of the settings for your experiment type (PISettings, VISettings, or QSettings). This short name provides a filename prefix for each of the experiment runs.
Example file output: output/smprob-24105858/blockdude/
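As a rough sketch of how a short name can act as a filename prefix, the following plain-Java example builds a CSV path from a prefix and a run number. The class `PrefixedCsvWriter`, the `<shortName>-<run>.csv` pattern, and the sample rows are all assumptions for illustration; the project's actual naming scheme is not shown here and may differ.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: the naming pattern below is an assumption, not the project's.
class PrefixedCsvWriter {

    // Builds "<outputDir>/<shortName>-<run>.csv".
    static Path fileFor(Path outputDir, String shortName, int run) {
        return outputDir.resolve(shortName + "-" + run + ".csv");
    }

    // Creates the output directory if needed and writes one CSV line per row.
    static Path write(Path outputDir, String shortName, int run, List<String> rows)
            throws IOException {
        Files.createDirectories(outputDir);
        Path file = fileFor(outputDir, shortName, run);
        Files.write(file, rows);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("experiment-output");
        Path file = write(dir, "vi-small", 1, List.of("iter,delta,wallclock", "0,1.0,12"));
        System.out.println("Wrote " + file);
    }
}
```

Choosing a distinct short name per experiment configuration keeps the output of different runs from overwriting each other.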
Metrics captured: each experiment type can capture metrics collected during the iterations of the experiments. Here is a sample of the metrics collected:
- "iter" - iteration id
- "delta" - delta value found at each iteration
- "wallclock" - wallclock time spent in each iteration, in milliseconds for VI/PI but in nanoseconds for QLearning
- "evals" - the number of VI evals done within a single policy step for PI
- "numSteps" - for QLearning, number of steps during last episode of learning
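Because "wallclock" is reported in different units depending on the algorithm, it can help to normalize the values before comparing runs. The sketch below converts everything to milliseconds; the class `WallclockUnits`, the `Algorithm` enum, and the assumption that the raw value is read as a `long` are all hypothetical and not part of the project.

```java
// Hypothetical helper for post-processing the CSV output; not part of the project.
class WallclockUnits {

    enum Algorithm { VALUE_ITERATION, POLICY_ITERATION, Q_LEARNING }

    /**
     * Converts a raw "wallclock" value to milliseconds.
     * Assumes Q-learning values are nanoseconds and VI/PI values are
     * already milliseconds, as described in the metrics list.
     */
    static double toMillis(Algorithm algorithm, long wallclock) {
        if (algorithm == Algorithm.Q_LEARNING) {
            return wallclock / 1_000_000.0; // nanoseconds to milliseconds
        }
        return wallclock; // already milliseconds
    }

    public static void main(String[] args) {
        System.out.println(toMillis(Algorithm.Q_LEARNING, 2_500_000L) + " ms");
        System.out.println(toMillis(Algorithm.VALUE_ITERATION, 12L) + " ms");
    }
}
```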