An implementation of the AdaOPS (Adaptive Online Packing-guided Search), which is an online POMDP Solver used to solve problems defined with the POMDPs.jl generative interface. The paper of AdaOPS was published on NeurIPS'2021.
If you are trying to use this package and require more documentation, please file an issue!
Press ]
key to enter the package management mode of Julia. Then, execute the following code.
pkg> add "POMDPs"
pkg> registry add "https://github.com/JuliaPOMDP/Registry.git"
pkg> add AdaOPS
using POMDPs, POMDPModels, POMDPSimulators, AdaOPS
pomdp = TigerPOMDP()
solver = AdaOPSSolver(bounds=IndependentBounds(-20.0, 0.0))
planner = solve(solver, pomdp)
for (s, a, o) in stepthrough(pomdp, planner, "s,a,o", max_steps=10)
println("State was $s,")
println("action $a was taken,")
println("and observation $o was received.\n")
end
For minimal examples of problem implementations, see this notebook and the POMDPs.jl generative docs.
Solver options can be found in the AdaOPSSolver
docstring and accessed using Julia's built in documentation system (or directly in the Solver source code). Each option has its own docstring and can be set with a keyword argument in the AdaOPSSolver
constructor.
A δ-packing of observation branches will be generated, i.e., the belief nodes with L1 distance less than delta are merged.
The core idea of the adaptive particle filter is that it can change the number of particles adaptively and use more particles to estimate the belief when needed.
grid
is used to split the state space into multidimensional bins, so that KLD-Sampling can estimate the particle numbers according to the number of bins occupied.
First, a function for converting a state to a multidimensional vector should be implemented, i.e., Base.convert(::Type{SVector{D, Float64}},::S)
, where D
is the dimension of the resulted vector.
Then, we define a StateGrid to discretize or split the state space.
A StateGrid is consist of a vector of cutpoints in each dimension. These cutpoints divide the whole space into small tiles. In each dimension, a number of intervals constitute the grid, and each of these intervals is left-closed and right-open with the endpoints be cutpoints with the exception of the left-most interval.
For example, a StateGrid can be defined as StateGrid([dim1_cutpoints], [dim2_cutpoints], [dim3_cutpoints])
.
All states lie in one tile will be taken as the same.
With the number of tiles (bins) occupied, we can estimate the number of particles using KLD-Sampling.
max_occupied_bins
is the maximum number of bins occupied by a belief. Normally, it is exactly the grid size. However, in some domains, such as Roomba, only states within the room is accessible, and the corresponding bins will never be occupied.
min_occupied_bins
is the minimum number of bins occupied by a belief. Normally, it default to 2. A belief occupying min_occupied_bins
tiles will be estimated with m_min
particles. Increasing min_occupied_bins
indicates that a belief need to occupy more bins so as to be estimated by the same amount of particles.
m_min
is the minimum number of particles used for approximating beliefs.
m_max
is the maximum number of particles used for approximating a belief. Normally, m_max
is set to be big enough so that KLD-Sampling determines the number of particles used. When the KLD-Sampling is disabled, i.e. grid=StateGrid()
, m_max
will be sampled during the resampling.
zeta
is the target error when estimating a belief. Spcifically, we use KLD Sampling to calculate the number of particles needed, where zeta
is the targe Kullback-Leibler divergence between the estimated belief and the true belief. In AdaOPS, zeta
is automatically adjusted according to the minimum number of bins occupied such that the minimum number of particles KLD-Sampling method suggests is exactly m_min
.
The bound passed into AdaOPSSolver
can be a function in the form of lower_bound, upper_bound = f(pomdp, wpf_belief)
, or any other objects for which a AdaOPS.bounds(obj::OBJECT, pomdp::POMDP, b::WPFBelief, max_depth::Int, bounds_warning::Bool)
function is implemented.
In most cases, the recommended way to specify bounds is with an IndependentBounds
object, i.e.
AdaOPSSolver(bounds=IndependentBounds(lower, upper))
where lower
and upper
are either a number, a function or some other objects (see below).
Often, the lower bound is calculated with a default policy, this can be accomplished using a PORollout
, FORollout
or RolloutEstimator
. For the in-depth details, please refer to BasicPOMCP. Note that when mixing the Rollout
structs from this package with those from BasicPOMCP
, you should prefix the struct name with module name.
Both the lower and upper bounds can be initialized with value estimations using a FOValue
or POValue
.
FOValue
support any offline MDP
Solver
or Policy
. POValue
support any offline POMDP
Solver
or Policy
.
If lower
or upper
is a function, it should handle two arguments. The first is the POMDP
object and the second is the WPFBelief
. To access the state particles in a WPFBelief
b
, use particles(b)
. To access the corresponding weights of particles in a WPFBelief
b
, use weights(b)
. All AbstractParticleBelief
APIs are supported for WPFBelief
. More details can be found in the solver source code.
If an object o
is passed in, AdaOPS.bound(o, pomdp::POMDP, b::WPFBelief, max_depth::Int)
will be called.
In most cases, the check_terminal
and consistency_fix_thresh
keyword arguments of IndependentBounds
should be used to add robustness (see the IndependentBounds
docstring for more info).
When using rollout-base bounds, you can specify max_depth
keyword argument to set the max depth of rollout.
For the BabyPOMDP
from POMDPModels
, bounds setup might look like this:
using POMDPModels
using POMDPPolicies
using BasicPOMCP
always_feed = FunctionPolicy(b->true)
lower = FORollout(always_feed)
function upper(pomdp::BabyPOMDP, b::WPFBelief)
if all(s==true for s in particles(b)) # all particles are hungry
return pomdp.r_hungry # the baby is hungry this time, but then becomes full magically and stays that way forever
else
return 0.0 # the baby magically stays full forever
end
end
solver = AdaOPSSolver(bounds=IndependentBounds(lower, upper))
D3Trees.jl can be used to visualize the search tree, for example
using POMDPs, POMDPModels, POMDPModelTools, D3Trees, AdaOPS
pomdp = TigerPOMDP()
solver = AdaOPSSolver(bounds=(-20.0, 0.0), tree_in_info=true)
planner = solve(solver, pomdp)
b0 = initialstate(pomdp)
a, info = action_info(planner, b0)
inchrome(D3Tree(info[:tree], init_expand=5))
will create an interactive tree.
Two utilities, namely info_analysis
and hist_analysis
, are provided for getting a sense of how the algorithm is working.
info_analysis
takes the infomation returned from action_info(planner, b0)
. It will first visualize the tree if the tree_in_info
option is turned on. Then it will show stats such as number nodes expanded, total explorations, average observation branches, and so on. hist_analysis
takes the hist
from HistoryRecorder
simulator. It will show similar stats as info_analysis
but in the form of figures. It should be noted that HistoryRecoder
will store the tree of each single step, which makes it memory-intensive. An example is shown as follows.
using POMDPs, AdaOPS, RockSample, POMDPSimulators, ParticleFilters, POMDPModelTools
m = RockSamplePOMDP(11, 11)
b0 = initialstate(m)
s0 = rand(b0)
bound = AdaOPS.IndependentBounds(FORollout(RSExitSolver()), FOValue(RSMDPSolver()), check_terminal=true, consistency_fix_thresh=1e-5)
solver = AdaOPSSolver(bounds=bound,
delta=0.3,
m_min=30,
m_max=200,
tree_in_info=true,
num_b=10_000
)
adaops = solve(solver, m)
a, info = action_info(adaops, b0)
info_analysis(info)
num_particles = 30000
@time hist = simulate(HistoryRecorder(max_steps=90), m, adaops, SIRParticleFilter(m, num_particles), b0, s0)
hist_analysis(hist)
@show undiscounted_reward(hist)
@inproceedings{wu2021adaptive,
title={Adaptive Online Packing-guided Search for POMDPs},
author={Wu, Chenyang and Yang, Guoyu and Zhang, Zongzhang and Yu, Yang and Li, Dong and Liu, Wulong and others},
booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
year={2021}
}