
# Algorithms

- **STRIPS** — *(definition images missing)*
- **BrFS (BFS)** — breadth-first search: complete; optimal (if action costs are uniform).
- **DFS** — depth-first search: not complete; not optimal.
- **ID** — iterative deepening: complete; optimal.
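The contrast between the three uninformed searches above can be sketched in a few lines. This is an illustrative sketch, not from the notes; the toy graph and names are assumptions.

```python
# Iterative deepening (ID): repeated depth-limited DFS. It keeps DFS's low
# memory use while regaining BFS's completeness and optimality in step count.

def depth_limited_dfs(graph, node, goal, limit, path=None):
    """Depth-limited DFS; returns a path to goal or None."""
    if path is None:
        path = [node]
    if node == goal:
        return path
    if limit == 0:
        return None
    for child in graph.get(node, []):
        if child not in path:  # avoid cycles along the current path
            result = depth_limited_dfs(graph, child, goal, limit - 1, path + [child])
            if result is not None:
                return result
    return None

def iterative_deepening(graph, start, goal, max_depth=50):
    """Run depth-limited DFS with limits 0, 1, 2, ... until the goal is found."""
    for limit in range(max_depth + 1):
        result = depth_limited_dfs(graph, start, goal, limit)
        if result is not None:
            return result
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
shortest = iterative_deepening(graph, "A", "E")
```

Because the depth limit grows one step at a time, the first path found is a shortest one, which is why ID is listed as both complete and optimal.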
- **GBFS** — Greedy Best-First Search.
- **BFWS** — Best-First Width Search.
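GBFS always expands the frontier node with the lowest heuristic value, ignoring the cost already paid. A minimal sketch (the graph and heuristic here are assumptions for illustration):

```python
# Greedy Best-First Search: order the frontier by h(n) only.
# Fast in practice, but neither complete (in infinite spaces) nor optimal,
# because it trusts the heuristic blindly.
import heapq

def gbfs(graph, start, goal, h):
    frontier = [(h(start), start, [start])]  # (heuristic value, node, path)
    visited = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for child in graph.get(node, []):
            if child not in visited:
                heapq.heappush(frontier, (h(child), child, path + [child]))
    return None

graph = {1: [2, 3], 2: [4], 3: [4], 4: []}
path = gbfs(graph, 1, 4, h=lambda n: abs(4 - n))
```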
- **$h^*$** — the optimal heuristic (theoretical).
- **$h^+$** — *(formula image missing)*; safe, admissible (delete relaxation: drop delete effects).
- **$h^{add}$** — *(formula image missing)*; safe, not admissible (based on $h^+$).
- **$h^{max}$** — *(formula image missing)*; safe, admissible (based on $h^+$).
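The missing images for $h^{add}$ and $h^{max}$ presumably held the standard delete-relaxation recursions; for reference, the textbook formulation (an addition, not recovered from the notes) is:

$$
h(p; s) =
\begin{cases}
0 & \text{if } p \in s,\\[4pt]
\min\limits_{a \,:\, p \in \mathrm{add}(a)} \big[\, \mathrm{cost}(a) + h(\mathrm{pre}(a); s) \,\big] & \text{otherwise,}
\end{cases}
$$

extended to a set of atoms $P$ by
$h^{add}(P; s) = \sum_{p \in P} h(p; s)$ and
$h^{max}(P; s) = \max_{p \in P} h(p; s)$.
Taking the max never overestimates, so $h^{max}$ stays admissible; summing counts shared subgoals repeatedly and can overestimate, which is why $h^{add}$ is not admissible.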
- **$IW(k)$** — prune state $s$ if $novelty(s) > k$; try to solve the problem with $IW(1)$ first and, if that fails, with $IW(2)$, and so on.
  - novelty of a new state: the size of the smallest subset of its atoms that is made true for the first time in the search.
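The novelty-1 test can be sketched as follows; the state representation (a frozenset of ground atoms) and the toy domain are assumptions, not from the notes.

```python
# IW(1): breadth-first search that prunes any state containing no atom seen
# for the first time (i.e. novelty > 1).
from collections import deque

def iw1(initial_state, successors, is_goal):
    seen_atoms = set(initial_state)        # atoms observed so far
    queue = deque([(initial_state, [])])
    while queue:
        state, plan = queue.popleft()
        if is_goal(state):
            return plan
        for action, next_state in successors(state):
            new_atoms = set(next_state) - seen_atoms
            if not new_atoms:              # novelty(next_state) > 1: prune
                continue
            seen_atoms |= new_atoms
            queue.append((next_state, plan + [action]))
    return None

def count_successors(state):
    """Toy domain (assumption): atoms 'at_i'; one action increments i."""
    i = max(int(a.split("_")[1]) for a in state)
    return [("inc", frozenset({f"at_{i + 1}"}))] if i < 5 else []

plan = iw1(frozenset({"at_0"}), count_successors, lambda s: "at_3" in s)
```

$IW(k)$ generalises this by tracking subsets of atoms up to size $k$ instead of single atoms.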
- **MDP** — fully observable, probabilistic state models *(definition images missing)*. Two solution methods:
  - value iteration: update each state's value using the values from the previous iteration *(update-rule image missing)*;
  - policy iteration: evaluate the current policy, then improve it *(update-rule image missing)*.
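The value-iteration update can be sketched in a few lines; the transition-model layout (state → action → list of `(prob, next_state, reward)`) is an assumption for illustration.

```python
# Value iteration: V(s) <- max_a sum_{s'} P(s'|s,a) [r + gamma * V(s')],
# repeated until the values stop changing.

def value_iteration(mdp, gamma=0.9, eps=1e-8):
    V = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s, actions in mdp.items():
            if not actions:            # terminal state: value stays 0
                continue
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

# Two-state chain: from "s" one action deterministically reaches the
# terminal state "g" with reward 1, so V(s) should converge to 1.0.
mdp = {"s": {"go": [(1.0, "g", 1.0)]}, "g": {}}
V = value_iteration(mdp)
```

Policy iteration differs in that it alternates full evaluation of the current policy with a greedy improvement step, rather than folding the max into every sweep.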
- **MCTS** — Monte-Carlo Tree Search.
- **UCT** — MCTS with the UCB1 selection rule *(formula image missing)*.
- **Q-Learning** — *(update-rule image missing)*; off-policy, optimistic, unsafe/risky.
- **SARSA** — *(update-rule image missing)*; on-policy, safe.
- **n-step SARSA** — *(update-rule image missing)*.
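The off-policy/on-policy contrast behind "Q-learning is risky, SARSA is safe" shows up directly in the bootstrap target. A sketch, with `alpha`, `gamma` and the Q-table layout as illustrative assumptions:

```python
# Q-learning (off-policy) bootstraps from the greedy action in s';
# SARSA (on-policy) bootstraps from the action the agent actually took in s'.

def q_learning_update(Q, s, a, r, s2, actions, alpha=0.5, gamma=0.9):
    """Off-policy: target uses max_a' Q(s', a'), regardless of what the
    (possibly exploratory) behaviour policy does next -- hence optimistic."""
    target = r + gamma * max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.5, gamma=0.9):
    """On-policy: target uses the action a2 actually taken in s2, so the
    learned values account for exploration -- hence safer near hazards."""
    target = r + gamma * Q[(s2, a2)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

Q = {("s1", "a"): 0.0, ("s2", "a"): 1.0, ("s2", "b"): 0.0}
q_learning_update(Q, "s1", "a", r=0.0, s2="s2", actions=["a", "b"])
# Q-learning's target used max Q(s2, .) = 1.0, even if the agent explores "b".
```

n-step SARSA simply replaces the one-step target with the sum of the next n rewards plus a bootstrap from the state-action pair n steps ahead.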
- **Nash Equilibrium** — mixed strategies: indifference *(payoff-table image missing)*. For players $A$ and $B$, indifference means $A$ chooses the probability $x$ of selecting action $a$ so that every action of $B$ yields $B$ the same expected reward.
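The indifference condition can be made concrete with a 2x2 zero-sum example; the matching-pennies payoffs below are an illustrative assumption, not the game from the missing image.

```python
# Matching pennies: A picks Heads with probability x. Indifference for A
# means choosing x so that B's expected payoff is the same whichever action
# B plays:
#   B plays H:  -1*x + 1*(1-x) = 1 - 2x
#   B plays T:   1*x - 1*(1-x) = 2x - 1
# Setting them equal gives x = 1/2.

def indifference_prob(b_payoff):
    """Solve x*u(H,H) + (1-x)*u(T,H) = x*u(H,T) + (1-x)*u(T,T),
    where u(a_action, b_action) is B's payoff (2x2 dict assumed)."""
    u = b_payoff
    # Both sides are linear in x: a*x + b = c*x + d.
    a, b = u[("H", "H")] - u[("T", "H")], u[("T", "H")]
    c, d = u[("H", "T")] - u[("T", "T")], u[("T", "T")]
    return (d - b) / (a - c)

pennies = {("H", "H"): -1, ("T", "H"): 1, ("H", "T"): 1, ("T", "T"): -1}
x = indifference_prob(pennies)
```

At that $x$, $B$ gains nothing by favouring either action, which is exactly what makes the mixed strategy part of an equilibrium.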

## Reinforcement Learning Cons

*(slide image missing)*