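- Iterated Width (IW): try to solve the problem first with $IW(1)$; if it is not solved, try $IW(2)$, and so on.
- Novelty: the novelty of a newly generated state is the size of the smallest subset of atoms in it that shows up for the first time in the search; $IW(k)$ prunes states whose novelty is greater than $k$.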
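The search loop above can be sketched in Python as follows. This is a minimal illustration, not the course's reference code. It assumes a state is a `frozenset` of atom strings, a goal is a set of atoms that must all hold, and `successors(state)` yields `(action, next_state)` pairs. The names `novelty`, `iw`, `iterated_width` and the toy line domain are hypothetical.

```python
from collections import deque
from itertools import combinations


def novelty(state, seen, max_k):
    """Size of the smallest tuple of atoms in `state` not seen before in the search.

    Records every new tuple of size <= max_k in `seen`; returns max_k + 1 if no
    tuple of size <= max_k is new.
    """
    result = max_k + 1
    for k in range(1, max_k + 1):
        for tup in combinations(sorted(state), k):
            if tup not in seen:
                seen.add(tup)
                result = min(result, k)
    return result


def iw(init, goal, successors, k):
    """Breadth-first search that prunes states whose novelty is greater than k."""
    seen = set()
    novelty(init, seen, k)
    queue = deque([(init, [])])
    while queue:
        state, plan = queue.popleft()
        if goal <= state:  # all goal atoms hold in this state
            return plan
        for action, nxt in successors(state):
            if novelty(nxt, seen, k) <= k:
                queue.append((nxt, plan + [action]))
    return None


def iterated_width(init, goal, successors, max_k):
    """Run IW(1), IW(2), ... up to IW(max_k); return (k, plan) for the first success."""
    for k in range(1, max_k + 1):
        plan = iw(init, goal, successors, k)
        if plan is not None:
            return k, plan
    return None


# Hypothetical toy domain: a robot on positions 0..3, atoms are "at0" ... "at3".
def line_successors(state):
    (atom,) = state
    pos = int(atom[2:])
    if pos > 0:
        yield "left", frozenset({f"at{pos - 1}"})
    if pos < 3:
        yield "right", frozenset({f"at{pos + 1}"})


print(iterated_width(frozenset({"at0"}), frozenset({"at3"}), line_successors, max_k=2))
# (1, ['right', 'right', 'right'])
```

Revisiting `at0` from `at1` has novelty 2, so $IW(1)$ prunes it. In harder domains, $IW(1)$ may prune a state the plan needs, and the outer loop then moves on to $IW(2)$.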
MDP
MDPs are fully observable, probabilistic state models.
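- Value iteration: update every state's value from the previous iteration's values (Bellman update) until the values converge, then read off the greedy policy.
- Policy iteration: evaluate the current policy, then improve it by acting greedily with respect to that evaluation; repeat until the policy no longer changes.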
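Both algorithms can be compared on a tiny MDP. The sketch below is a minimal Python version. It assumes a hypothetical transition table `mdp[state][action] = [(probability, next_state, reward), ...]`, a discount $\gamma = 0.9$, and an absorbing `goal` state with no actions. The convergence thresholds are arbitrary choices. Value iteration applies the Bellman update $V_{i+1}(s) = \max_a \sum_{s'} P(s' \mid s, a)\,[r(s,a,s') + \gamma V_i(s')]$. Policy iteration alternates policy evaluation and greedy improvement.

```python
# Hypothetical MDP: mdp[state][action] = [(probability, next_state, reward), ...].
# "goal" is terminal (no actions), so its value stays 0.
MDP = {
    "s0": {"left": [(1.0, "s0", 0.0)], "right": [(0.9, "s1", 0.0), (0.1, "s0", 0.0)]},
    "s1": {"left": [(1.0, "s0", 0.0)], "right": [(0.9, "goal", 1.0), (0.1, "s1", 0.0)]},
    "goal": {},
}


def q_value(mdp, V, s, a, gamma):
    """Expected reward plus discounted successor value of taking a in s, under V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in mdp[s][a])


def greedy_policy(mdp, V, gamma):
    return {s: max(mdp[s], key=lambda a: q_value(mdp, V, s, a, gamma)) for s in mdp if mdp[s]}


def value_iteration(mdp, gamma=0.9, theta=1e-8):
    V = {s: 0.0 for s in mdp}
    while True:
        # Bellman update: every new value is computed from the previous iteration's values.
        V_new = {s: max((q_value(mdp, V, s, a, gamma) for a in mdp[s]), default=0.0) for s in mdp}
        delta = max(abs(V_new[s] - V[s]) for s in mdp)
        V = V_new
        if delta < theta:
            return V, greedy_policy(mdp, V, gamma)


def policy_evaluation(mdp, policy, gamma, theta=1e-10):
    """Iteratively compute V^pi for a fixed policy."""
    V = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s, a in policy.items():
            v = q_value(mdp, V, s, a, gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V


def policy_iteration(mdp, gamma=0.9):
    policy = {s: next(iter(mdp[s])) for s in mdp if mdp[s]}  # arbitrary initial policy
    while True:
        V = policy_evaluation(mdp, policy, gamma)
        # Policy improvement: switch an action only if the greedy one is strictly better.
        stable = True
        for s, best in greedy_policy(mdp, V, gamma).items():
            if q_value(mdp, V, s, best, gamma) > q_value(mdp, V, s, policy[s], gamma) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return V, policy


print(value_iteration(MDP)[1])   # {'s0': 'right', 's1': 'right'}
print(policy_iteration(MDP)[1])  # {'s0': 'right', 's1': 'right'}
```

On this MDP both methods return the same greedy policy. Policy iteration typically needs fewer outer iterations, but each one runs a full policy evaluation.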
MCTS (Monte Carlo Tree Search)
UCT (Upper Confidence bounds applied to Trees)
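UCT is MCTS with the UCB1 rule in the selection step. From a node $s$, it descends into the child maximising $\bar{Q}(s,a) + C\sqrt{\ln N(s) / N(s,a)}$, where $N$ counts visits. A minimal Python sketch of just that selection step follows. The tuple layout of `children` and the default $C = \sqrt{2}$ are assumptions. Expansion, simulation and backpropagation are not shown.

```python
import math


def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: average reward (exploitation) plus an exploration bonus."""
    if visits == 0:
        return math.inf  # always try unvisited children first
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)


def uct_select(children, parent_visits, c=math.sqrt(2)):
    """children: list of (action, total_reward, visits); returns the action to descend into."""
    return max(children, key=lambda ch: ucb1(ch[1], ch[2], parent_visits, c))[0]


children = [("a", 4.0, 5), ("b", 0.0, 1)]
print(uct_select(children, parent_visits=6, c=0.0))  # a: pure exploitation (0.8 > 0.0)
print(uct_select(children, parent_visits=6))         # b: the bonus for the rarely tried child wins
```

The constant $C$ trades off exploration against exploitation. With $C = 0$ the selection is purely greedy on average reward.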
Q-Learning
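Off-policy and optimistic: the target uses the greedy (max) next action regardless of the exploratory action actually taken. As a result, the learned policy can be unsafe or risky (e.g. it walks along the cliff edge in cliff-walking).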
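Here is the Q-learning update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$ as a minimal tabular Python sketch. The table layout, step size $\alpha = 0.5$, $\gamma = 0.9$ and the numbers in the example are illustrative assumptions.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9, terminal=False):
    """One tabular Q-learning step; Q maps (state, action) -> value."""
    # Off-policy target: bootstrap from the best next action, whatever the agent actually does next.
    best_next = 0.0 if terminal else max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


Q = defaultdict(float)
Q[("s1", "right")] = 2.0    # good action from s1
Q[("s1", "left")] = -10.0   # e.g. stepping off a cliff
q_learning_update(Q, "s0", "right", -1.0, "s1", ["left", "right"])
print(Q[("s0", "right")])   # 0.5 * (-1 + 0.9 * 2.0) = 0.4
```

The bad action `left` in `s1` does not affect the update at all, because the target only looks at the best next action.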
SARSA
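On-policy: the target uses the action actually taken, exploration included, so it learns safer, more conservative behaviour.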
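The SARSA update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma Q(s',a') - Q(s,a)]$ bootstraps from the next action $a'$ that the behaviour policy actually picked. Below is a minimal tabular Python sketch with an $\varepsilon$-greedy behaviour policy. The names, $\alpha$, $\gamma$ and example values are illustrative assumptions.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, s, actions, epsilon, rng=random):
    """Behaviour policy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda b: Q[(s, b)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9, terminal=False):
    """One tabular SARSA step on the transition (s, a, r, s_next, a_next)."""
    # On-policy target: bootstrap from the action the behaviour policy actually chose.
    target = r if terminal else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


Q = defaultdict(float)
Q[("s1", "right")] = 2.0
Q[("s1", "left")] = -10.0
a_next = "left"  # suppose exploration picked the bad action in s1
sarsa_update(Q, "s0", "right", -1.0, "s1", a_next)
print(Q[("s0", "right")])  # 0.5 * (-1 + 0.9 * -10.0) = -5.0
```

The exploratory `left` in `s1` is fed into the target, so the value of `right` in `s0` drops sharply. This is how SARSA learns to keep away from states where exploration is costly.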
n-step SARSA
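n-step SARSA replaces the one-step target with the $n$-step return $G_{t:t+n} = r_{t+1} + \gamma r_{t+2} + \dots + \gamma^{n-1} r_{t+n} + \gamma^n Q(s_{t+n}, a_{t+n})$. It then updates $Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\,[G_{t:t+n} - Q(s_t,a_t)]$, dropping the bootstrap term if the episode ends within the $n$ steps. Below is a minimal Python sketch with hypothetical function names.

```python
from collections import defaultdict


def n_step_sarsa_target(rewards, q_bootstrap, gamma, terminal=False):
    """n-step return from rewards [r_{t+1}, ..., r_{t+n}] and q_bootstrap = Q(s_{t+n}, a_{t+n}).

    If the episode ended within these n steps, pass terminal=True and the bootstrap term is dropped.
    """
    n = len(rewards)
    g = sum(gamma ** i * r for i, r in enumerate(rewards))
    return g if terminal else g + gamma ** n * q_bootstrap


def n_step_sarsa_update(Q, s, a, rewards, s_n, a_n, alpha=0.5, gamma=0.9, terminal=False):
    """Move Q(s_t, a_t) toward the n-step return."""
    q_bootstrap = 0.0 if terminal else Q[(s_n, a_n)]
    G = n_step_sarsa_target(rewards, q_bootstrap, gamma, terminal)
    Q[(s, a)] += alpha * (G - Q[(s, a)])


print(n_step_sarsa_target([1.0, 1.0, 1.0], 10.0, gamma=0.5))  # 1 + 0.5 + 0.25 + 0.125 * 10 = 3.0
```

With $n = 1$ this reduces to the ordinary SARSA target. Larger $n$ propagates rewards back faster at the cost of higher variance in the target.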
Nash Equilibrium
Mixed strategies: indifference principle
- For two players $A$ and $B$: $A$ chooses the probability $X$ of selecting action $a$ so that $B$ is indifferent, i.e. $B$ gets the same expected reward from each of its actions.
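For a 2x2 game the indifference condition is one linear equation in $X$: $X\,u_B(a, b_1) + (1-X)\,u_B(a', b_1) = X\,u_B(a, b_2) + (1-X)\,u_B(a', b_2)$. Here $a, a'$ are $A$'s actions, $b_1, b_2$ are $B$'s, and $u_B$ is $B$'s payoff. Below is a minimal Python sketch that solves it exactly with `fractions.Fraction`. The function name and payoff layout are assumptions.

```python
from fractions import Fraction


def indifference_probability(u_b):
    """X = P(A plays its first action) that makes B indifferent in a 2x2 game.

    u_b[i][j] is B's payoff when A plays action i and B plays action j.
    Solves X*u_b[0][0] + (1-X)*u_b[1][0] = X*u_b[0][1] + (1-X)*u_b[1][1] for X.
    """
    (u00, u01), (u10, u11) = u_b
    denom = u00 - u01 - u10 + u11
    if denom == 0:
        raise ValueError("no unique X makes B indifferent")
    x = Fraction(u11 - u10) / Fraction(denom)
    if not 0 <= x <= 1:
        raise ValueError("indifference point is not a probability; no fully mixed equilibrium")
    return x


print(indifference_probability([[-1, 1], [1, -1]]))  # matching pennies (B's payoffs): 1/2
print(indifference_probability([[2, 0], [0, 1]]))    # 1/3
```

$B$'s own mixing probability is found the same way, by making $A$ indifferent: pass $A$'s payoffs with rows indexed by $B$'s actions.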