- Architecture
- Denote the glimpse at each time step $t$ by $G_t$. $G_1$ is the proposal bounding box $(p_x, p_y, p_w, p_h)$; each glimpse is encoded relative to the proposal (a code sketch follows this list):

$$
G_t = (\delta_x, \delta_y, \delta_w, \delta_h) = \left( \frac{g_x - p_x}{p_w}, \frac{g_y - p_y}{p_h}, \log\frac{g_w}{p_w}, \log\frac{g_h}{p_h} \right)
$$

where $(g_x, g_y, g_w, g_h)$ are the center coordinates, width, and height of the glimpse box.
- ROI pooling on $G_t$: the ROI pooling operation divides a given ROI into a predefined grid of sub-windows and then max-pools the feature values in each sub-window.
- The pooled features are fed into a stacked recurrent neural network of two layers.
- The features are aggregated by an element-wise max operation.
- A softmax classification layer and a bounding box regression layer produce the final class scores and box refinements.
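A minimal PyTorch sketch of this pipeline, assuming a 7×7 ROI pooling grid, a 512-dimensional hidden state, a GRU as the recurrent cell, and `torchvision.ops.roi_pool` for the pooling step; the module and parameter names (`GlimpseDetector`, `encode_glimpse`, `spatial_scale`, the number of classes) are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_pool


def encode_glimpse(glimpse, proposal):
    """Encode a glimpse (g_x, g_y, g_w, g_h) relative to the proposal
    (p_x, p_y, p_w, p_h); both boxes are in center/width/height form."""
    gx, gy, gw, gh = glimpse.unbind(-1)
    px, py, pw, ph = proposal.unbind(-1)
    return torch.stack([(gx - px) / pw,
                        (gy - py) / ph,
                        torch.log(gw / pw),
                        torch.log(gh / ph)], dim=-1)


class GlimpseDetector(nn.Module):
    def __init__(self, feat_channels=512, pool_size=7, hidden=512, num_classes=21):
        super().__init__()
        self.pool_size = pool_size
        pooled_dim = feat_channels * pool_size * pool_size
        # Two-layer stacked recurrent network over the glimpse sequence.
        self.rnn = nn.GRU(pooled_dim, hidden, num_layers=2, batch_first=True)
        # Softmax classification head (logits), per-class box regression head,
        # and a head predicting the next glimpse offsets (dx, dy, dw, dh).
        self.cls_head = nn.Linear(hidden, num_classes)
        self.box_head = nn.Linear(hidden, 4 * num_classes)
        self.glimpse_head = nn.Linear(hidden, 4)

    def forward(self, feature_map, glimpse_boxes, spatial_scale=1.0 / 16):
        """feature_map: (1, C, H, W) conv features of the image.
        glimpse_boxes: list of T tensors of shape (1, 4) in (x1, y1, x2, y2)."""
        step_outputs, state = [], None
        for box in glimpse_boxes:
            # ROI pooling: divide the glimpse into a pool_size x pool_size grid
            # of sub-windows and max-pool the feature values in each sub-window.
            pooled = roi_pool(feature_map, [box], output_size=self.pool_size,
                              spatial_scale=spatial_scale)         # (1, C, 7, 7)
            out, state = self.rnn(pooled.flatten(1).unsqueeze(1), state)
            step_outputs.append(out.squeeze(1))                    # (1, hidden)
        # Element-wise max over the per-step features.
        fused = torch.stack(step_outputs).max(dim=0).values
        return self.cls_head(fused), self.box_head(fused), self.glimpse_head(fused)
```

`encode_glimpse` implements the $(\delta_x, \delta_y, \delta_w, \delta_h)$ parameterization from the first bullet, and the forward pass follows the pooled features -> stacked RNN -> element-wise max -> classification/regression order of the notes; exactly which tensors the paper fuses with the max is an assumption of this sketch.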
- Reinforcement learning: the reward indicates whether the new glimpse location is useful for the object detection task.

$$
r_t =
\begin{cases}
P(c^*) \times \mathrm{IoU}(B_{c^*}, B^*_{c^*}) & (t = T) \\
0 & (\text{otherwise})
\end{cases}
$$
$P(c^*)$ is the probability of the correct class, and IoU (intersection over union) measures the overlap between the predicted box and the ground-truth box.
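A small sketch of this terminal reward; the IoU helper and the corner-format boxes $(x_1, y_1, x_2, y_2)$ are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def reward(t, T, p_correct, pred_box, gt_box):
    """r_t = P(c*) * IoU(B_{c*}, B*_{c*}) at the final step T, 0 otherwise."""
    return p_correct * iou(pred_box, gt_box) if t == T else 0.0
```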
$\xi$ is an episode consisting of $T$ steps, and $R$ is the episode reward.
$J$ is the objective function being optimized.
The gradient is estimated with the REINFORCE algorithm [22] (see the sketch below).
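A hedged sketch of the REINFORCE update [22] for the glimpse policy: the episode return $R$ (minus an optional baseline) weights the summed log-probabilities of the sampled glimpses, and calling `.backward()` on the resulting loss propagates the gradient through the recurrent network via BPTT. The Gaussian glimpse policy and the baseline are assumptions, not details taken from the paper.

```python
import torch


def reinforce_loss(log_probs, episode_reward, baseline=0.0):
    """REINFORCE [22]: grad J ~ (R - b) * sum_t grad log pi(G_t).

    log_probs: list of T scalar tensors, each the log-probability of the
    glimpse sampled at step t (computed from the RNN state, so calling
    backward() on this loss backpropagates through time).
    """
    advantage = episode_reward - baseline            # R - b reduces variance
    # Negative sign: minimizing this loss performs gradient ascent on J.
    return -advantage * torch.stack(log_probs).sum()


# Illustrative use with a Gaussian policy over glimpse offsets:
#   mean = glimpse_head(rnn_output)                 # predicted (dx, dy, dw, dh)
#   dist = torch.distributions.Normal(mean, 0.1)
#   g_t = dist.sample()                             # sampled glimpse G_t
#   log_probs.append(dist.log_prob(g_t).sum())
```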
Back Propagation Through Time (BPTT) [25] is used to propagate the gradients through the recurrent network.

[22] R. J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Machine Learning, vol. 8, no. 3-4, pp. 229–256, 1992.

[25] P. Werbos, "Backpropagation through time: what it does and how to do it," Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, 1990.