\hypertarget{paper1}{%
\subsection{Paper 1}\label{paper1}}
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\item
Title: ``Good Robot!'': Efficient Reinforcement Learning for
Multi-Step Visual Tasks with Sim to Real Transfer
\item
Authors: Andrew Hundt, Benjamin Killeen, Nicholas Greene, Hongtao Wu,
Heeyeon Kwon, Chris Paxton, and Gregory D. Hager
\item
Affiliation: Andrew Hundt, Benjamin Killeen, Nicholas Greene, Hongtao
Wu, Heeyeon Kwon, and Gregory D. Hager are with The Johns Hopkins
University, Baltimore, MD 21218 USA; Chris Paxton is with NVIDIA,
Seattle, WA, 98105 USA
\item
Keywords: Reinforcement learning, sim to real transfer, multi-step
tasks, computer vision, deep learning, grasping and manipulation.
\item
URLs: https://ieeexplore.ieee.org/document/9130409; GitHub:
https://github.com/jhulcsr/good\_robot
\item
Summary:
\end{enumerate}
\begin{itemize}
\item
(1): The paper addresses the difficulty of learning long-horizon tasks
in the multi-step robotic domain, where exploration may lead to dead
ends and task progress may be easily reversed. Learning such tasks is
challenging because sensory inputs provide only spatially and
temporally limited information, in contrast to traditional motion
planning, which assumes perfect information and known action models.
\item
(2): Existing methods struggle with these challenges and rely on trial
and error in unsafe and uncertain environments, leading to inefficient
learning. In contrast, this paper proposes the Schedule for Positive
Task (SPOT) framework, which explores within action-safety zones,
learns about unsafe regions without exploring them, and prioritizes
experiences that reverse earlier progress to learn efficiently; a
hedged sketch of such masked action selection appears after this list.
\item
(3): The authors demonstrate the effectiveness of the SPOT framework in
experiments on a range of multi-step tasks, including block stacking,
creating rows of four cubes, and clearing toys arranged in adversarial
patterns. The approach improves trial success rates, improves
efficiency with respect to actions per trial by up to 30\%, and takes
just 1-20k actions to learn. The authors also demonstrate direct
sim-to-real transfer, creating real stacks and rows with 61\% and 59\%
efficiency, respectively.
\item
(4): The experiments reported in the paper support the authors' goal of
demonstrating an efficient reinforcement learning approach for
multi-step robotic tasks that incorporates common-sense constraints and
enables successful direct sim-to-real transfer.
\end{itemize}
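A minimal sketch can make the ``explores within action-safety zones''
and ``learns about unsafe regions without exploring them'' ideas
concrete. The Python sketch below shows how a dynamic action-space mask
can restrict both exploration and the bootstrapping target in
Q-learning; the function names (\texttt{spot\_q\_target},
\texttt{epsilon\_greedy\_masked}) and the flat array layout are
illustrative assumptions, not the authors' implementation from the
good\_robot repository.

\begin{verbatim}
import numpy as np

def spot_q_target(q_next, reward, mask_next, gamma=0.9):
    """Bootstrapped target restricted to actions allowed by the dynamic
    action-space mask M(s') (1 = allowed, 0 = known-unproductive).

    q_next:    (num_actions,) Q-value estimates for the next state
    mask_next: (num_actions,) binary mask over next-state actions
    """
    # Disallowed actions cannot contribute to the target, so their
    # (possibly over-optimistic) values never propagate backwards.
    masked_q = np.where(mask_next > 0, q_next, -np.inf)
    return reward + gamma * masked_q.max()

def epsilon_greedy_masked(q_values, mask, epsilon=0.1, rng=np.random):
    """Explore only inside the action-safety zone defined by the mask."""
    allowed = np.flatnonzero(mask > 0)
    if rng.random() < epsilon:
        return int(rng.choice(allowed))      # random allowed action
    masked_q = np.where(mask > 0, q_values, -np.inf)
    return int(np.argmax(masked_q))          # greedy over allowed actions
\end{verbatim}

Because the mask only constrains action selection and the bootstrapped
maximum, value estimates for masked actions can still be updated from
off-policy data, which is one way knowledge about unsafe regions can
accumulate without those actions ever being executed on the robot.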
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\setcounter{enumi}{6}
\tightlist
\item
Methods:
\end{enumerate}
\begin{itemize}
\item
(1): The paper proposes the Schedule for Positive Task (SPOT) framework
to efficiently learn long-horizon tasks in the multi-step robotic
domain. The framework explores within action-safety zones, learns about
unsafe regions without exploring them, and prioritizes experiences that
reverse earlier progress so that learning remains efficient.
\item
(2): The framework includes reward-shaping techniques, such as sub-task
weighting functions and the Situation Removal concept, together with
SPOT-Q Learning over a dynamic action space, to reduce unproductive
attempts and accelerate training; a hedged sketch of the reward shaping
follows this list.
\item
(3): The authors demonstrate the effectiveness of the SPOT framework in
experiments on a range of multi-step tasks, including block stacking,
creating rows of four cubes, and clearing toys arranged in adversarial
patterns. The approach improves trial success rates, improves
efficiency with respect to actions per trial by up to 30\%, and takes
just 1-20k actions to learn.
\item
(4): The authors also demonstrate direct sim-to-real transfer, creating
real stacks and rows with 61\% and 59\% efficiency, respectively. The
experiments support the authors' goal of demonstrating an efficient
reinforcement learning approach for multi-step robotic tasks that
incorporates common-sense constraints and enables successful direct
sim-to-real transfer.
\end{itemize}
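As referenced above, the following is a minimal, hedged sketch of how
sub-task weighting and situation-removal-style reward shaping could be
expressed: when an action reverses earlier progress (for example, a
partially built stack is knocked over), its reward is zeroed and
propagation from later steps is cut, so failures do not inflate the
value of the actions that led to them. The weighting factor
\texttt{task\_progress\_weight} and the per-step data layout are
illustrative assumptions, not the paper's exact definitions.

\begin{verbatim}
def shaped_returns(rewards, progress, reversed_flags,
                   task_progress_weight=1.0, gamma=0.9):
    """Discounted returns with sub-task weighting and a situation-removal
    style cut at steps that reverse earlier task progress.

    rewards:        per-step base rewards (list of floats)
    progress:       per-step task progress in [0, 1], e.g. stack height / goal
    reversed_flags: True where the action undid earlier progress
    """
    returns = [0.0] * len(rewards)
    future = 0.0
    # Walk backwards; a reversal acts like an episode cut for propagation.
    for t in reversed(range(len(rewards))):
        if reversed_flags[t]:
            shaped = 0.0   # situation removal: no reward for a reversal
            future = 0.0   # and nothing after it propagates backwards
        else:
            shaped = task_progress_weight * progress[t] * rewards[t]
        future = shaped + gamma * future
        returns[t] = future
    return returns
\end{verbatim}

For a three-step stacking attempt whose final action topples the stack,
the first two placements still receive credit, while the toppling step
contributes nothing and passes no later reward back to its
predecessors.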
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\setcounter{enumi}{7}
\tightlist
\item
Conclusion:
\end{enumerate}
\begin{itemize}
\item
(1): This work proposes an efficient reinforcement learning approach,
the Schedule for Positive Task (SPOT) framework, for multi-step robotic
tasks with direct sim-to-real transfer. It learns long-horizon tasks by
exploring within action-safety zones, learning about unsafe regions
without exploring them, and prioritizing experiences that reverse
earlier progress so that learning remains efficient. The authors
present several experiments, such as block stacking, creating rows of
four cubes, and clearing toys arranged in adversarial patterns, to
showcase the effectiveness of the framework.
\item
(2): Innovation point: The proposed SPOT framework goes beyond
traditional reinforcement learning by incorporating common-sense
constraints, such as sub-task weighting functions, the Situation
Removal concept, and SPOT-Q Learning with dynamic action spaces, to
enable successful direct sim-to-real transfer for long-horizon tasks.
Performance: The framework improves trial success rates, improves
efficiency with respect to actions per trial by up to 30\%, and takes
just 1-20k actions to learn; it achieves direct sim-to-real transfer
with 61\% and 59\% efficiency for block stacking and creating rows of
four cubes, respectively. Workload: The paper provides comprehensive
experiments to support the authors' claims. The main limitation of the
framework is that intermediate rewards can still be sparse and a
manually specified action-space mask M is required; future research may
address these issues and extend the method to more challenging tasks.
\end{itemize}