\hypertarget{paper1}{%
\subsection{Paper 1}\label{paper1}}
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\item
Title: ``Good Robot!'': Efficient Reinforcement Learning for
Multi-Step Visual Tasks with Sim to Real Transfer
\item
Authors: Andrew Hundt, Benjamin Killeen, Nicholas Greene, Hongtao Wu,
Heeyeon Kwon, Chris Paxton, and Gregory D. Hager
\item
Affiliation: Andrew Hundt, Benjamin Killeen, Nicholas Greene, Hongtao
Wu, Heeyeon Kwon, and Gregory D. Hager are with The Johns Hopkins
University, Baltimore, MD 21218 USA; Chris Paxton is with NVIDIA,
Seattle, WA, 98105 USA
\item
Keywords: Reinforcement learning, sim to real transfer, multi-step
tasks, computer vision, deep learning, grasping and manipulation.
\item
URLs: https://ieeexplore.ieee.org/document/9130409; GitHub:
https://github.com/jhulcsr/good\_robot
\item
Summary:
\end{enumerate}
\begin{itemize}
\item
(1): The paper addresses the difficulty of learning long-horizon tasks
in the multi-step robotic domain, where exploration may lead to dead
ends and task progress may be easily reversed. Learning such tasks is
challenging because sensory inputs provide only spatially and
temporally limited information, in contrast to traditional motion
planning, which assumes perfect information and known action models.
\item
(2): Existing methods struggle with these challenges and rely on trial
and error in unsafe and uncertain environments, leading to inefficient
learning. In contrast, this paper proposes the Schedule for Positive
Task (SPOT) framework, which explores within action-safety zones,
learns about unsafe regions without exploring them, and prioritizes
experiences that reverse earlier progress to learn efficiently; a
hedged sketch of such masked action selection appears after this list.
\item
(3): The authors demonstrate the effectiveness of the SPOT framework in
experiments on a range of multi-step tasks, including block stacking,
creating rows of four cubes, and clearing toys arranged in adversarial
patterns. The approach improves trial success rates, improves
efficiency with respect to actions per trial by up to 30\%, and takes
just 1-20k actions to learn. The authors also demonstrate direct
sim-to-real transfer, creating real stacks and rows with 61\% and 59\%
efficiency, respectively.
\item
(4): The experiments reported in the paper support the authors' goal of
demonstrating an efficient reinforcement learning approach for
multi-step robotic tasks that incorporates common-sense constraints and
enables successful direct sim-to-real transfer.
\end{itemize}
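A minimal sketch can make the ``explores within action-safety zones''
and ``learns about unsafe regions without exploring them'' ideas
concrete. The Python sketch below shows how a dynamic action-space mask
can restrict both exploration and the bootstrapping target in
Q-learning; the function names (\texttt{spot\_q\_target},
\texttt{epsilon\_greedy\_masked}) and the flat array layout are
illustrative assumptions, not the authors' implementation from the
good\_robot repository.

\begin{verbatim}
import numpy as np

def spot_q_target(q_next, reward, mask_next, gamma=0.9):
    """Bootstrapped target restricted to actions allowed by the dynamic
    action-space mask M(s') (1 = allowed, 0 = known-unproductive).

    q_next:    (num_actions,) Q-value estimates for the next state
    mask_next: (num_actions,) binary mask over next-state actions
    """
    # Disallowed actions cannot contribute to the target, so their
    # (possibly over-optimistic) values never propagate backwards.
    masked_q = np.where(mask_next > 0, q_next, -np.inf)
    return reward + gamma * masked_q.max()

def epsilon_greedy_masked(q_values, mask, epsilon=0.1, rng=np.random):
    """Explore only inside the action-safety zone defined by the mask."""
    allowed = np.flatnonzero(mask > 0)
    if rng.random() < epsilon:
        return int(rng.choice(allowed))      # random allowed action
    masked_q = np.where(mask > 0, q_values, -np.inf)
    return int(np.argmax(masked_q))          # greedy over allowed actions
\end{verbatim}

Because the mask only constrains action selection and the bootstrapped
maximum, value estimates for masked actions can still be updated from
off-policy data, which is one way knowledge about unsafe regions can
accumulate without those actions ever being executed on the robot.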
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\setcounter{enumi}{6}
\tightlist
\item
Methods:
\end{enumerate}
\begin{itemize}
\item
(1): The paper proposes the Schedule for Positive Task (SPOT) framework
to efficiently learn long-horizon tasks in the multi-step robotic
domain. The framework explores within action-safety zones, learns about
unsafe regions without exploring them, and prioritizes experiences that
reverse earlier progress so that learning remains efficient.
\item
(2): The framework includes reward-shaping techniques, such as sub-task
weighting functions and the Situation Removal concept, together with
SPOT-Q Learning over a dynamic action space, to reduce unproductive
attempts and accelerate training; a hedged sketch of the reward shaping
follows this list.
\item
(3): The authors demonstrate the effectiveness of the SPOT framework in
experiments on a range of multi-step tasks, including block stacking,
creating rows of four cubes, and clearing toys arranged in adversarial
patterns. The approach improves trial success rates, improves
efficiency with respect to actions per trial by up to 30\%, and takes
just 1-20k actions to learn.
\item
(4): The authors also demonstrate direct sim-to-real transfer, creating
real stacks and rows with 61\% and 59\% efficiency, respectively. The
experiments support the authors' goal of demonstrating an efficient
reinforcement learning approach for multi-step robotic tasks that
incorporates common-sense constraints and enables successful direct
sim-to-real transfer.
\end{itemize}
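As referenced above, the following is a minimal, hedged sketch of how
sub-task weighting and situation-removal-style reward shaping could be
expressed: when an action reverses earlier progress (for example, a
partially built stack is knocked over), its reward is zeroed and
propagation from later steps is cut, so failures do not inflate the
value of the actions that led to them. The weighting factor
\texttt{task\_progress\_weight} and the per-step data layout are
illustrative assumptions, not the paper's exact definitions.

\begin{verbatim}
def shaped_returns(rewards, progress, reversed_flags,
                   task_progress_weight=1.0, gamma=0.9):
    """Discounted returns with sub-task weighting and a situation-removal
    style cut at steps that reverse earlier task progress.

    rewards:        per-step base rewards (list of floats)
    progress:       per-step task progress in [0, 1], e.g. stack height / goal
    reversed_flags: True where the action undid earlier progress
    """
    returns = [0.0] * len(rewards)
    future = 0.0
    # Walk backwards; a reversal acts like an episode cut for propagation.
    for t in reversed(range(len(rewards))):
        if reversed_flags[t]:
            shaped = 0.0   # situation removal: no reward for a reversal
            future = 0.0   # and nothing after it propagates backwards
        else:
            shaped = task_progress_weight * progress[t] * rewards[t]
        future = shaped + gamma * future
        returns[t] = future
    return returns
\end{verbatim}

For a three-step stacking attempt whose final action topples the stack,
the first two placements still receive credit, while the toppling step
contributes nothing and passes no later reward back to its
predecessors.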
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\setcounter{enumi}{7}
\tightlist
\item
Conclusion:
\end{enumerate}
\begin{itemize}
\item
(1): This work proposes an efficient reinforcement learning approach,
the Schedule for Positive Task (SPOT) framework, for multi-step robotic
tasks with direct sim-to-real transfer. It learns long-horizon tasks by
exploring within action-safety zones, learning about unsafe regions
without exploring them, and prioritizing experiences that reverse
earlier progress so that learning remains efficient. The authors
present several experiments, such as block stacking, creating rows of
four cubes, and clearing toys arranged in adversarial patterns, to
showcase the effectiveness of the framework.
\item
(2): Innovation point: The proposed SPOT framework goes beyond
traditional reinforcement learning by incorporating common-sense
constraints, such as sub-task weighting functions, the Situation
Removal concept, and SPOT-Q Learning with dynamic action spaces, to
enable successful direct sim-to-real transfer for long-horizon tasks.
Performance: The framework improves trial success rates, improves
efficiency with respect to actions per trial by up to 30\%, and takes
just 1-20k actions to learn; it achieves direct sim-to-real transfer
with 61\% and 59\% efficiency for block stacking and creating rows of
four cubes, respectively. Workload: The paper provides comprehensive
experiments to support the authors' claims. The main limitation of the
framework is that intermediate rewards can still be sparse and a
manually specified action-space mask M is required; future research may
address these issues and extend the method to more challenging tasks.
\end{itemize}