Research Papers List of Jiawei Gao
I will try to summarize each paper in one sentence. Important papers are marked with ⭐. If you find something interesting or want to discuss it with me, feel free to contact me via email or GitHub issues.
Inspired by my friend Ze's Reading List.
- Dynamics as Prompts: In-Context Learning for Sim-to-Real System Identifications. Website. Collect data in simulation, and use the interaction history to predict the environment parameters in real time.
- IROS 2024, Physically Consistent Online Inertial Adaptation for Humanoid Loco-manipulation. Paper. Use an EKF to estimate the inertial parameters of the payload in the robot's hands, integrated with a model-based controller on a humanoid.
- CoRL 2024, TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction. Website, Code. Collect online human-in-the-loop teleoperation corrections, and learn a residual policy on top of a base policy trained in simulation.
- arXiv 2024.10, Physics-Informed Learning for the Friction Modeling of High-Ratio Harmonic Drives. Arxiv. Estimate and compensate for friction on humanoid robots.
- arXiv 2024.10, Language Agents Meet Causality -- Bridging LLMs and Causal World Models. Website. Use causal representation learning to learn causal variables, and then let an LLM agent plan over them.
- CoRL 2022, DayDreamer: World Models for Physical Robot Learning. Website. Imagined rollouts in latent space.
- arXiv 2024.10, Run-time Observation Interventions Make Vision-Language-Action Models More Visually Robust. Website. Identify which parts of the image the task is most sensitive to, and intervene on the observation at run time.
- CoRL 2023, Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners. Website. Draws inspiration from conformal prediction: let the LLM output multiple candidate plans and the likelihood of each choice, then calibrate those likelihoods using a dataset or by executing the plans (sketch below).
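A minimal sketch of the split-conformal calibration idea behind this line of work (my own simplification, not the paper's code; `epsilon` and the score definition are assumptions):

```python
import numpy as np

def calibrate_threshold(correct_scores, epsilon=0.1):
    """correct_scores: likelihoods the LLM assigned to the *correct* option
    on a held-out calibration set. Returns a threshold that the correct
    option clears with probability roughly 1 - epsilon."""
    scores = np.sort(np.asarray(correct_scores))
    k = int(np.floor((len(scores) + 1) * epsilon)) - 1  # conformal index
    return scores[max(k, 0)]

def prediction_set(option_scores, threshold):
    """All candidate plans whose LLM likelihood clears the threshold."""
    return [i for i, s in enumerate(option_scores) if s >= threshold]

# If the prediction set contains more than one plan, the planner is
# uncertain, and the robot asks a human for help instead of acting.
```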
- arXiv 2024.03, Explore until Confident: Efficient Exploration for Embodied Question Answering. Website. Leverage a VLM to output multiple possible plans, and let the robot explore until only one plan remains.
- arXiv 2024.10, HOVER: Versatile Neural Whole-Body Controller for Humanoid Robots. Website. First train a privileged oracle policy, then distill it into policies for different command modes by masking.
- arXiv 2024.10, Whole-Body Dynamic Throwing with Legged Manipulators. Paper. Whole-body object throwing trained with RL; the key is reward shaping.
- arXiv 2024.09, iWalker: Imperative Visual Planning for Walking Humanoid Robot. Paper. From depth perception to planning, followed by model-based control.
- arXiv 2024.07, Learning Multi-Modal Whole-Body Control for Real-World Humanoid Robots. Website. Mask commands during training so that the robot can track different command modalities (generic sketch below).
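Several whole-body entries here (HOVER, MaskedMimic, this one) rely on randomly masking command modalities during training so that one policy can track whichever subset is provided at test time. A generic illustration of that idea (my own sketch; modality names and shapes are made up):

```python
import numpy as np

def mask_commands(commands, p_keep=0.7, rng=np.random):
    """commands: dict of modality name -> 1-D command vector.
    Randomly drop whole modalities and expose a binary mask so the policy
    knows which commands are active. Returns a flat observation chunk."""
    obs, mask = [], []
    for name, value in commands.items():
        keep = rng.random() < p_keep
        obs.append(value if keep else np.zeros_like(value))
        mask.append(np.ones(1) if keep else np.zeros(1))
    return np.concatenate(obs + mask)

# e.g. commands = {"base_vel": vx_vy_yaw, "ee_pose": pose_7d, "joints": q_des}
```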
- arXiv 2024.06, PlaMo: Plan and Move in Rich 3D Physical Environments. Paper. Integrate a path planner and a motion controller so humanoid characters can navigate 3D scenes.
- arXiv 2024.05, Hierarchical World Models as Visual Whole-Body Humanoid Controllers. Website, Code. First train a tracking agent that takes abstract commands as input, then train hierarchical RL for downstream tasks.
- NeurIPS 2024, Harmony4D: A Video Dataset for In-The-Wild Close Human Interactions. Website, Code. A multi-human interaction dataset.
- ⭐ SIGGRAPH Asia 2024, MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting. Website, Code. First train a privileged motion-imitation policy, then distill it into different command modes. The policy architecture is a CVAE: the encoder provides an offset on top of the learned prior (modeled by a transformer), and only the prior and decoder are used at test time.
- SIGGRAPH 2024, Taming Diffusion Probabilistic Models for Character Control. Website. A transformer-based conditional autoregressive motion diffusion model for character control.
- arXiv 2023.12, PhysHOI: Physics-Based Imitation of Dynamic Human-Object Interaction. Website, Code. Utilize contact-graph rewards for better tracking.
- arXiv 2023.12, MoConVQ: Unified Physics-Based Motion Control via Scalable Discrete Representations. Website.
- ICLR 2019, Neural probabilistic motor primitives for humanoid control. Paper. Encode motion dataset into a latent space, and use decoder as a policy.
- MIG 2018, Physics-based motion capture imitation with deep reinforcement learning. Paper. Train RL to control humanoid characters; possibly a starting point for IsaacGym.
- arXiv 2024.10, FRASA: An End-to-End Reinforcement Learning Agent for Fall Recovery and Stand Up of Humanoid Robots. Paper. Train a DRL policy for fall recovery on kid-sized humanoid robots.
- arXiv 2024.10, Learning Humanoid Locomotion over Challenging Terrain. Arxiv. First pretrain a transformer on next-sequence prediction, then fine-tune with RL.
- arXiv 2024.09, Whole-Body End-Effector Pose Tracking. Arxiv. Train a pose-tracking policy with command sampling.
- arXiv 2024.09, Real-Time Whole-Body Control of Legged Robots with Model-Predictive Path Integral Control. Website. MPPI on quadrupeds.
- Humanoids 2024, Know your limits! Optimize the behavior of bipedal robots through self-awareness. Website. Generate many reference trajectories given textual commands, and use a self-awareness module to rank them.
- arXiv 2024.09, PIP-Loco: A Proprioceptive Infinite Horizon Planning Framework for Quadrupedal Robot Locomotion. Paper, Code. Train future-horizon prediction in simulation, and use MPPI at deployment.
- arXiv 2024.09, Learning Skateboarding for Humanoid Robots through Massively Parallel Reinforcement Learning. Paper.
- arXiv 2024.09, Learning to Open and Traverse Doors with a Legged Manipulator. Paper.
- arXiv 2024.09, One Policy to Run Them All: an End-to-end Learning Approach to Multi-Embodiment Locomotion. Paper. Learn an abstract locomotion controller, encoder-decoder architecture.
- SCA 2024, PartwiseMPC: Interactive Control of Contact-Guided Motions. Website. Utilize contact keyframes as task description and partwise MPC.
- arXiv 2024.04, Learning H-Infinity Locomotion Control. Website. Add a learnable disturber network to improve the robustness of the policy.
- arXiv 2024.08, PIE: Parkour with Implicit-Explicit Learning Framework for Legged Robots. Paper. Uses an A2C framework; implicit state estimation via a VAE, explicit state estimation via regression.
- arXiv 2024.07, Berkeley Humanoid: A Research Platform for Learning-based Control. Website. A low-cost, DIY-style, mid-scale humanoid robot.
- arXiv 2024.07, Wheeled Humanoid Bilateral Teleoperation with Position-Force Control Modes for Dynamic Loco-Manipulation. Paper. Designed a system to retarget human loco-manipulation into a small wheeled robot with hands.
- ⭐ arXiv 2024.7, UMI on Legs: Making Manipulation Policies Mobile with Manipulation-Centric Whole-body Controllers. Website, Code. Use end-effector trajectories in the task frame as the interface between the manipulation policy and the whole-body controller.
- arXiv 2024.06, SLR: Learning Quadruped Locomotion without Privileged Information. Website. Learns a state representation and a state transition model. I found the writing hard to follow.
- arXiv 2024.05, Impedance Matching: Enabling an RL-Based Running Jump in a Quadruped Robot. Paper. Sim-to-real synchronization based on frequency-domain analysis.
- arXiv 2024.05, Combining Teacher-Student with Representation Learning: A Concurrent Teacher-Student Reinforcement Learning Paradigm for Legged Locomotion. Paper.
- arXiv 2024.04, DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets. Paper. Use a DDPM to learn from offline datasets, introducing delayed-input and action-prediction tricks for real-time deployment.
- arXiv 2024.03, VBC: Visual Whole-Body Control for Legged Loco-Manipulation. Website, Code. Decouple into low-level and high-level policy, end-effector positions are the interface.
- arXiv 2024.03, RoboDuet: A Framework Affording Mobile-Manipulation and Cross-Embodiment. Website. First train locomotion with the arm fixed, then train the locomotion policy and the arm policy jointly.
- arXiv 2024.01, Adaptive Mobile Manipulation for Articulated Objects In the Open World. Website. Learning at test time. Use CLIP to generate rewards: compute the similarity of the observed image to the two prompts "door that is closed" and "door that is open" (sketch below).
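A minimal sketch of the CLIP-as-reward idea using Hugging Face `transformers` (the checkpoint name is illustrative, and this is my paraphrase rather than the paper's code):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
prompts = ["door that is closed", "door that is open"]

def open_door_reward(image: Image.Image) -> float:
    """Reward = probability mass CLIP assigns to the 'open' prompt."""
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape (1, 2)
    return logits.softmax(dim=-1)[0, 1].item()
```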
- ⭐ RSS 2024, RL2AC: Reinforcement Learning-based Rapid Online Adaptive Control for Legged Robot Robust Locomotion. Paper. Adds a feedforward term to the PD controller (sketch below).
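As I read it, the core idea is simply adding an adaptive feedforward torque on top of the usual joint PD tracking; a schematic version (my paraphrase, not the paper's controller):

```python
import numpy as np

def pd_with_feedforward(q, qd, q_des, qd_des, kp, kd, tau_ff):
    """Joint-space PD controller plus a feedforward torque term.
    tau_ff would come from the online adaptive / learned compensator."""
    return kp * (q_des - q) + kd * (qd_des - qd) + tau_ff
```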
- RSS 2024, (Best Paper Award Finalist), Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning. Paper.
- RSS 2024, Rethinking Robustness Assessment: Adversarial Attacks on Learning-based Quadrupedal Locomotion Controllers. Website.
- L4DC 2024, Learning and Deploying Robust Locomotion Policies with Minimal Dynamics Randomization. arXiv. Random Force Injection (RFI): inject random perturbation torques during training instead of extensive dynamics randomization (sketch below).
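RFI just perturbs the commanded torques with random noise at every simulation step; an illustrative sketch (the scale and torque-limit values are placeholders, not the paper's numbers):

```python
import numpy as np

def apply_rfi(tau_policy, rfi_scale=0.1, tau_limit=30.0, rng=np.random):
    """Random Force Injection: add uniform random torque noise, scaled
    relative to the actuator torque limit, to the commanded torques."""
    noise = rng.uniform(-1.0, 1.0, size=np.shape(tau_policy))
    return tau_policy + rfi_scale * tau_limit * noise
```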
- TRO 2024, Adaptive Force-Based Control of Dynamic Legged Locomotion over Uneven Terrain. Paper. Incorporating adaptive control into a force-based control system.
- ⭐ CoRL 2022, Deep Whole-Body Control: Learning a Unified Policy for Manipulation and Locomotion. Website, Code. Advantage mixing and Regularized Online Adaptation.
- ICRA 2024, Learning Force Control for Legged Manipulation. Website, Thesis Paper. End effector force tracking.
- ⭐ TRO 2024, Not Only Rewards but Also Constraints: Applications on Legged Robot Locomotion. Paper. Formulate requirements as constraints instead of reward terms, and use IPO to solve the constrained RL problem (sketch of the log barrier below).
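IPO handles each constraint with a log-barrier added to the policy-gradient surrogate; a rough sketch of that objective (my simplification, single constraint, `t` controls barrier sharpness):

```python
import torch

def ipo_objective(surrogate, constraint_value, constraint_limit, t=100.0):
    """PPO-style surrogate plus an interior-point log barrier that keeps
    the expected constraint cost below its limit; the barrier term goes
    strongly negative as the constraint approaches violation."""
    slack = (constraint_limit - constraint_value).clamp(min=1e-8)
    return surrogate + torch.log(slack) / t
```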
- arXiv 2023.04, Torque-based Deep Reinforcement Learning for Task-and-Robot Agnostic Learning on Bipedal Robots Using Sim-to-Real Transfer. Paper. The policy outputs torques directly at 250 Hz.
- arXiv 2023.03, Learning Bipedal Walking for Humanoids with Current Feedback. Paper. Simulate poor torque tracking in simulation; measure and track torque on the real robot.
- CoRL 2023, Learning to See Physical Properties with Active Sensing Motor Policies. Website. Active sensing: add the physical-property estimation error to the reward function (sketch below).
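The active-sensing bonus is essentially the negative error of the physical-property estimator added to the task reward, so the policy prefers motions that make those properties observable; roughly (my sketch, the weight is arbitrary):

```python
import numpy as np

def active_sensing_reward(task_reward, phi_est, phi_true, weight=0.5):
    """Reward motions that make physical parameters (e.g. friction)
    easy to estimate by penalizing the current estimation error."""
    est_error = float(np.mean((np.asarray(phi_est) - np.asarray(phi_true)) ** 2))
    return task_reward - weight * est_error
```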
- RSS 2023, Demonstrating a Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning. Website, Code. Learn locomotion directly in the real world, using a SAC implementation in JAX.
- IROS 2023, Hierarchical Adaptive Control for Collaborative Manipulation of a Rigid Object by Quadrupedal Robots. Paper.
- ICRA 2023, Legs as Manipulator: Pushing Quadrupedal Agility Beyond Locomotion. Website. Use one front leg as a manipulator. First train locomotion and manipulation policies separately, then learn a behavior tree from demonstrations to stitch the skills together.
- ICRA 2023, Hierarchical Adaptive Loco-manipulation Control for Quadruped Robots. Paper. An adaptive controller to solve the locomotion and manipulation tasks simultaneously. Use the position and velocity error to update the adaptive controller for manipulations.
- arXiv 2022.05, Bridging Model-based Safety and Model-free Reinforcement Learning through System Identification of Low Dimensional Linear Models. Paper. Dynamics of Cassie under RL policies can be seen as a low dimensional linear system.
- arXiv 2022.03, RoLoMa: Robust Loco-Manipulation for Quadruped Robots with Arms. Paper.
- arXiv 2022.01, Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and Generalization Guarantees. Paper. Safety-aware dual-policy structure.
- ISRR 2022, Reference-Free Learning Bipedal Motor Skills via Assistive Force Curricula. Paper. Learning bipedal locomotion utilizing assistive force.
- CoRL 2022, Oral. Walk These Ways: Tuning Robot Control for Generalization with Multiplicity of Behavior. Website, Github. Multiplicity of Behavior (MoB): learning a single policy that encodes a structured family of locomotion strategies that solve training tasks in different ways.
- RSS 2022, Rapid Locomotion via Reinforcement Learning. Website, Code. Implicit System Identification.
- IROS 2022, PI-ARS: Accelerating Evolution-Learned Visual-Locomotion with Predictive Information Representations. Paper. Predictive Information Representations: learn an encoder to maximize predictive information (the mutual information between past and future).
- IROS 2022, Adapting Rapid Motor Adaptation for Bipedal Robots. Paper. Further finetune the base policy $\pi_1$ with the imperfect extrinsics predicted by the adaptation module $\phi$.
- RA-L 2022, Concurrent Training of a Control Policy and a State Estimator for Dynamic and Robust Legged Locomotion. Paper.
- RA-L 2022, Combining Learning-Based Locomotion Policy With Model-Based Manipulation for Legged Mobile Manipulators. Paper. Decouple the manipulator control from base policy training by modeling the disturbances from the manipulator as predictable external wrenches.
- ⭐ Science Robotics 2022, Learning Robust Perceptive Locomotion for Quadrupedal Robots in the Wild. Paper. Adds a belief-state encoder based on an attention mechanism that fuses exteroceptive and proprioceptive information.
- IROS 2021, Adaptive Force-based Control for Legged Robots. Paper. L1 adaptive control law, force-based control.
- CoRL 2021, Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning. Paper.
- RA-L 2020, Learning Fast Adaptation with Meta Strategy Optimization. Paper. Frames finding the latent representation as an optimization problem.
- Science Robotics, 2020, Multi-Expert Learning of Adaptive Legged Locomotion. Paper. Use gating neural network to learn the combination of expert skill networks.
- Science Robotics 2020, Learning Quadrupedal Locomotion over Challenging Terrain. Paper.
- arXiv 2020.04, Learning Agile Robotic Locomotion Skills by Imitating Animals. Paper, Code.
- IROS 2019, Sim-to-Real Transfer for Biped Locomotion. Paper. Pre-sysID and post-sysID.
- ICRA 2019, ALMA - Articulated Locomotion and Manipulation for a Torque-Controllable Robot. Paper. Track operational-space motion and force references with a whole-body control algorithm that generates torque references for all controllable joints via hierarchical optimization.
- ⭐ RSS 2017, Preparing for the Unknown: Learning a Universal Policy with Online System Identification. Use an online system identification model to predict the dynamics parameters $\mu$ from history; $\mu$ is then an input to the universal policy (sketch below).
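A schematic of the UP-OSI structure in PyTorch (my own minimal sketch; network sizes are arbitrary):

```python
import torch
import torch.nn as nn

class OSINet(nn.Module):
    """Online System Identification: predict dynamics parameters mu
    from a short history of states and actions."""
    def __init__(self, hist_dim, mu_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(hist_dim, 256), nn.ReLU(),
                                 nn.Linear(256, mu_dim))

    def forward(self, history):
        return self.net(history)

class UniversalPolicy(nn.Module):
    """Policy conditioned on the current state and the (estimated) mu."""
    def __init__(self, state_dim, mu_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + mu_dim, 256), nn.ReLU(),
                                 nn.Linear(256, act_dim))

    def forward(self, state, mu):
        return self.net(torch.cat([state, mu], dim=-1))

# Test time: mu_hat = osi(history); action = policy(state, mu_hat)
```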
- ACC 2015, L1 Adaptive Control for Bipedal Robots with Control Lyapunov Function based Quadratic Programs. Paper.
- ICRA 2015, Whole-body Pushing Manipulation with Contact Posture Planning of Large and Heavy Object for Humanoid Robot. Paper. Generate pushing motions for humanoid robots based on ZMP.
- arXiv 2024.10, DexMimicGen: Automated Data Generation for Bimanual Dexterous Manipulation via Imitation Learning. Website. An automated data-generation system for humanoid manipulation.
- arXiv 2024.10, EgoMimic: Scaling Imitation Learning through Egocentric Video. Website. Align human data and robot teleoperation data, and then do imitation learning.
- arXiv 2024.10, Multi-Task Interactive Robot Fleet Learning with Visual World Models. Website. First train a model to predict future trajectories, then fine-tune it using deployment data.
- arXiv 2024.10, HuDOR: Bridging the Human to Robot Dexterity Gap through Object-Oriented Rewards. Website. Get object trajectories from videos, and calculate the reward based on the object's state.
- arXiv 2024.10, Robots Pre-Train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets. Website.
- arXiv 2024.10, Local Policies Enable Zero-shot Long Horizon Manipulation. Website. Distill ~1000 RL experts into a generalist policy, using a DAgger variant for better performance.
- arXiv 2024.10, One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation. Website, Paper. Distill a diffusion policy into a one-step action generator; formulate the gradient of the KL divergence as a score-difference loss.
- arXiv 2024.10, M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes. Website. A benchmark for mobile manipulation in household scenes with many tasks.
- arXiv 2024.10, Discovering Robotic Interaction Modes with Discrete Representation Learning. Website. Self-supervised, Gaussian Mixture VAE. Splits the policy into a discrete mode selector and an action predictor.
- arXiv 2024.10, Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning. Website, Code. Real-world RL with human-in-the-loop corrections; many design choices make it data-efficient.
- arXiv 2024.10, Combining Deep Reinforcement Learning with a Jerk-Bounded Trajectory Generator for Kinematically Constrained Motion Planning. Paper. Refine the RL output actions to make them jerk-bounded.
- arXiv 2024.10, DA-VIL: Adaptive Dual-Arm Manipulation with Reinforcement Learning and Variable Impedance Control. Website, Paper. The RL policy outputs stiffness, which is passed to a QP solver to generate torques.
- arXiv 2024.10, MILES: Making Imitation Learning Easy with Self-Supervision. Paper. Automatic data collection and augmentation.
- arXiv 2024.10, Learning Diffusion Policies from Demonstrations For Compliant Contact-rich Manipulation. Paper. The diffusion policy outputs Cartesian end-effector poses and arm stiffness.
- arXiv 2024.10, ForceMimic: Force-Centric Imitation Learning with Force-Motion Capture System for Contact-Rich Manipulation. Capture forces with a UMI-style setup and train a hybrid force-position control policy.
- arXiv 2024.10, DROP: Dexterous Reorientation via Online Planning. Website. Sampling-based online planning for dexterous manipulation.
- arXiv 2024.10, Adaptive Compliance Policy: Learning Approximate Compliance for Diffusion Guided Control. Website. Uses a diffusion policy as the backbone, outputting a scalar stiffness magnitude for the compliance controller; force sensors are integrated.
- ⭐ arXiv 2024.10, Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control. Paper. The policy outputs a sequence of actions per query (sketch below).
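The mechanism, as I understand it: one policy query returns a chunk of actions that is executed open-loop before the model is queried again, so a slow model can still drive a fast control loop. A gym-style sketch (the env API and chunk length are assumptions):

```python
def run_episode(env, policy, chunk_len=8, max_steps=1000):
    """Query the policy once per chunk, then execute the returned action
    sequence open-loop before observing again."""
    obs = env.reset()
    for _ in range(max_steps // chunk_len):
        action_seq = policy(obs)            # shape: (chunk_len, act_dim)
        for action in action_seq[:chunk_len]:
            obs, reward, done, info = env.step(action)
            if done:
                return
```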
- arXiv 2024.10, ARCap: Collecting High-quality Human Demonstrations for Robot Learning with Augmented Reality Feedback. Website, Code. Using AR feedback when collecting human demonstrations.
- arXiv 2024.09, Hand-object interaction pretraining from videos. Website, Paper.
- arXiv 2024.09, Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation. Paper. Use a decoder to generate actions from error embeddings.
- arXiv 2024.09, Adaptive Control based Friction Estimation for Tracking Control of Robot Manipulators. Paper. Adaptive control methods.
- arXiv 2024.09, Fast Payload Calibration for Sensorless Contact Estimation Using Model Pre-training. Paper.
- arXiv 2024.08, RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands. Website. A dataset built on RoboPianist with shadow hands.
- arXiv 2024.08, ACE: A Cross-Platform Visual-Exoskeletons System for Low-Cost Dexterous Teleoperation. Website, Code. A teleoperation system.
- arXiv 2024.08, A Survey of Embodied Learning for Object-Centric Robotic Manipulation. Paper.
- arXiv 2024.08, Real-time Dexterous Telemanipulation with an End-Effect-Oriented Learning-based Approach. Paper. First use DDPG offline to train the robot to follow the operator's commands, then test it online.
- arXiv 2024.03, CoPa: General Robotic Manipulation through Spatial Constraints of Parts with Foundational Model. Website.
- CoRL 2024, 3D-ViTac: Learning Fine-Grained Manipulation with Visuo-Tactile Sensing. Website.
- ⭐ RSS 2024, Dynamic On-Palm Manipulation via Controlled Sliding. Website, Code. Hierarchical control: the system is modeled as an LCS (Linear Complementarity System) and solved with the C3 (Consensus Complementarity Control) algorithm; a low-level OSC tracking controller tracks the end-effector positions and forces given by the MPC.
- ⭐ RSS 2024, Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots. Website, Code. A data collection framework.
- RSS 2024, Learning Manipulation by Predicting Interaction. Website, Code.
- RSS 2024, Any-point Trajectory Modeling for Policy Learning. Website, Code. Utilize points tracking in human videos for policy learning.
- RA-L 2024, On the Role of the Action Space in Robot Manipulation Learning and Sim-to-Real Transfer, Arxiv. Benchmarked 13 action spaces for learning Franka manipulation skills.
- ⭐ CoRL 2023, AdaptSim: Task-Driven Simulation Adaptation for Sim-to-Real Transfer. Website, Code. Iterative real2sim system identification.
- CoRL 2023 Oral, VoxPoser: Composable 3D Value Map for Robotic Manipulation with Language Models. Website, Code. Utilize an LLM and a VLM to write code that generates affordance maps and constraint maps in the 3D scene.
- NeurIPS 2023 Spotlight, Learning Universal Policies via Text-Guided Video Generation. Website, Code, Openreview. Formulate the sequential decision making problem as a text-conditioned video generation problem.
- CoRL 2020, Transporter Networks: Rearranging the Visual World for Robotic Manipulation. Website. Learning pick-conditioned placing via transporting for robotic manipulation.
- IROS 2001, Adaptive force control of position/velocity controlled robots: theory and experiment. Paper. Propose two velocity-based implicit force-trajectory tracking controllers.
- TMECH 1999, A Survey of Robot Interaction Control Schemes with Experimental Comparison. Paper.
- 1987, Dynamic Hybrid Position/Force Control of Robot Manipulators-Description of Hand Constraints and Calculation of Joint Driving Force. Paper.
- 1981, Hybrid Position/Force Control of Manipulators. Paper.
- arXiv 2024.09, Neural Fields in Robotics: A Survey. Website.
- arXiv 2024.10, DARE: Diffusion Policy for Autonomous Robot Exploration. Paper. One-step diffusion process for planning and exploration.
- arXiv 2018.10, Exploration by Random Network Distillation. Paper. RND for exploration.
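RND's intrinsic reward is the prediction error between a frozen, randomly initialized target network and a trained predictor network; a minimal PyTorch sketch (layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

class RND(nn.Module):
    def __init__(self, obs_dim, feat_dim=128):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                    nn.Linear(256, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                       nn.Linear(256, feat_dim))
        for p in self.target.parameters():  # target stays random and frozen
            p.requires_grad_(False)

    def intrinsic_reward(self, obs):
        """Prediction error; also serves as the predictor's training loss."""
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)
```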
- ⭐ ICML 2017, Curiosity-driven Exploration by Self-supervised Prediction. Website, Code. Formulate curiosity as the error in an agent’s ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model.
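For contrast with RND above, the curiosity module here learns its feature space with an inverse-dynamics model and uses the forward model's error as the bonus; a simplified sketch with linear heads and continuous actions (the real ICM uses conv encoders and a discrete-action head):

```python
import torch
import torch.nn as nn

class ICM(nn.Module):
    def __init__(self, obs_dim, act_dim, feat_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                     nn.Linear(128, feat_dim))
        # inverse model: predicts a_t from (f_t, f_t1); training it shapes the features
        self.inverse = nn.Linear(2 * feat_dim, act_dim)
        self.forward_model = nn.Linear(feat_dim + act_dim, feat_dim)

    def curiosity(self, obs, action, next_obs):
        """Intrinsic reward = forward-model error in the learned feature space."""
        f_t, f_t1 = self.encoder(obs), self.encoder(next_obs)
        f_pred = self.forward_model(torch.cat([f_t, action], dim=-1))
        return (f_pred - f_t1.detach()).pow(2).mean(dim=-1)
```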
- arXiv 2020.12, Rethinking Bias-Variance Trade-off for Generalization of Neural Networks. Paper. The variance is unimodal or bell-shaped.
- arXiv 2024.10, Motion Planning for Robotics: A Review for Sampling-based Planners. Paper.
- arXiv 2024.09, Fine Manipulation Using a Tactile Skin: Learning in Simulation and Sim-to-Real Transfer. Paper.
- arXiv 2024.09, A Learning-based Quadcopter Controller with Extreme Adaptation. Paper. RMA for adaptation, combine BC and RL on drones.
- arXiv 2024.08, All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents. Website.
- arXiv 2024.08, Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation. Website, Code. Train a transformer policy for cross embodied robots by tokenizing observations and treating actions as readout tokens.
- arXiv 2024.07, MAGIC-VFM: Meta-learning Adaptation for Ground Interaction Control with Visual Foundation Models. Paper.
- arXiv 2024.05, Hierarchical World Models as Visual Whole-Body Humanoid Controllers. Website, Code. First train a low-level tracking model on MoCapAct with TD-MPC2, then train skills on downstream tasks.
- CoRL 2024, PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations. Website.
- IROS 2024, Robot Synesthesia: A Sound and Emotion Guided AI Painter. Website. Let a robot manipulator paint.
- arXiv 2024.02, Pushing the Limits of Cross-Embodiment Learning for Manipulation and Navigation. Website, Code. A cross-embodiment transformer policy: tokenize visual observations and generate actions through a conditional diffusion process.
- RSS 2024, RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots. Website, Code. A large-scale simulation framework, a lot of kitchens.
- ICLR 2024, Spotlight. TD-MPC2: Scalable, Robust World Models for Continuous Control. Website, Code. Adding some tricks on top of TD-MPC.
- ICLR 2024, RFCL: Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in RL. Website, Code.
- ICRA 2024, Safe Deep Policy Adaptation. Website, Code. Jointly learns adaptive policy and dynamics models in simulation, predicts environment configurations, and fine-tunes dynamics models with few-shot real-world data.
- CoRL 2023 Oral, DATT: Deep Adaptive Trajectory Tracking for Quadrotor Control. Website, Code. Use L1 adaptive control to estimate disturbance.
- NeurIPS 2023, Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning. Website, Code.
- CVPR 2022, Masked Autoencoders Are Scalable Vision Learners. Paper. Mask random patches of the input image and reconstruct the missing pixels.
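The masking step at the heart of MAE is easy to sketch: keep a random subset of patch tokens for the encoder and reconstruct the rest with the decoder (my illustration of the masking only, not the full model):

```python
import torch

def random_mask_patches(patch_tokens, mask_ratio=0.75):
    """patch_tokens: (batch, num_patches, dim). Returns the randomly kept
    patches (fed to the encoder) and their indices (used later to place
    mask tokens for the decoder to reconstruct)."""
    b, n, d = patch_tokens.shape
    num_keep = int(n * (1 - mask_ratio))
    keep_idx = torch.rand(b, n).argsort(dim=1)[:, :num_keep]
    kept = torch.gather(patch_tokens, 1,
                        keep_idx.unsqueeze(-1).repeat(1, 1, d))
    return kept, keep_idx
```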
- ICML 2022, Temporal Difference Learning for Model Predictive Control. Website, Code. Learn a latent dynamics model that is predictive of reward and a terminal value function via TD learning; plan with MPPI.
- arXiv 2016, Building Machines That Learn and Think Like People. Paper.
- NeurIPS 2016, Learning Physical Intuition of Block Towers by Example. Paper.