- Title: Fast Uncertainty Quantification for Deep Object Pose Estimation
- Team: Caltech & NVIDIA & The University of Texas
Deep learning-based object pose estimators are often unreliable and overconfident, especially when the input image is outside the training domain, for instance with sim2real transfer. Efficient and robust uncertainty quantification (UQ) for pose estimators is critically needed in many robotic tasks. In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose estimation. We ensemble 2-3 pre-trained models with different neural network architectures and/or training data sources, and compute their average pairwise disagreement to obtain the uncertainty quantification. We propose four disagreement metrics, including a learned metric, and show that the average distance (ADD) is the best learning-free metric and is only slightly worse than the learned metric, which requires labeled target data. Compared to the prior art, our method has several advantages: 1) it does not require any modification of the training process or the model inputs; and 2) it needs only one forward pass per model. We evaluate the proposed UQ method on three tasks, where our uncertainty quantification yields much stronger correlations with pose estimation errors than the baselines. Moreover, in a real robot grasping task, our method increases the grasping success rate from 35% to 90%.
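To make the learning-free variant concrete, here is a minimal numpy sketch (not the authors' code) that computes the mean pairwise ADD disagreement over an ensemble of pose predictions, assuming each member outputs a rotation matrix and translation and that the object's model points are available; function and variable names are illustrative.

```python
import numpy as np
from itertools import combinations

def add_distance(R1, t1, R2, t2, model_points):
    """Average distance (ADD) between two poses applied to the same object model points."""
    p1 = model_points @ R1.T + t1
    p2 = model_points @ R2.T + t2
    return np.linalg.norm(p1 - p2, axis=1).mean()

def ensemble_uncertainty(poses, model_points):
    """Mean pairwise ADD disagreement over an ensemble of (R, t) pose predictions."""
    dists = [add_distance(*poses[i], *poses[j], model_points)
             for i, j in combinations(range(len(poses)), 2)]
    return float(np.mean(dists))

# Example with three hypothetical ensemble predictions for one object
model_points = np.random.rand(500, 3) * 0.1                 # stand-in for CAD model points (metres)
poses = [(np.eye(3), np.zeros(3)),
         (np.eye(3), np.array([0.002, 0.0, 0.001])),
         (np.eye(3), np.array([0.05, 0.01, 0.0]))]          # one member disagrees strongly
print(ensemble_uncertainty(poses, model_points))             # larger value -> less trustworthy pose
```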
Paper: [arXiv] | Code: [GitHub]
ICRA2021 REDE: End-to-end Object 6D Pose Robust Estimation Using Differentiable Outliers Elimination
- Title: REDE: An End-to-End Network for Robust 6D Object Pose Estimation
- Team: Zhejiang University (ZJU-Robotics Lab) & Beijing Institute of Technology
Object 6D pose estimation is a fundamental task in many applications. Conventional methods solve it by detecting and matching keypoints and then estimating the pose. Recent efforts that bring deep learning into the problem mainly overcome the vulnerability of conventional methods, whose hand-crafted features make them sensitive to environmental variation. However, these methods cannot achieve end-to-end learning and good interpretability at the same time. In this paper, we propose REDE, a novel end-to-end object pose estimator using RGB-D data, which uses a network for keypoint regression and a differentiable geometric pose estimator for back-propagating the pose error. In addition, to achieve better robustness when outlier keypoint predictions occur, we propose a differentiable outlier elimination method that regresses each candidate result and its confidence simultaneously. Via confidence-weighted aggregation of multiple candidates, we reduce the effect of outliers on the final estimate. Finally, following conventional methods, we apply a learnable refinement process to further improve the estimation. Experimental results on three benchmark datasets show that REDE slightly outperforms the state-of-the-art approaches and is more robust to object occlusion.
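A minimal sketch of the confidence-weighted aggregation idea (an illustration, not the authors' implementation): given several candidate estimates of the same quantity together with predicted confidences, softmax the confidences and take the weighted sum, so a low-confidence outlier contributes little. Implemented with autograd ops inside a network, this step remains differentiable, which is what allows the pose error to be back-propagated through it.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate_candidates(candidates, confidences):
    """Confidence-weighted aggregation of K candidate estimates of the same quantity.

    candidates:  (K, D) array, e.g. K candidate 3D keypoint locations.
    confidences: (K,) raw confidence scores predicted alongside the candidates.
    """
    w = softmax(confidences)                     # outlier candidates should receive low weight
    return (w[:, None] * candidates).sum(axis=0)

# Two consistent candidates and one low-confidence outlier
cands = np.array([[0.10, 0.20, 0.50],
                  [0.11, 0.19, 0.51],
                  [0.40, 0.90, 0.10]])           # outlier
confs = np.array([3.0, 2.8, -1.0])
print(aggregate_candidates(cands, confs))        # stays close to the two consistent candidates
```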
Paper: [arXiv] | Code: [GitHub]
- Title: A 6D Pose Estimation Network for Stacked Scenes of Parametric Parts
- Team: Tsinghua University
To our knowledge, current pose estimation methods mainly target non-parametric objects. This paper therefore proposes ParametricNet, a 6DoF pose estimation network for parametric shapes based on keypoint learning and a Hough voting scheme. First, keypoints are defined by the driving parameters and symmetry of the template, which turns both parameter prediction and pose prediction into a keypoint prediction problem. Through a point-wise regression network and the voting scheme, the keypoints and centroid of each instance are predicted, and instance segmentation is achieved in the centroid space. The part parameters are then recovered from the predicted centroid and keypoints, yielding a 3D reconstruction of the corresponding template part and thus its centroid and keypoints. Finally, the 6DoF pose of each instance is computed by keypoint fitting. ParametricNet outperforms the state-of-the-art method by 15% in average precision (AP) on a non-parametric pose estimation dataset. We also construct a parametric part stacking dataset, providing a large-scale data foundation for research on industrial part scene understanding, and ParametricNet shows excellent learning and generalization ability on this dataset as well. In a real-world grasping experiment on stacked parts, it stably recognized and estimated the poses of unseen parametric parts.
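The final keypoint-fitting step is a standard least-squares rigid alignment; below is a Kabsch-style sketch under the assumption that the predicted scene keypoints and the template keypoints are in one-to-one correspondence (names and the toy check are illustrative, not from the paper).

```python
import numpy as np

def fit_rigid_pose(template_kps, scene_kps):
    """Least-squares rigid transform (R, t) mapping template keypoints onto scene keypoints."""
    mu_t, mu_s = template_kps.mean(axis=0), scene_kps.mean(axis=0)
    H = (template_kps - mu_t).T @ (scene_kps - mu_s)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])   # guard against reflections
    R = Vt.T @ D @ U.T
    t = mu_s - R @ mu_t
    return R, t

# Toy check: recover a known rotation about z and a translation
theta = np.deg2rad(30.0)
R_gt = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                 [np.sin(theta),  np.cos(theta), 0.0],
                 [0.0,            0.0,           1.0]])
t_gt = np.array([0.1, -0.2, 0.5])
template = np.random.rand(8, 3)
scene = template @ R_gt.T + t_gt
R, t = fit_rigid_pose(template, scene)
print(np.allclose(R, R_gt), np.allclose(t, t_gt))                 # True True
```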
Paper: [HomePage]
ICRA2021 Investigations on Output Parameterizations of Neural Networks for Single Shot 6D Object Pose Estimation
- Title: Output Parameterizations of Neural Networks for Single-Shot 6D Object Pose Estimation
- Team: Fraunhofer Institute for Manufacturing Engineering and Automation (Fraunhofer IPA) & University of Stuttgart
Single-shot approaches have demonstrated tremendous success on various computer vision tasks, yet finding good output parameterizations for 6D object pose estimation remains an open challenge. In this work, we propose several novel parameterizations of the neural network output for single-shot 6D object pose estimation. Our learning-based approach achieves state-of-the-art performance on two public benchmark datasets. Furthermore, we demonstrate that the pose estimates can be used for real-world robotic grasping tasks without additional ICP refinement.
Paper: [arXiv]
- Title: Learning 6D Object Pose Regression from Point Clouds via Online Data Synthesis
- Team: University of Hamburg & Tsinghua University
It is often desired to train 6D pose estimation systems on synthetic data because manual annotation is expensive. However, the domain gap between synthetic and real color images is large, and synthesizing photo-realistic color images is expensive. In contrast, this domain gap is considerably smaller and easier to fill for depth information. In this work, we present a system that regresses the 6D object pose from depth information represented as point clouds, together with a lightweight data synthesis pipeline that creates synthetic point cloud segments for training. We use an augmented autoencoder (AAE) to learn a latent code that encodes 6D object pose information for pose regression. The data synthesis pipeline requires only texture-less 3D object models and the desired viewpoints, and it is cheap in terms of both time and hardware storage. Our data synthesis process is up to three orders of magnitude faster than commonly used approaches that render RGB image data. We show the effectiveness of our system on the LineMOD, LineMOD Occlusion, and YCB Video datasets.
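As an illustration of how cheap such point-cloud synthesis can be, here is a toy sketch (not the authors' pipeline) that poses a texture-less model at a random viewpoint, keeps only the nearest point per coarse image cell to mimic self-occlusion, and adds Gaussian depth noise; the projection model, cell size, and noise level are all assumptions.

```python
import numpy as np

def random_rotation(rng):
    """Roughly uniform random rotation via QR decomposition of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.normal(size=(3, 3)))
    Q = Q * np.sign(np.diag(R))        # fix column signs
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1                  # ensure a proper rotation (det = +1)
    return Q

def synthesize_segment(model_points, rng, cell=0.004, cam_dist=0.6, noise=0.001):
    """Cheap synthetic depth segment: pose the model at a random viewpoint, keep the
    nearest point per coarse image cell to mimic self-occlusion, add sensor noise."""
    R = random_rotation(rng)
    pts = model_points @ R.T + np.array([0.0, 0.0, cam_dist])   # object placed in front of the camera
    cells = np.round(pts[:, :2] / cell).astype(int)             # coarse orthographic pixel grid
    nearest = {}
    for i, key in enumerate(map(tuple, cells)):
        j = nearest.get(key)
        if j is None or pts[i, 2] < pts[j, 2]:                  # keep the closest point per cell
            nearest[key] = i
    seg = pts[sorted(nearest.values())]
    return seg + rng.normal(scale=noise, size=seg.shape), R

rng = np.random.default_rng(0)
model = rng.uniform(-0.05, 0.05, size=(2000, 3))                # stand-in for a texture-less CAD model
segment, pose = synthesize_segment(model, rng)
print(segment.shape)                                            # back-facing points are largely culled
```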
- Title: Learning 7-DoF Grasp Poses on Monocular RGBD Images
- Team: Shanghai Jiao Tong University
General object grasping is an important yet unsolved problem in robotics. Most current methods either generate grasp poses with few DoFs, which fail to cover most successful grasps, or take only the unstable depth image or point cloud as input, which may lead to poor results in some cases. In this paper, we propose RGBD-Grasp, a pipeline that solves this problem by decoupling 7-DoF grasp detection into two sub-tasks in which RGB and depth information are processed separately. In the first stage, an encoder-decoder convolutional neural network, Angle-View Net (AVN), predicts the SO(3) orientation of the gripper at every location of the image. A Fast Analytic Searching (FAS) module then calculates the opening width of the gripper and its distance to the grasp point. By decoupling the grasp detection problem and introducing the stable RGB modality, our pipeline alleviates the requirement for high-quality depth images and is robust to depth sensor noise. We achieve state-of-the-art results on the GraspNet-1Billion dataset compared with several baselines. Real robot experiments on a UR5 robot with an Intel RealSense camera and a Robotiq two-finger gripper show high success rates for both single-object and cluttered scenes. Our code and trained model will be made publicly available.
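To illustrate what an analytic width/distance search can look like, here is a toy sketch written in the spirit of the FAS module (it is not the authors' algorithm): the scene is expressed in an assumed gripper frame, and the smallest opening width whose closing region contains object points while the finger volumes stay point-free is returned.

```python
import numpy as np

def analytic_width_search(scene_points, grasp_point, R_grasp,
                          widths=np.linspace(0.02, 0.10, 9),
                          finger_depth=0.04, finger_thickness=0.01):
    """Toy analytic search over gripper opening widths (illustrative, not the FAS module).

    R_grasp columns are the assumed gripper axes in the scene frame:
    x = closing direction, y = finger height, z = approach direction.
    """
    local = (scene_points - grasp_point) @ R_grasp          # scene expressed in the gripper frame
    band = local[(np.abs(local[:, 1]) < finger_depth / 2) &
                 (np.abs(local[:, 2]) < finger_depth / 2)]  # slab swept by the closing fingers
    for w in widths:
        between = np.abs(band[:, 0]) < w / 2                             # points inside the jaws
        finger_hit = ((np.abs(band[:, 0]) >= w / 2) &
                      (np.abs(band[:, 0]) < w / 2 + finger_thickness))   # where the fingers would be
        if between.any() and not finger_hit.any():
            return float(w)                                  # smallest collision-free feasible width
    return None

# Dummy scene: a 1 cm-wide slab of points centred on the grasp point
rng = np.random.default_rng(1)
scene = rng.uniform([-0.005, -0.01, -0.01], [0.005, 0.01, 0.01], size=(500, 3))
print(analytic_width_search(scene, grasp_point=np.zeros(3), R_grasp=np.eye(3)))  # -> 0.02
```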
Paper: [arXiv] | Code: [GitHub](coming soon)
- Title: Contact-GraspNet: Efficient 6-DoF Grasp Generation in Unconstrained, Cluttered Scenes
- Team: NVIDIA & DLR & TUM
Grasping unseen objects in unconstrained, cluttered environments is an essential skill for autonomous robotic manipulation. Despite recent progress in full 6-DoF grasp learning, existing approaches often consist of complex sequential pipelines that possess several potential failure points and have run-times unsuitable for closed-loop grasping. We therefore propose an end-to-end network that efficiently generates a distribution of 6-DoF parallel-jaw grasps directly from a depth recording of a scene. Our novel grasp representation treats 3D points of the recorded point cloud as potential grasp contacts. By rooting the full 6-DoF grasp pose and width in the observed point cloud, we reduce the dimensionality of our grasp representation to 4-DoF, which greatly facilitates the learning process. Our class-agnostic approach is trained on 17 million simulated grasps and generalizes well to real-world sensor data. In a robotic grasping study of unseen objects in structured clutter, we achieve a success rate of over 90%, cutting the failure rate in half compared to a recent state-of-the-art method.
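The contact-anchored representation can be mapped back to a full 6-DoF gripper pose roughly as follows; the axis conventions and the base-to-baseline offset `d` in this sketch are assumptions for illustration, not taken from the paper or its code.

```python
import numpy as np

def grasp_from_contact(contact, baseline, approach, width, d=0.10):
    """Build a 4x4 gripper pose from a predicted contact point, a unit baseline vector
    (pointing toward the opposite finger), an approach vector, and the opening width.
    `d` is an assumed offset from the gripper base to the finger baseline."""
    b = baseline / np.linalg.norm(baseline)
    a = approach - np.dot(approach, b) * b           # make the approach orthogonal to the baseline
    a /= np.linalg.norm(a)
    R = np.stack([b, np.cross(a, b), a], axis=1)     # columns: x = baseline, z = approach
    t = contact + 0.5 * width * b - d * a            # base sits behind the contacts (sign assumed)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Example: a contact on the observed point cloud, approach along the camera viewing direction (+z)
T = grasp_from_contact(contact=np.array([0.0, 0.0, 0.60]),
                       baseline=np.array([1.0, 0.0, 0.0]),
                       approach=np.array([0.0, 0.0, 1.0]),
                       width=0.04)
print(T.round(3))
```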
Paper: [arXiv] | Code: [GitHub]
CodeVIO: Visual-Inertial Odometry with Learned Optimizable Dense Depth
Interval-Based Visual-LiDAR Sensor Fusion
OmniDet: Surround View Cameras Based Multi-Task Visual Perception Network for Autonomous Driving
VIODE: A Simulated Dataset to Address the Challenges of Visual-Inertial Odometry in Dynamic Environments
[Winner] How to Select and Use Tools? : Active Perception of Target Objects Using Multimodal Deep Learning
Learning Task Space Actions for Bipedal Locomotion
Learning Sampling Distributions Using Local 3D Workspace Decompositions for Motion Planning in High Dimensions
Auto-Tuned Sim-To-Real Transfer
[Winner] Reactive Human-To-Robot Handovers of Arbitrary Objects
Can I Pour into It? Robot Imagining Open Containability Affordance of Previously Unseen Objects Via Physical Simulations
Collision Detection, Identification, and Localization on the DLR SARA Robot with Sensing Redundancy
Automated Acquisition of Structured, Semantic Models of Manipulation Activities from Human VR Demonstration
[Winner] Soft Hybrid Aerial Vehicle Via Bistable Mechanism
A Versatile Inverse Kinematics Formulation for Retargeting Motions Onto Robots with Kinematic Loops
Multi-Point Orientation Control of Discretely-Magnetized Continuum Manipulators
Surface Robots based on S-Isothermic Surfaces
[Winner] Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery
Integrated Voluntary-Reactive Control of a Human-SuperLimb Hybrid System for Hemiplegic Patient Support
Autonomous Robotic Suction to Clear the Surgical Field for Hemostasis Using Image-Based Blood Flow Detection
A Fluidic Soft Robot for Needle Guidance and Motion Compensation in Intratympanic Steroid Injections
[Winner] Optimal Sequential Stochastic Deployment of Multiple Passenger Robots
Self-Organized Evasive Fountain Maneuvers with a Bioinspired Underwater Robot Collective
Learning Multi-Arm Manipulation through Collaborative Teleoperation
Vision-Based Self-Assembly for Modular Multirotor Structures
[Winner] StRETcH: A Soft to Resistive Elastic Tactile Hand
A Parallelized Iterative Algorithm for Real-Time Simulation of Long Flexible Cable Manipulation
KPAM 2.0: Feedback Control for Category-Level Robotic Manipulation
Policy Blending and Recombination for Multimodal Contact-Rich Tasks
[Winner] Compact Flat Fabric Pneumatic Artificial Muscle (ffPAM) for Soft Wearable Robotic Devices
Tactile SLAM: Real-time Inference of Shape and Pose from Planar Pushing
Robotic Guide Dog: Leading a Human with Leash-Guided Hybrid Physical Interaction
BADGR: An Autonomous Self-Supervised Learning-Based Navigation System
[Winner] Aerial Manipulator Pushing a Movable Structure Using a DOB-Based Robust Controller
Pylot: A Modular Platform for Exploring Latency-Accuracy Tradeoffs in Autonomous Vehicles
Motor and Perception Constrained NMPC for Torque-Controlled Generic Aerial Vehicles
Dynamically Feasible Task Space Planning for Underactuated Aerial Manipulators
[Winner] Unsupervised Learning of Lidar Features for Use in a Probabilistic Trajectory Estimator
Unified Multi-Modal Landmark Tracking for Tightly Coupled Lidar-Visual-Inertial Odometry
Planning with Attitude
Cascaded Filtering Using the Sigma Point Transformation