This project enables real-time autonomous object detection, 3D localization, and orientation estimation for robotic pick-and-place applications. Using an Intel RealSense D435i depth camera and Meta's Segment Anything Model (SAM) for robust object segmentation, it computes the 3D position, orientation, and approach direction of segmented objects, making it suitable for robotic manipulation tasks in industrial and research settings.
- Real-time object segmentation using Meta's Segment Anything Model (SAM)
- 3D localization of object centers from RGB + depth data (Intel RealSense)
- Orientation estimation using PCA (Principal Component Analysis)
- Robot approach direction calculated using rotation matrices
- Live visualization:
  - Annotated RGB frame with object info
  - Segmentation + contours overlay
  - Pseudo-colored depth map
- Python 3.8+
- NVIDIA GPU (for SAM inference)
- Intel RealSense SDK 2.0
- Segment Anything Model (SAM)
```bash
pip install -r requirements.txt
```

Note: install SAM and its dependencies from the Segment Anything GitHub repository.
- Install the segment-anything-py library.
- Download the ViT-Large model checkpoint from the official Segment Anything release page.
- Update the `sam_checkpoint` path in the code:

```python
sam_checkpoint = "path/to/sam_vit_l_0b3195.pth"
```

The pipeline streams RGB and depth frames from the Intel RealSense D435i.
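A minimal sketch of this acquisition step, assuming the standard pyrealsense2 API; the 640×480 @ 30 fps stream settings are illustrative, not taken from this project:

```python
import numpy as np
import pyrealsense2 as rs

# Configure and start depth + color streams (640x480 @ 30 fps is an assumption).
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
profile = pipeline.start(config)

# Align depth frames to the color frames so pixel coordinates match.
align = rs.align(rs.stream.color)

frames = align.process(pipeline.wait_for_frames())
depth_frame = frames.get_depth_frame()
color_frame = frames.get_color_frame()

depth_image = np.asanyarray(depth_frame.get_data())  # uint16 depth units (1 mm by default)
color_image = np.asanyarray(color_frame.get_data())  # BGR uint8
```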
Objects are segmented with SamAutomaticMaskGenerator (a sketch follows the list). For each frame, the pipeline:
- Extracts one object mask
- Converts it to a grayscale image
- Draws the object contour
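A sketch of this step using the standard segment-anything API; it reuses `color_image` from the acquisition sketch above, and taking `masks[0]` as "one object mask" is an assumption about how the object is selected:

```python
import cv2
import numpy as np
import torch
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load the ViT-L checkpoint and move the model to the GPU if one is available.
sam = sam_model_registry["vit_l"](checkpoint=sam_checkpoint)
sam.to(device="cuda" if torch.cuda.is_available() else "cpu")
mask_generator = SamAutomaticMaskGenerator(sam)

# SAM expects RGB; RealSense color frames arrive as BGR.
rgb_image = cv2.cvtColor(color_image, cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(rgb_image)

# Take one mask (first mask here is an assumption), convert it to a grayscale
# image, and draw its contour on a copy of the color frame.
mask = masks[0]["segmentation"].astype(np.uint8) * 255
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
overlay = color_image.copy()
cv2.drawContours(overlay, contours, -1, (0, 255, 0), 2)
```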
The 2D mask center is then deprojected to 3D world coordinates using the RealSense intrinsics, as sketched below.
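A sketch of the deprojection, reusing `mask`, `color_frame`, and `depth_frame` from the sketches above; using the mask centroid (via image moments) as the 2D center is an assumption, while `rs2_deproject_pixel_to_point` is the standard pyrealsense2 call for this:

```python
import cv2
import pyrealsense2 as rs

# Object center in pixel coordinates: centroid of the binary mask.
m = cv2.moments(mask)
cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])

# Depth at the center in meters, then deproject with the color intrinsics
# (valid because depth was aligned to the color stream above).
intrinsics = color_frame.profile.as_video_stream_profile().intrinsics
depth_m = depth_frame.get_distance(cx, cy)
world_point = rs.rs2_deproject_pixel_to_point(intrinsics, [cx, cy], depth_m)
print("World point:", world_point)  # [x, y, z] in meters, camera frame
```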
Orientation estimation then proceeds as follows (see the sketch after this list):
- Uses PCA (Principal Component Analysis) on the 3D depth points to estimate object orientation
- Converts the resulting rotation to Euler angles, then to a quaternion
- Calculates the robot's approach direction from the combined rotation matrix
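A sketch of this step, assuming `points` is an (N, 3) array of 3D points deprojected from pixels inside the mask, and using SciPy for the quaternion conversion; taking the least-variance principal axis as the approach direction is an assumption about the project's convention:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

# PCA: eigen-decompose the covariance of the centered 3D points.
centered = points - points.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))

# Order the principal axes from largest to smallest variance and flip one
# axis if needed so the frame is right-handed (a proper rotation).
axes = eigvecs[:, np.argsort(eigvals)[::-1]]
if np.linalg.det(axes) < 0:
    axes[:, 2] = -axes[:, 2]

rotation = R.from_matrix(axes)
euler = rotation.as_euler("xyz", degrees=True)  # intermediate Euler angles
quaternion = rotation.as_quat()                 # [x, y, z, w]

# Approach direction: here the axis of least variance in the camera frame
# (an assumption; the project derives it from a combined rotation matrix).
approach = axes[:, 2]
print("Quaternion:", quaternion, "Approach direction:", approach)
```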
The live output shows:
- Segmentation overlay on the RGB frame
- Mask with object contours
- Colorized depth map
- Real-time console output, for example:
```
World point:        [-0.03, 0.12, 0.67]
Quaternion:         [0.12, 0.45, 0.21, 0.87]
Approach direction: [0.92, -0.01, 0.39]
```
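A sketch of the display step, reusing `overlay`, `mask`, and `depth_image` from the sketches above; the window names, colormap choice, and depth scaling factor are illustrative:

```python
import cv2

# Pseudo-color the 16-bit depth image (alpha squeezes mm values into 0-255).
depth_colormap = cv2.applyColorMap(
    cv2.convertScaleAbs(depth_image, alpha=0.03), cv2.COLORMAP_JET
)

cv2.imshow("RGB + annotations", overlay)
cv2.imshow("Mask", mask)
cv2.imshow("Depth", depth_colormap)
if cv2.waitKey(1) & 0xFF == ord("q"):
    cv2.destroyAllWindows()
```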
You can download the MP4 video for a higher-quality version of the demo.