This project enables real-time autonomous object detection, 3D localization, and orientation estimation for robotic pick-and-place applications. Using an Intel RealSense D435i depth camera and Meta's Segment Anything Model (SAM) for robust object segmentation, it computes the 3D position, orientation, and approach direction of segmented objects, making it suitable for robotic manipulation tasks in industrial and research settings.
- Real-time object segmentation using Meta's Segment Anything Model (SAM)
- 3D localization of object centers from RGB + depth data (Intel RealSense)
- Orientation estimation using PCA (Principal Component Analysis)
- Robot approach direction calculated using rotation matrices
- Live visualization:
  - Annotated RGB frame with object info
  - Segmentation + contours overlay
  - Pseudo-colored depth map
- Python 3.8+
- NVIDIA GPU (for SAM inference)
- Intel RealSense SDK 2.0
- Segment Anything Model (SAM)
```bash
pip install -r requirements.txt
```

Note: install SAM and its dependencies from the Segment Anything GitHub repository.
- Install the segment-anything-py library.
- Download the ViT-Large model checkpoint from the official Segment Anything release page.
- Update the `sam_checkpoint` path in the code:

```python
sam_checkpoint = "path/to/sam_vit_l_0b3195.pth"
```

The pipeline streams RGB and depth frames from the Intel RealSense D435i.
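A minimal sketch of this acquisition step, assuming the standard pyrealsense2 API; the 640×480 @ 30 fps stream settings are illustrative, not taken from this project:

```python
import numpy as np
import pyrealsense2 as rs

# Configure and start depth + color streams (640x480 @ 30 fps is an assumption).
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
profile = pipeline.start(config)

# Align depth frames to the color frames so pixel coordinates match.
align = rs.align(rs.stream.color)

frames = align.process(pipeline.wait_for_frames())
depth_frame = frames.get_depth_frame()
color_frame = frames.get_color_frame()

depth_image = np.asanyarray(depth_frame.get_data())  # uint16 depth units (1 mm by default)
color_image = np.asanyarray(color_frame.get_data())  # BGR uint8
```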
Objects are segmented with SamAutomaticMaskGenerator (a sketch follows the list). For each frame, the pipeline:
- Extracts one object mask
- Converts it to a grayscale image
- Draws the object contour
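A sketch of this step using the standard segment-anything API; it reuses `color_image` from the acquisition sketch above, and taking `masks[0]` as "one object mask" is an assumption about how the object is selected:

```python
import cv2
import numpy as np
import torch
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load the ViT-L checkpoint and move the model to the GPU if one is available.
sam = sam_model_registry["vit_l"](checkpoint=sam_checkpoint)
sam.to(device="cuda" if torch.cuda.is_available() else "cpu")
mask_generator = SamAutomaticMaskGenerator(sam)

# SAM expects RGB; RealSense color frames arrive as BGR.
rgb_image = cv2.cvtColor(color_image, cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(rgb_image)

# Take one mask (first mask here is an assumption), convert it to a grayscale
# image, and draw its contour on a copy of the color frame.
mask = masks[0]["segmentation"].astype(np.uint8) * 255
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
overlay = color_image.copy()
cv2.drawContours(overlay, contours, -1, (0, 255, 0), 2)
```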
The 2D mask center is then deprojected to 3D world coordinates using the RealSense intrinsics, as sketched below.
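A sketch of the deprojection, reusing `mask`, `color_frame`, and `depth_frame` from the sketches above; using the mask centroid (via image moments) as the 2D center is an assumption, while `rs2_deproject_pixel_to_point` is the standard pyrealsense2 call for this:

```python
import cv2
import pyrealsense2 as rs

# Object center in pixel coordinates: centroid of the binary mask.
m = cv2.moments(mask)
cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])

# Depth at the center in meters, then deproject with the color intrinsics
# (valid because depth was aligned to the color stream above).
intrinsics = color_frame.profile.as_video_stream_profile().intrinsics
depth_m = depth_frame.get_distance(cx, cy)
world_point = rs.rs2_deproject_pixel_to_point(intrinsics, [cx, cy], depth_m)
print("World point:", world_point)  # [x, y, z] in meters, camera frame
```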
Orientation estimation then proceeds as follows (see the sketch after this list):
- Uses PCA (Principal Component Analysis) on the 3D depth points to estimate object orientation
- Converts the resulting rotation to Euler angles, then to a quaternion
- Calculates the robot's approach direction from the combined rotation matrix
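A sketch of this step, assuming `points` is an (N, 3) array of 3D points deprojected from pixels inside the mask, and using SciPy for the quaternion conversion; taking the least-variance principal axis as the approach direction is an assumption about the project's convention:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

# PCA: eigen-decompose the covariance of the centered 3D points.
centered = points - points.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))

# Order the principal axes from largest to smallest variance and flip one
# axis if needed so the frame is right-handed (a proper rotation).
axes = eigvecs[:, np.argsort(eigvals)[::-1]]
if np.linalg.det(axes) < 0:
    axes[:, 2] = -axes[:, 2]

rotation = R.from_matrix(axes)
euler = rotation.as_euler("xyz", degrees=True)  # intermediate Euler angles
quaternion = rotation.as_quat()                 # [x, y, z, w]

# Approach direction: here the axis of least variance in the camera frame
# (an assumption; the project derives it from a combined rotation matrix).
approach = axes[:, 2]
print("Quaternion:", quaternion, "Approach direction:", approach)
```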
The live output shows:
- Segmentation overlay on the RGB frame
- Mask with object contours
- Colorized depth map
- Real-time console output, for example:
```
World point:        [-0.03, 0.12, 0.67]
Quaternion:         [0.12, 0.45, 0.21, 0.87]
Approach direction: [0.92, -0.01, 0.39]
```
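A sketch of the display step, reusing `overlay`, `mask`, and `depth_image` from the sketches above; the window names, colormap choice, and depth scaling factor are illustrative:

```python
import cv2

# Pseudo-color the 16-bit depth image (alpha squeezes mm values into 0-255).
depth_colormap = cv2.applyColorMap(
    cv2.convertScaleAbs(depth_image, alpha=0.03), cv2.COLORMAP_JET
)

cv2.imshow("RGB + annotations", overlay)
cv2.imshow("Mask", mask)
cv2.imshow("Depth", depth_colormap)
if cv2.waitKey(1) & 0xFF == ord("q"):
    cv2.destroyAllWindows()
```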
You can download the MP4 video for a higher-quality version of the demo.