Conversation

@GerdsenAI-Admin
Contributor

This pull request updates the documentation and Dockerfile to improve deployment, troubleshooting, and performance guidance for the Depth Anything 3 ROS2 wrapper on NVIDIA Jetson Orin AGX. The changes clarify environment detection, deployment procedures, and the host-container architecture, and address compatibility issues with Jetson hardware and software. The Dockerfile is updated to fix OpenCV, PyTorch, and pip configuration issues on Jetson L4T r36.x, and the documentation now provides clearer guidance for users and agents and details the current performance bottlenecks.

Deployment & Environment Setup Improvements

  • Expanded CLAUDE.md with detailed environment detection steps, SSH/MCP usage, and one-click Jetson deployment instructions, including preferred git-based deployment and troubleshooting for X11 GUI forwarding.
  • Added JetPack/L4T version notes, Docker build known issues, and host-container TensorRT architecture explanation, including file-based IPC details and performance status table.
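
As a rough illustration of the file-based IPC mentioned above, the exchange between the container node and the host TensorRT service can be thought of as dropping numpy files into a shared directory. The sketch below is a minimal, hypothetical version; the paths, file names, and polling behaviour are assumptions, not the repository's actual protocol.

```python
import time
from pathlib import Path

import numpy as np

SHARED_DIR = Path("/tmp/da3_shared")     # assumed shared volume between host and container
INPUT_PATH = SHARED_DIR / "input.npy"    # container -> host: frame to run inference on
OUTPUT_PATH = SHARED_DIR / "output.npy"  # host -> container: resulting depth map


def request_depth(image: np.ndarray, timeout_s: float = 2.0) -> np.ndarray:
    """Send one frame to the host-side TRT service and poll for the depth result."""
    if OUTPUT_PATH.exists():
        OUTPUT_PATH.unlink()                        # drop any stale result
    np.save(INPUT_PATH, image, allow_pickle=False)  # the service watches this path
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if OUTPUT_PATH.exists():
            return np.load(OUTPUT_PATH, allow_pickle=False)
        time.sleep(0.005)
    raise TimeoutError("host TensorRT service did not produce a result in time")
```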

Critical Design Principles & Testing

  • Clarified camera-agnostic design and ROS2 patterns as non-negotiable principles, and expanded the testing section to highlight mocked model tests and camera-agnostic functionality.
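
For readers unfamiliar with the mocked-model approach, the sketch below shows one way such a test could look: the heavy inference call is stubbed out so the camera-agnostic contract (any input resolution yields a matching depth map) can be checked without a GPU, a TensorRT engine, or camera hardware. The DepthEstimator class and function names are illustrative, not the package's actual API.

```python
from unittest import mock

import numpy as np


class DepthEstimator:
    """Stand-in for a model wrapper; infer() would normally run the network."""

    def infer(self, image: np.ndarray) -> np.ndarray:
        raise NotImplementedError("real model weights are not loaded in unit tests")


def estimate_depth(estimator: DepthEstimator, image: np.ndarray) -> np.ndarray:
    """Camera-agnostic contract: the depth map matches the input resolution."""
    depth = estimator.infer(image)
    assert depth.shape == image.shape[:2], "depth map must match input resolution"
    return depth


def test_estimate_depth_is_camera_agnostic():
    estimator = DepthEstimator()
    # Any camera resolution should be accepted; no GPU or /dev/video* required.
    for height, width in [(480, 640), (720, 1280), (1080, 1920)]:
        frame = np.zeros((height, width, 3), dtype=np.uint8)
        with mock.patch.object(estimator, "infer",
                               return_value=np.ones((height, width), np.float32)):
            depth = estimate_depth(estimator, frame)
        assert depth.shape == (height, width)
```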

Troubleshooting & Agent Guidance

  • Added a troubleshooting section referencing key docs, and improved agent selection guidance with a detailed table and proactive usage instructions for Jetson and NVIDIA experts.

Dockerfile Compatibility Fixes

  • Updated base image selection to use humble-desktop for Jetson L4T r36.x, improved OpenCV version checks, and fixed pip configuration to use PyPI instead of unreliable Jetson servers.
  • Skipped the torchvision source build for Jetson, installing CPU-only torchvision from PyPI for the host-container TRT architecture, and improved ROS2 workspace sourcing for non-interactive shells.


GerdsenAI-Admin and others added 20 commits February 3, 2026 00:36
Dockerfile: Ensure the ROS2 workspace setup sourcing is added before the PS1 guard in ~/.bashrc (using sed when the PS1 return line exists) so the setup runs for non-interactive shells (e.g. docker exec). Use the install/setup.bash path with a fallback and add equivalent lines to /etc/bash.bashrc and /etc/profile.d/ros2.sh.

Code: Add atomic writes for numpy files in depth_anything_3_ros2/da3_inference.py and scripts/trt_inference_service.py (write to a temp file, flush, fsync, then rename) to prevent partial reads by the inference service. In trt_inference_service.py, also validate the input tensor size against the engine's expected shape and raise a clear ValueError on mismatch. np.save is called with allow_pickle=False for safety.
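
A minimal sketch of the atomic-write pattern described in this commit, plus the shape validation, assuming generic file paths (the repository's actual helpers may be named and structured differently):

```python
import os
import tempfile

import numpy as np


def atomic_np_save(path: str, array: np.ndarray) -> None:
    """Write `array` so readers see either the previous file or the complete new one."""
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".npy.tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            np.save(f, array, allow_pickle=False)  # allow_pickle=False avoids unpickling risks
            f.flush()
            os.fsync(f.fileno())                   # ensure bytes hit disk before the rename
        os.replace(tmp_path, path)                 # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def validate_input_size(array: np.ndarray, engine_shape: tuple) -> None:
    """Fail loudly when the tensor cannot match the engine's expected input binding."""
    expected = int(np.prod(engine_shape))
    if array.size != expected:
        raise ValueError(
            f"input tensor has {array.size} elements but the engine expects "
            f"shape {engine_shape} ({expected} elements)")
```
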
Add two scripts to run a live depth visualization demo:
- scripts/demo_depth_viewer.py: ROS2-based viewer showing a side-by-side camera feed and colorized TensorRT depth, with an FPS toggle, frame saving to demo_captures, and a helper to start the TRT inference service.
- scripts/run_demo.sh: convenience runner that starts the TRT service, camera driver, and depth node in the da3_ros2_jetson container, then launches the viewer with X11.

Notes: requires ROS2, a built TensorRT engine at models/tensorrt/da3-small-fp16.engine, a camera at /dev/video0, and a display (Jetson). The runner waits for the TRT service status file and cleans up processes on exit.
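
The colorized, side-by-side display used by the viewer can be illustrated with a short OpenCV sketch; the colormap choice and function names below are assumptions, not the script's actual implementation:

```python
import cv2
import numpy as np


def colorize_depth(depth: np.ndarray) -> np.ndarray:
    """Normalize a float depth map to 8-bit and apply a colormap for display."""
    d_min, d_max = float(depth.min()), float(depth.max())
    scale = 255.0 / (d_max - d_min) if d_max > d_min else 1.0
    depth_u8 = ((depth - d_min) * scale).astype(np.uint8)
    return cv2.applyColorMap(depth_u8, cv2.COLORMAP_TURBO)


def side_by_side(frame_bgr: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Stack the camera frame and the colorized depth horizontally for the viewer."""
    depth_color = colorize_depth(depth)
    if depth_color.shape[:2] != frame_bgr.shape[:2]:
        depth_color = cv2.resize(depth_color, (frame_bgr.shape[1], frame_bgr.shape[0]))
    return np.hstack([frame_bgr, depth_color])
```
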
Stop auto-starting the TensorRT inference service from the viewer and instead just verify its status:
- demo_depth_viewer.py: replace start_trt_service() with check_trt_service(), which inspects the shared status file and emits a warning if the service isn't present; remove the process spawning/cleanup logic so the service is expected to be managed externally.
- scripts/jetson_demo.sh (new): helper to run the full pipeline on a Jetson, starting the TRT service on the host, preparing the shared dir, starting the container ROS nodes, and launching the viewer with X11.
- scripts/run_demo.sh: improve X11 access handling (xhost), handle SSH sessions by printing instructions and showing TRT stats instead of trying to open a GUI, and launch the viewer in-container with QT_X11_NO_MITSHM set when running locally.

These changes decouple the service lifecycle from the viewer and provide a dedicated Jetson entrypoint for systems with a local display.
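
A hedged sketch of the check-don't-start behaviour: the viewer only inspects the shared status file and warns if the service is absent or stale, never spawning it. The status-file path, JSON fields, and staleness threshold are assumptions:

```python
import json
import logging
import time
from pathlib import Path

STATUS_PATH = Path("/tmp/da3_shared/status.json")  # hypothetical shared status file
STALE_AFTER_S = 10.0                               # assumed heartbeat staleness limit


def check_trt_service() -> bool:
    """Return True only if the status file exists, parses, and looks fresh."""
    if not STATUS_PATH.exists():
        logging.warning("TRT inference service status file not found; start the "
                        "service on the host before launching the viewer.")
        return False
    try:
        status = json.loads(STATUS_PATH.read_text())
    except (json.JSONDecodeError, OSError):
        logging.warning("TRT status file unreadable (possibly mid-write); try again shortly.")
        return False
    if time.time() - float(status.get("timestamp", 0)) > STALE_AFTER_S:
        logging.warning("TRT status file is stale; the service may have stopped.")
        return False
    return True
```
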
Add Phase 5 'Live Demo System' section to TODO.md. Documents new demo components (scripts/demo_depth_viewer.py, scripts/run_demo.sh, scripts/jetson_demo.sh), atomic IO for numpy files, and a Dockerfile ROS2 sourcing fix. Lists demo features (side-by-side camera and colorized TensorRT depth, FPS toggle, frame capture, X11 with SSH fallback), usage examples for Jetson and container runs, and notes a pending merge of the TensorRT-Testing branch. Updates the Last Updated date to 2026-02-03.
…d known issues

- Update SSH commands to use -i ~/.ssh/jetson_j4012 identity file
- Add git clone as preferred deployment method (preserves history)
- Document deploy_jetson.sh script usage
- Add JetPack/L4T version compatibility table (r36.2.0 vs r36.4.0)
- Document Docker build known issues (pip.conf, OpenCV, cuDNN, base image)
…ecture

- Switch base image from humble-pytorch to humble-desktop (r36.x compatible)
- Remove dustynv pip.conf that uses unreliable jetson.webredirect.org
- Add OpenCV 4.10.x support for L4T r36.4.0
- Replace torchvision source build with CPU-only PyPI install
- Add explicit PyTorch dependencies (filelock, sympy, etc.)
…oyment

- Update Jetson demo to use git clone (preserves history)
- Add SSH identity file to example commands
- Add troubleshooting for humble-pytorch, pip.conf, cuDNN issues
Match container base image to host L4T R36.4.x environment.
Update to note humble-desktop is used because humble-pytorch doesn't exist for r36.x.
Removed redundant demo scripts:
- scripts/deploy_jetson.sh (merged into run.sh)
- scripts/jetson_demo.sh (merged into run.sh)
- scripts/run_demo.sh (merged into run.sh)

Fixed a TRT inference service race condition (a hedged reader-side sketch follows this commit note):
- Handle an empty REQUEST_PATH file during the atomic write
- Make the REQUEST_PATH write atomic on the container side
- Prevents "could not convert string to float" errors

Updated scripts/demo.sh with deprecation notice pointing to run.sh

Remaining scripts (11 total):
- Setup: install_dependencies.sh, setup_models.py
- Core: trt_inference_service.py, build_tensorrt_engine.py
- Utilities: detect_cameras.sh, performance_monitor.sh
- Viewer: demo_depth_viewer.py
- Testing: benchmark_models.sh, test_trt10.3_host.sh, thermal_stability_test.sh
- Legacy: demo.sh (deprecated)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
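
As referenced in the race-condition fix above, a reader-side sketch of tolerating an empty or half-written request file might look like the following; the file name and its contents (a single float timestamp) are assumptions about the protocol:

```python
import time
from pathlib import Path
from typing import Optional

REQUEST_PATH = Path("/tmp/da3_shared/request.txt")  # hypothetical request marker file


def read_request_timestamp(retries: int = 10, delay_s: float = 0.05) -> Optional[float]:
    """Read a float from the request file, retrying on missing, empty, or partial content."""
    for _ in range(retries):
        try:
            text = REQUEST_PATH.read_text().strip()
            if text:                       # an empty file means the writer isn't done yet
                return float(text)
        except (FileNotFoundError, ValueError):
            pass                           # missing or half-written; retry after a short sleep
        time.sleep(delay_s)
    return None
```
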
- Update TensorRT Demo section to use ./run.sh
- Update Quick Start to use ./run.sh instead of deploy_jetson.sh
- Update Key Files table
- Simplify demo script options documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add video group, /dev mount, and device cgroup rule for proper
v4l2 camera access. Fixes 'Failed mapping device memory' error.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add quick reference block at top of file with host, user, and identity file info for easy SSH access to Jetson device.
- Add blank lines around headings (MD022)
- Remove trailing period from heading (MD026)
- Add blank lines around code fences (MD031)
- Add blank lines around lists (MD032)
- Fix table column alignment (MD060)
@GerdsenAI-Admin merged commit 4337a74 into main on Feb 4, 2026
1 of 2 checks passed