SplitTracr: An Experimental Test-bed for Cooperative Inference using Split Computing

SplitTracr (tracr) is an experimental framework for distributed AI experiments that enables split inference between server and host devices. It distributes deep learning model computations across multiple devices, which helps optimize resource utilization and supports edge computing scenarios. You can run cooperative inference with different deep learning models on different types of devices, with automatic network management and experiment coordination.

Warning

tracr is currently an experimental framework intended to explore distributed AI inference patterns. While functional, it is primarily for research and educational purposes.

Quick Start: Run AlexNet Split Inference

Caution

🔑 Required Prerequisites

You MUST have:

✅ Two devices (server and edge) on the same network
✅ SSH access between devices
✅ Python 3.10+ on both devices

Optional but recommended:

✅ CUDA support on the server

1️⃣ Clone and Install

# On both devices (REQUIRED)
git clone https://github.com/nbovee/tracr.git
cd tracr
python3 -m venv venv
source venv/bin/activate  # Linux/Mac/WSL
pip install -r requirements.txt

2️⃣ Configure Devices (REQUIRED)

# In the config/ directory (MANDATORY)
cp devices_template.yaml devices_config.yaml

Edit devices_config.yaml (MUST configure correctly):

devices:
  - device_type: SERVER           # REQUIRED
    connection_params:
      - host: 192.0.0.123         # ⚠️ Your server's IP (REQUIRED)
        user: user1               # ⚠️ Your username (REQUIRED)
        pkey_fp: server.rsa       # ⚠️ Will create this key next (REQUIRED)
        port: 12345               # ⚠️ Communication port (REQUIRED)
        default: true

  - device_type: PARTICIPANT     # REQUIRED
    connection_params:
      - host: 192.0.0.124        # ⚠️ Your edge device's IP (REQUIRED)
        user: user2              # ⚠️ Your username (REQUIRED)
        pkey_fp: edge.rsa        # ⚠️ Will create this key next (REQUIRED)
        port: 12345              # ⚠️ Must match server's port (REQUIRED)
        default: true
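
The "Must match server's port" requirement is easy to break when editing the file by hand. As a minimal sketch, assuming the file has already been parsed into a dictionary (for example by a YAML loader) with the structure shown above, a consistency check might look like this. The helper name `find_port_mismatches` is hypothetical and is not part of tracr.

```python
from typing import Any, Dict, List


def find_port_mismatches(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a parsed devices config.

    Hypothetical helper, not part of tracr: it only checks that every
    connection entry has a port and that all ports agree.
    """
    problems: List[str] = []
    ports = set()
    for device in config.get("devices", []):
        device_type = device.get("device_type", "<unknown>")
        for params in device.get("connection_params", []):
            port = params.get("port")
            if port is None:
                problems.append(f"{device_type}: missing port")
            else:
                ports.add(port)
    if len(ports) > 1:
        problems.append(f"ports differ across devices: {sorted(ports)}")
    return problems


# Same shape as the devices_config.yaml example above.
example = {
    "devices": [
        {"device_type": "SERVER",
         "connection_params": [{"host": "192.0.0.123", "port": 12345}]},
        {"device_type": "PARTICIPANT",
         "connection_params": [{"host": "192.0.0.124", "port": 12345}]},
    ]
}
print(find_port_mismatches(example))  # an empty list means no problems found
```

Returning a list of messages rather than raising on the first problem lets you see every issue in one pass.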

3️⃣ Set Up SSH Keys (REQUIRED)

# Create keys directory (MANDATORY)
mkdir -p config/pkeys/

# On Server (192.0.0.123) (REQUIRED)
ssh-keygen -t rsa -b 4096 -f ~/.ssh/server_key
ssh-copy-id -i ~/.ssh/server_key.pub user2@192.0.0.124
cp ~/.ssh/server_key config/pkeys/server.rsa

# On Edge Device (192.0.0.124) (REQUIRED)
ssh-keygen -t rsa -b 4096 -f ~/.ssh/edge_key
ssh-copy-id -i ~/.ssh/edge_key.pub user1@192.0.0.123
cp ~/.ssh/edge_key config/pkeys/edge.rsa

# Set permissions on both devices (CRITICAL)
chmod 600 config/pkeys/*.rsa
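
SSH clients typically refuse private keys that other users can read, which is why the permission step matters. The following sketch, using only the Python standard library on a POSIX system, shows one way to check that a key file is owner-only. The function name `key_is_private` is hypothetical, and the demo runs against a throwaway file rather than a real key.

```python
import os
import stat
import tempfile


def key_is_private(path: str) -> bool:
    """Return True if no group/other permission bits are set (e.g. mode 600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any group or other permission bit makes the key too open for SSH.
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


# Demonstrate on a temporary file instead of a real key.
with tempfile.NamedTemporaryFile(suffix=".rsa", delete=False) as handle:
    demo_path = handle.name
os.chmod(demo_path, 0o644)
too_open = key_is_private(demo_path)     # group/others can read
os.chmod(demo_path, 0o600)
locked_down = key_is_private(demo_path)  # owner-only access
os.remove(demo_path)
print(too_open, locked_down)
```

To check your own keys, you could call `key_is_private` on each file under config/pkeys/.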

4️⃣ Prepare Data Directory (REQUIRED)

# Create data structure (MANDATORY)

# For the alexnetsplit.yaml, the images should be in:
mkdir -p data/imagenet/sample_images

# Copy your test images (REQUIRED - any JPEG images)
cp /path/to/your/images/* data/imagenet/sample_images/

# Download ImageNet class names (REQUIRED)
wget -O data/imagenet/imagenet_classes.txt https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt
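
Before running the experiment, it can help to confirm that the directory layout matches what the steps above create. This is a minimal sketch, assuming the layout data/imagenet/sample_images/ plus data/imagenet/imagenet_classes.txt described above. The helper `check_data_dir` is hypothetical, and the demo builds a temporary directory rather than touching your real data.

```python
import tempfile
from pathlib import Path
from typing import Dict, Union

JPEG_SUFFIXES = {".jpg", ".jpeg"}


def check_data_dir(root: Union[str, Path]) -> Dict[str, Union[int, bool]]:
    """Count JPEG files and check for the class-name file under `root`."""
    base = Path(root)
    images_dir = base / "sample_images"
    num_images = 0
    if images_dir.is_dir():
        num_images = sum(
            1 for p in images_dir.iterdir()
            if p.is_file() and p.suffix.lower() in JPEG_SUFFIXES
        )
    return {
        "num_images": num_images,
        "has_class_names": (base / "imagenet_classes.txt").is_file(),
    }


# Demonstrate on a temporary directory that mimics data/imagenet/.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp) / "imagenet"
    (demo_root / "sample_images").mkdir(parents=True)
    (demo_root / "sample_images" / "cat.JPG").write_bytes(b"")
    (demo_root / "sample_images" / "notes.txt").write_text("not an image")
    before = check_data_dir(demo_root)
    (demo_root / "imagenet_classes.txt").write_text("tench\ngoldfish\n")
    after = check_data_dir(demo_root)

print(before)
print(after)
```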

5️⃣ Run the Experiment

# On Server (192.0.0.123) - MUST START FIRST
python server.py

# On Edge Device (192.0.0.124) - START AFTER SERVER
python host.py --config config/alexnetsplit.yaml

That's it! You should now see the split inference running between your devices. For detailed setup instructions, troubleshooting, and advanced configurations, continue reading below.
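
Because the server must be listening before the host starts, a small readiness check can save a failed first run. The sketch below polls a TCP port until it accepts a connection. It uses only the standard library; `wait_for_port` is a hypothetical helper, not a tracr function, and the demo uses a local stand-in listener instead of a real tracr server.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Stand-in for the server: a local listener on an OS-assigned free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]
reachable = wait_for_port("127.0.0.1", demo_port, timeout=2.0)
listener.close()
print(reachable)
```

On the edge device you might call `wait_for_port` with the server's IP and the port from devices_config.yaml before launching host.py.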

Key Features

Split Inference: Distribute model computations between server and edge devices
Adaptive Partitioning: Automatically determine optimal split points based on device capabilities
Multiple Model Support: Pre-configured support for torchvision models and ultralytics YOLO models
Custom Extensions: Easy integration of custom models and datasets
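
To make the idea of split inference concrete, here is a toy sketch that does not use tracr or PyTorch. It treats a model as an ordered list of functions, runs the first `split_layer` of them as the "edge" part, serializes the intermediate result as if it were sent over the network, and finishes the rest as the "server" part. All names are illustrative assumptions, and the network hop is simulated with `pickle` in a single process.

```python
import pickle
from typing import Callable, List, Sequence

Layer = Callable[[List[float]], List[float]]


def run_layers(layers: Sequence[Layer], values: List[float]) -> List[float]:
    """Apply each layer in order, like a sequential model's forward pass."""
    for layer in layers:
        values = layer(values)
    return values


def split_inference(layers: Sequence[Layer], values: List[float],
                    split_layer: int) -> List[float]:
    """Run layers[:split_layer] on the 'edge' and the rest on the 'server'."""
    head, tail = layers[:split_layer], layers[split_layer:]
    intermediate = run_layers(head, values)   # computed on the edge device
    payload = pickle.dumps(intermediate)      # bytes that would cross the network
    received = pickle.loads(payload)          # decoded on the server
    return run_layers(tail, received)         # server finishes the computation


# Three toy "layers": scale, shift, and a ReLU-like clamp.
toy_layers: List[Layer] = [
    lambda v: [2.0 * x for x in v],
    lambda v: [x - 1.0 for x in v],
    lambda v: [max(0.0, x) for x in v],
]
inputs = [0.25, 1.0, -3.0]
full = run_layers(toy_layers, inputs)
split = split_inference(toy_layers, inputs, split_layer=2)
print(full == split)  # splitting should not change the result
```

The split point only changes where each layer runs and how much data crosses the network, not the final output.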

Install

Requires Python 3.10+

# Clone repository
git clone https://github.com/nbovee/tracr.git
cd tracr

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate  # Linux/Mac/WSL

# Install dependencies
pip install -r requirements.txt

Prerequisites

System Requirements

Python 3.10 or higher
SSH client and server (openssh-client and openssh-server)
CUDA toolkit (for GPU support)

Software Installation

For Linux/Ubuntu:

# Update package list
sudo apt update

# Install SSH client and server
sudo apt install openssh-server openssh-client

# Install CUDA toolkit (if using GPU)
# Visit https://developer.nvidia.com/cuda-downloads for latest instructions

For Windows:

Install OpenSSH:
- Open Settings > Apps > Optional Features
- Add OpenSSH Client and OpenSSH Server
- Or follow Microsoft's OpenSSH guide
Install WSL2 (if needed):
```
wsl --install
```

Quick Start Guide

1. Basic Setup

tracr can be run in two modes: distributed (server-host) or local.

Option A: Distributed Mode (Server-Host)

Run the experiment across two devices:

On the server machine:

python server.py

On the host machine:

python host.py --config config/alexnetsplit.yaml

Option B: Local Mode

Run the entire experiment on a single device:

python server.py --local --config config/alexnetsplit.yaml

2. Pre-configured Experiments

We provide ready-to-use configurations for common scenarios:

Classification Models

# Run AlexNet split inference
python host.py --config config/alexnetsplit.yaml

# Run ResNet split inference
python host.py --config config/resnetsplit.yaml

# Run VGG split inference
python host.py --config config/vggsplit.yaml

# Run EfficientNet split inference
python host.py --config config/efficientnet_split.yaml

# Run MobileNet split inference
python host.py --config config/mobilenetsplit.yaml

Object Detection Models

# Run YOLOv8 split inference
python host.py --config config/yolov8split.yaml

# Run YOLOv5 split inference
python host.py --config config/yolov5split.yaml

Note

Each configuration file contains optimized settings for the specific model and dataset combination. You can use these as templates for creating your own configurations.

Detailed Setup Guide

1. Project Structure

tracr/
├── config/                # Configuration files
│   ├── pkeys/             # SSH keys directory
│   ├── devices_config.yaml
│   └── *split.yaml        # Model configurations
├── data/                  # Dataset storage
├── src/                   # Source code
│   ├── api/               # Core API components
│   ├── experiment_design/ # Experiment implementations
│   ├── interface/         # API bridges
│   └── utils/             # Utility functions
├── tests/                 # Test suite
├── host.py                # Host device entry point
└── server.py              # Server entry point

2. Device Configuration

A. SSH Key Setup

Generate and exchange SSH keys between devices:

# On Server Device
ssh-keygen -t rsa -b 4096 -f ~/.ssh/server_key
# Enter passphrase (optional)
ssh-copy-id -i ~/.ssh/server_key.pub user@participant_ip

# On Participant Device
ssh-keygen -t rsa -b 4096 -f ~/.ssh/participant_key
ssh-copy-id -i ~/.ssh/participant_key.pub user@server_ip

B. Key Installation

# Create keys directory
mkdir -p config/pkeys/

# Copy private keys
cp ~/.ssh/server_key config/pkeys/server_key.rsa
cp ~/.ssh/participant_key config/pkeys/participant_key.rsa

# Set proper permissions
chmod 600 config/pkeys/*.rsa

C. Device Configuration

Create config/devices_config.yaml:

devices:
  - device_type: SERVER
    connection_params:
      - host: <server_ip>        # e.g., 192.168.1.100
        user: <username>         # your SSH username
        pkey_fp: server_key.rsa
        default: true

  - device_type: PARTICIPANT
    connection_params:
      - host: <participant_ip>   # e.g., 192.168.1.101
        user: <username>
        pkey_fp: participant_key.rsa
        default: true

Windows WSL Setup

Note

Required only for Windows users running tracr through WSL.


1. WSL Network Configuration

Check your WSL network mount:

mount | grep '^C:'

Configure WSL in /etc/wsl.conf:

[automount]
enabled = true
options = "metadata,umask=22,fmask=11"

2. Port Forwarding Setup

Run in PowerShell as Administrator:

# Get WSL IP address
wsl hostname -I

# Set up port forwarding
netsh interface portproxy add v4tov4 listenaddress=0.0.0.0 listenport=22 connectaddress=<wsl_ip> connectport=22
netsh interface portproxy add v4tov4 listenaddress=0.0.0.0 listenport=12345 connectaddress=<wsl_ip> connectport=12345

# Configure firewall
New-NetFirewallRule -DisplayName "WSL SSH Port 22" -Direction Inbound -Action Allow -Protocol TCP -LocalPort 22
New-NetFirewallRule -DisplayName "WSL SSH Port 12345" -Direction Inbound -Action Allow -Protocol TCP -LocalPort 12345

3. SSH Service

sudo service ssh restart

Extending `tracr`

Adding Custom Models

There are several ways to add custom models to tracr:

1. Using the Model Registry Decorator

The simplest way is to use the @ModelRegistry.register decorator:

from torch import nn, Tensor
from typing import Dict, Any
from experiment_design.models.registry import ModelRegistry

@ModelRegistry.register("my_custom_model")
class MyCustomModel(nn.Module):
    def __init__(self, model_config: Dict[str, Any], **kwargs) -> None:
        super().__init__()
        # Your model initialization
        self.model = nn.Sequential(
            # Your model layers
        )

    def forward(self, x: Tensor) -> Tensor:
        return self.model(x)
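
For readers unfamiliar with the pattern, the sketch below shows how a decorator-based registry generally works. It is not tracr's ModelRegistry implementation: `SketchRegistry`, `create`, and `ToyModel` are hypothetical names, and the toy model is a plain class so the example runs without PyTorch.

```python
from typing import Any, Callable, Dict


class SketchRegistry:
    """Maps string names to model classes (illustrative only)."""

    _models: Dict[str, type] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[type], type]:
        """Return a class decorator that records the class under `name`."""
        def decorator(model_cls: type) -> type:
            if name in cls._models:
                raise ValueError(f"model {name!r} is already registered")
            cls._models[name] = model_cls
            return model_cls  # the class itself is returned unchanged
        return decorator

    @classmethod
    def create(cls, name: str, model_config: Dict[str, Any], **kwargs: Any) -> Any:
        """Instantiate a registered class by name."""
        if name not in cls._models:
            raise KeyError(f"unknown model {name!r}; known: {sorted(cls._models)}")
        return cls._models[name](model_config, **kwargs)


@SketchRegistry.register("toy_model")
class ToyModel:
    def __init__(self, model_config: Dict[str, Any], **kwargs: Any) -> None:
        self.num_classes = model_config.get("num_classes", 1000)


model = SketchRegistry.create("toy_model", {"num_classes": 10})
print(type(model).__name__, model.num_classes)
```

The benefit of this pattern is that a config file can refer to a model by a string name, and the lookup happens at runtime without editing a central list.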

2. Adding Custom Post-Processing

You can either use pre-defined processors or create custom ones in src/api/inference_utils.py:

from typing import Any, List, Tuple

import torch
from PIL import Image

# VisualizationConfig is assumed here to live in src.api.inference_utils;
# adjust the import if it is defined elsewhere.
from src.api.inference_utils import (
    ModelProcessor,
    ModelProcessorFactory,
    VisualizationConfig,
)

# Option 1: Use pre-defined processors by setting model_name in your config
# For classification models (ImageNet-style):
#   model_name: my_classification_model  # Will use ImageNetProcessor
# For detection models (YOLO-style):
#   model_name: my_detection_model  # Will use YOLOProcessor

# Option 2: Create custom processor
class MyCustomProcessor(ModelProcessor):
    def __init__(self, class_names: List[str], vis_config: VisualizationConfig):
        self.class_names = class_names
        self.vis_config = vis_config

    def process_output(self, output: torch.Tensor, original_size: Tuple[int, int]) -> Any:
        # Your custom processing logic (placeholder: pass the output through)
        processed_result = output
        return processed_result

    def visualize_result(self, image: Image.Image, result: Any) -> Image.Image:
        # Your custom visualization logic (placeholder: return the image unchanged)
        annotated_image = image
        return annotated_image

# Register your processor
ModelProcessorFactory._PROCESSORS.update({
    "my_model": MyCustomProcessor
})
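
As an illustration of what the "custom processing logic" in process_output might do for a classifier, the sketch below converts raw logits into the top-k (label, probability) pairs. It is written in plain Python so it runs without PyTorch; in a real processor you would operate on the tensor instead. The function names are hypothetical and this is not tracr's ImageNetProcessor.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_predictions(logits: Sequence[float], class_names: Sequence[str],
                      k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable (class name, probability) pairs."""
    if len(logits) != len(class_names):
        raise ValueError("logits and class_names must have the same length")
    probs = softmax(logits)
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


predictions = top_k_predictions([2.0, 0.5, -1.0], ["cat", "dog", "car"], k=2)
for name, prob in predictions:
    print(f"{name}: {prob:.3f}")
```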

3. Adding Pre-trained Model Support

To add support for pre-trained weights and dataset-specific configurations, update the mappings in src/experiment_design/models/templates.py:

# Add dataset-specific weights
DATASET_WEIGHTS_MAP.update({
    "my_dataset": "MY_DATASET_WEIGHTS_V1"
})

# Add model-specific weights for different datasets
MODEL_WEIGHTS_MAP.update({
    "my_custom_model": {
        "my_dataset": "MY_DATASET_WEIGHTS_V1",
        "imagenet": "IMAGENET1K_V1"
    }
})

# Add head type mapping if your model has a custom classification head
MODEL_HEAD_TYPES.update({
    "my_head_attr": ["my_custom_model"]
})
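
The README does not spell out how these maps are consulted, so the following is only one plausible reading: look for a model-specific entry first, then fall back to the dataset-wide default. The function `resolve_weights` and the precedence order are assumptions made for illustration, and the maps are local stand-ins rather than the ones in templates.py.

```python
from typing import Dict, Optional

# Local stand-ins mirroring the structure of the mappings shown above.
dataset_weights_map: Dict[str, str] = {"my_dataset": "MY_DATASET_WEIGHTS_V1"}
model_weights_map: Dict[str, Dict[str, str]] = {
    "my_custom_model": {
        "my_dataset": "MY_DATASET_WEIGHTS_V1",
        "imagenet": "IMAGENET1K_V1",
    }
}


def resolve_weights(model_name: str, dataset: str) -> Optional[str]:
    """Prefer a model-specific entry, then a dataset default, else None.

    The precedence order is an assumption, not confirmed tracr behavior.
    """
    per_model = model_weights_map.get(model_name, {})
    if dataset in per_model:
        return per_model[dataset]
    return dataset_weights_map.get(dataset)


print(resolve_weights("my_custom_model", "imagenet"))
print(resolve_weights("other_model", "my_dataset"))
print(resolve_weights("other_model", "unknown_dataset"))
```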

4. Configuration File

Create a configuration file for your model in config/:

# config/my_custom_split.yaml
model:
  model_name: my_custom_model
  pretrained: true
  weight_path: path/to/weights.pt  # Optional
  input_size: [3, 224, 224]
  split_layer: 5
  num_classes: 10  # Will automatically adjust the model head

dataset:
  module: my_dataset
  class: MyDataset
  args:
    root: data/my_dataset
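
A quick sanity check of the model section can catch the most common mistake, a split_layer that is not smaller than the number of layers. The sketch below assumes the section has already been parsed into a dictionary and that split_layer is a zero-based index; both the helper `validate_model_section` and the `total_layers` value in the demo are hypothetical.

```python
from typing import Any, Dict, List


def validate_model_section(model: Dict[str, Any], total_layers: int) -> List[str]:
    """Return a list of problems found in a parsed `model:` config section."""
    errors: List[str] = []

    split_layer = model.get("split_layer")
    if isinstance(split_layer, bool) or not isinstance(split_layer, int):
        errors.append("split_layer must be an integer")
    elif not 0 <= split_layer < total_layers:
        # Assumes a zero-based index; the lower bound is an assumption.
        errors.append(f"split_layer {split_layer} must be in [0, {total_layers})")

    input_size = model.get("input_size")
    if (not isinstance(input_size, (list, tuple)) or len(input_size) != 3
            or not all(isinstance(d, int) and d > 0 for d in input_size)):
        errors.append("input_size must be three positive integers, e.g. [3, 224, 224]")

    num_classes = model.get("num_classes")
    if num_classes is not None and (not isinstance(num_classes, int) or num_classes <= 0):
        errors.append("num_classes must be a positive integer when given")

    return errors


model_section = {
    "model_name": "my_custom_model",
    "input_size": [3, 224, 224],
    "split_layer": 5,
    "num_classes": 10,
}
print(validate_model_section(model_section, total_layers=8))
print(validate_model_section(model_section, total_layers=5))
```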

5. Using External Model Libraries

For models from popular libraries (torchvision, ultralytics, etc.), you can use them directly by specifying the model name in the config:

model:
  model_name: resnet50  # or yolov8s, vit_b_16, etc.
  pretrained: true
  num_classes: 10  # Will automatically adjust the model architecture

The framework will:

Load the appropriate pre-trained weights
Adjust the model architecture for your dataset
Handle different PyTorch versions
Provide proper logging
Use appropriate post-processing based on model type

Note

Custom models should inherit from nn.Module
The model_config parameter in __init__ is required
The registry supports automatic head adjustment for different numbers of classes
Pre-trained weight handling is automatic if configured in templates.py
Post-processing is handled automatically for common model types (classification, detection)
Custom post-processing can be added by extending the ModelProcessor class

Adding Custom Datasets

Create a dataset class in src/experiment_design/datasets/<dataset_name>.py:

from .base import BaseDataset

class MyDataset(BaseDataset):
    def __init__(self, root):
        super().__init__(root)
        # Your dataset initialization

Configuration Files

Create new model configurations in config/:

model:
  model_name: my_custom_model
  split_layer: 5
  batch_size: 32

dataset:
  module: my_dataset
  class: MyDataset
  args:
    root: data/my_dataset

Troubleshooting

Connection Issues

SSH Key Problems:
- Verify key permissions: ls -l config/pkeys/*.rsa
- Test manual SSH: ssh -i config/pkeys/server_key.rsa user@host
- Check SSH service: sudo systemctl status ssh
Network Issues:
- Confirm devices are on same network
- Check firewall settings
- Verify ports are not blocked

Model Issues

Split Layer Problems:
- Ensure split_layer is less than total layers
- Verify layer compatibility
- Check memory requirements
Dataset Issues:
- Confirm correct paths in config
- Verify dataset format
- Check file permissions

Performance Issues

Resource Usage:
- Monitor GPU memory: nvidia-smi
- Check CPU usage: top
- Verify network bandwidth
Optimization Tips:
- Adjust batch size
- Modify worker count
- Consider split point optimization
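
To reason about split point optimization, a simple latency model is often enough: total time is edge compute plus transfer time plus server compute. The sketch below evaluates every split point under that model and picks the fastest. It is not tracr's partitioning algorithm; the function `best_split` and all timings and sizes in the demo are made-up illustrative values, and the model ignores the cost of sending results back.

```python
from typing import Sequence, Tuple


def best_split(edge_ms: Sequence[float], server_ms: Sequence[float],
               output_bytes: Sequence[int], input_bytes: int,
               bandwidth_bytes_per_s: float) -> Tuple[int, float]:
    """Return (split index k, estimated latency in ms) with minimal latency.

    Layers [0, k) run on the edge and layers [k, n) run on the server.
    k == 0 sends the raw input; k == n keeps everything on the edge.
    """
    n = len(edge_ms)
    if not (len(server_ms) == len(output_bytes) == n):
        raise ValueError("per-layer sequences must have the same length")

    def latency(k: int) -> float:
        if k == n:
            sent = 0                      # nothing crosses the network
        elif k == 0:
            sent = input_bytes            # server receives the raw input
        else:
            sent = output_bytes[k - 1]    # output of the last edge layer
        transfer_ms = sent * 1000 / bandwidth_bytes_per_s
        return sum(edge_ms[:k]) + transfer_ms + sum(server_ms[k:])

    k_best = min(range(n + 1), key=latency)
    return k_best, latency(k_best)


# Made-up numbers: a slow edge device, a fast server, a 1 MB/s link.
split, estimate = best_split(
    edge_ms=[10, 20, 80],
    server_ms=[1, 2, 4],
    output_bytes=[400_000, 50_000, 4_000],
    input_bytes=600_000,
    bandwidth_bytes_per_s=1_000_000,
)
print(split, estimate)
```

In this toy setup the intermediate output shrinks after the second layer, so sending it then is cheaper than either shipping the raw input or running the expensive last layer on the edge.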

License

This project is licensed under the MIT License - see LICENSE for details.

Acknowledgements

Font Attribution

This project uses the DejaVu Sans font for detection and classification overlays. DejaVu fonts are based on Bitstream Vera Fonts and are licensed under a permissive free license.

Citation

@software{tracr2024,
  author = {Nick Bovee and Izhar Ali and Suraj Bitla and Gopi Patapanchala and Shen-Shyang Ho},
  title = {SplitTracr: An Experimental Test-bed for Cooperative Inference using Split Computing},
  year = {2024},
  url = {https://github.com/nbovee/tracr}
}
