Could you please help to implement the example code of yolo model training? #677

Closed
smileyee opened this issue Dec 11, 2024 · 1 comment

Comments

@smileyee

No description provided.

@makseq
Member

makseq commented Feb 12, 2025

Below is an example Python script that combines several steps into a training pipeline for a YOLO model using your Label Studio ML backend data. This example will:

  1. Connect to your Label Studio instance and export all tasks (with images) as a snapshot.
  2. Convert the export to a YOLO-compatible format (using the Label Studio Converter).
  3. Download all associated images into a dedicated “images” directory.
  4. Prepare a YAML file (as required by Ultralytics) that points to the training (and validation) images and defines the class names (rectangle labels).
  5. Launch the Ultralytics training command as a subprocess and wait for it to finish.

Before running the training, make sure you have ultralytics installed and your Label Studio instance and API key ready.

Below is a sample implementation:


import os
import time
import json
import shutil
import subprocess
import argparse
import logging
import yaml

from label_studio_sdk import Client
from label_studio_sdk.converter import Converter
from label_studio_sdk._extensions.label_studio_tools.core.utils.io import get_local_path

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")


def export_yolo_dataset(ls_url, api_key, project_id, output_dir):
    """
    Exports the tasks from Label Studio as a snapshot,
    converts them to the YOLO format and downloads associated images.
    """
    # Connect to Label Studio and get project.
    ls = Client(url=ls_url, api_key=api_key)
    ls.check_connection()
    project = ls.get_project(project_id)
    logging.info(f"Connected to project: {project_id}")

    # Create export snapshot.
    export_result = project.export_snapshot_create(title="YOLO Export Snapshot")
    export_id = export_result["id"]
    logging.info(f"Export snapshot created with ID: {export_id}")

    # Wait until snapshot is ready.
    snapshot_status = project.export_snapshot_status(export_id)
    while snapshot_status.is_in_progress():
        logging.info("Waiting for snapshot to be ready...")
        time.sleep(1)
        snapshot_status = project.export_snapshot_status(export_id)

    status, snapshot_path = project.export_snapshot_download(export_id, export_type="JSON")
    if status != 200:
        logging.error(f"Export snapshot failed with status: {status}")
        return None

    # Load exported tasks.
    with open(snapshot_path, "r") as f:
        exported_tasks = json.load(f)
    logging.info(f"Exported {len(exported_tasks)} tasks from Label Studio.")

    # Convert the export to YOLO format.
    label_config = project.params["label_config"]
    converter = Converter(config=label_config, project_dir=os.path.dirname(snapshot_path), download_resources=False)
    converter.convert_to_yolo(input_data=snapshot_path, output_dir=output_dir, is_dir=False)
    logging.info("Converted export to YOLO format.")

    # Download images for each task.
    yolo_images_dir = os.path.join(output_dir, "images")
    os.makedirs(yolo_images_dir, exist_ok=True)
    for task in exported_tasks:
        # Assumes each task data contains one image URL.
        image_url = list(task["data"].values())[0]
        if image_url:
            max_retries = 3
            for attempt in range(1, max_retries + 1):
                try:
                    local_image_path = get_local_path(
                        url=image_url,
                        hostname=ls_url,
                        access_token=api_key,
                        task_id=task["id"],
                        download_resources=True
                    )
                    dest_path = os.path.join(yolo_images_dir, os.path.basename(local_image_path))
                    shutil.copy2(local_image_path, dest_path)
                    logging.info(f"Downloaded image for task {task['id']}")
                    break  # success; exit retry loop
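                except Exception as e:
                    logging.error(f"Error downloading image for task {task['id']} (attempt {attempt}): {e}")
                    if attempt < max_retries:
                        time.sleep(2 ** attempt)  # exponential backoff before the next attempt
    logging.info("Image download step finished; check the log above for any failed tasks.")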

    return output_dir


def prepare_yaml_file(images_dir, classes, output_yaml_path):
    """
    Create a YOLO training YAML file.
    
    YAML format example:
      train: /path/to/train/images
      val: /path/to/val/images
      nc: 2
      names: ['class1', 'class2']
    """
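    # Use an absolute path so that Ultralytics does not resolve it against its own datasets directory.
    images_dir = os.path.abspath(images_dir)
    yaml_data = {
        "train": images_dir,
        "val": images_dir,  # Same folder as train for simplicity; validation metrics will be optimistic
        "nc": len(classes),
        "names": classes
    }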
    with open(output_yaml_path, "w") as f:
        yaml.dump(yaml_data, f)
    logging.info(f"YAML file saved to: {output_yaml_path}")
    return output_yaml_path


def run_ultralytics_training(yaml_file, epochs, batch_size, img_size, model):
    """
    Launch the Ultralytics training command.
    
    Example command:
      yolo detect train data=data.yaml model=yolov8s.pt epochs=50 batch=16 imgsz=640
    """
    # The Ultralytics CLI takes key=value arguments rather than --flags.
    cmd = [
        "yolo", "detect", "train",
        f"data={yaml_file}",
        f"model={model}",
        f"epochs={epochs}",
        f"batch={batch_size}",
        f"imgsz={img_size}"
    ]
    logging.info("Starting YOLO training with Ultralytics...")
    subprocess.run(cmd, check=True)  # Raises exception if training fails


def main():
    parser = argparse.ArgumentParser(
        description="YOLO Model Training for Label Studio ML Backend"
    )
    parser.add_argument("--ls-url", type=str, required=True, help="Label Studio URL")
    parser.add_argument("--api-key", type=str, required=True, help="Label Studio API Key")
    parser.add_argument("--project-id", type=int, required=True, help="Label Studio Project ID")
    parser.add_argument(
        "--output-dir", type=str, default="output_yolo", help="Directory for YOLO export dataset"
    )
    parser.add_argument(
        "--classes", type=str, nargs="+", required=True, help="List of bounding box class names (rectangle labels)"
    )
    parser.add_argument("--epochs", type=int, default=50, help="Number of training epochs")
    parser.add_argument("--batch-size", type=int, default=16, help="Training batch size")
    parser.add_argument("--img-size", type=int, default=640, help="Input image size")
    parser.add_argument("--model", type=str, default="yolov8s.pt", help="Pretrained YOLO model to fine-tune")
    args = parser.parse_args()

    # Step 1: Export dataset and convert to YOLO format
    dataset_dir = export_yolo_dataset(args.ls_url, args.api_key, args.project_id, args.output_dir)
    if not dataset_dir:
        logging.error("Dataset export failed. Exiting.")
        return

    # Step 2: Prepare the YAML configuration for YOLO training.
    images_dir = os.path.join(args.output_dir, "images")
    yaml_path = os.path.join(args.output_dir, "data.yaml")
    prepare_yaml_file(images_dir, args.classes, yaml_path)

    # Step 3: Run Ultralytics training.
    run_ultralytics_training(
        yaml_file=yaml_path,
        epochs=args.epochs,
        batch_size=args.batch_size,
        img_size=args.img_size,
        model=args.model
    )


if __name__ == "__main__":
    main()
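
The script above points both train and val at the same images folder, so the reported validation metrics are measured on training data. A minimal sketch of a held-out split follows. It assumes the exported images sit in one flat folder and relies on Ultralytics accepting a text file of image paths for the train and val entries of the data YAML. The function name split_train_val and the file names train.txt and val.txt are hypothetical, not part of the original script.

```python
import random
from pathlib import Path

# Assumption: these are the image extensions present in the export.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def split_train_val(images_dir, output_dir, val_fraction=0.2, seed=0):
    """Write train.txt and val.txt, each listing absolute image paths."""
    images = sorted(
        p.resolve()
        for p in Path(images_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    )
    if len(images) < 2:
        raise ValueError("Need at least two images to build a train/val split.")

    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(images)

    # Keep at least one image on each side of the split.
    n_val = max(1, int(len(images) * val_fraction))
    n_val = min(n_val, len(images) - 1)
    val_images, train_images = images[:n_val], images[n_val:]

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    train_txt = out / "train.txt"
    val_txt = out / "val.txt"
    train_txt.write_text("\n".join(str(p) for p in train_images) + "\n")
    val_txt.write_text("\n".join(str(p) for p in val_images) + "\n")
    return str(train_txt), str(val_txt)
```

The two returned paths could then be written into the YAML as the train and val entries instead of the shared images directory. The labels are still looked up next to the images, so the images/ and labels/ folders need to stay where the converter put them.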

How to Use This Script

  1. Install Dependencies:
    Make sure you have installed the Label Studio SDK and the Ultralytics package. You can install them via pip:

    pip install git+https://github.com/heartexlabs/label-studio-sdk.git
    pip install ultralytics pyyaml
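  2. Pass the Connection Settings as Arguments:
    Provide your Label Studio URL, API key, and project ID as command-line arguments (the script does not read environment variables), along with the list of class names (e.g., for rectangle labels). List the class names in the same order as the class indices used in the exported YOLO label files.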

  3. Run the Script:
    For example:

    python your_script.py \
        --ls-url "https://app.humansignal.com" \
        --api-key "your_api_key" \
        --project-id 12345 \
        --classes "Car" "Truck" "Bus" \
        --epochs 50 \
        --batch-size 16 \
        --img-size 640 \
        --model "yolov8s.pt"
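
YOLO label files store class indices rather than names, so the order of the names passed with --classes has to match the order the converter used. A minimal sketch for reading that order back from the export follows. It assumes the Label Studio converter wrote a classes.txt file with one class name per line into the output directory. The helper name load_class_names is hypothetical.

```python
from pathlib import Path


def load_class_names(dataset_dir):
    """Return class names in index order, read from classes.txt.

    Assumption: the YOLO export directory contains a classes.txt file
    with one class name per line, where line i is the name of class i.
    """
    classes_file = Path(dataset_dir) / "classes.txt"
    lines = classes_file.read_text(encoding="utf-8").splitlines()
    # Skip blank lines so a trailing newline does not add an empty class.
    return [line.strip() for line in lines if line.strip()]
```

With this, the YAML step could use load_class_names(args.output_dir) in place of args.classes, which removes the risk of typing the names in a different order than the label files expect.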

This script exports your tasks as a snapshot from Label Studio, converts them to the YOLO format, downloads the associated images, creates the required YAML file, and finally runs the Ultralytics training command as a subprocess, waiting for it to finish.
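
Because the images are downloaded separately from the label conversion, it can be worth checking that the two line up before training. The sketch below lists images that have no label file with the same base name. It assumes the export directory contains an images/ folder and a labels/ folder of .txt files. The helper name find_unlabeled_images is hypothetical.

```python
from pathlib import Path


def find_unlabeled_images(dataset_dir):
    """Return the file names in images/ with no matching labels/<stem>.txt."""
    root = Path(dataset_dir)
    label_stems = {p.stem for p in (root / "labels").glob("*.txt")}
    return sorted(
        p.name
        for p in (root / "images").iterdir()
        if p.is_file() and p.stem not in label_stems
    )
```

An empty result means every image has a label file. A long list usually points to a file-name mismatch between the downloaded images and the converted labels, which is worth fixing before starting a training run.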

@makseq makseq closed this as completed Feb 12, 2025