Error when I try to resume training with instance segmentation in google colab #403

Closed
1 task done
alanacc92 opened this issue Sep 21, 2023 · 5 comments
Labels
bug Something isn't working Stale

Comments

@alanacc92

Search before asking

  • I have searched the HUB issues and found no similar bug report.

HUB Component

Training

Bug

Ultralytics HUB: New authentication successful ✅
Ultralytics HUB: View model at https://hub.ultralytics.com/models/2TaEEEn6ncyUFLxCdWJ6 🚀
Downloading https://storage.googleapis.com/ultralytics-hub.appspot.com/users/eFUHfcU7kgPqOknxAqN6z2bDjBw2/models/2TaEEEn6ncyUFLxCdWJ6/epoch-31.pt to 'epoch-31.pt'...
100%|██████████| 90.5M/90.5M [00:04<00:00, 20.6MB/s]
WARNING ⚠️ Unable to automatically guess model task, assuming 'task=detect'. Explicitly define task for your model, i.e. 'task=detect', 'segment', 'classify', or 'pose'.
Ultralytics YOLOv8.0.183 🚀 Python-3.10.12 torch-2.0.1+cu118 CUDA:0 (Tesla T4, 15102MiB)
engine/trainer: task=segment, mode=train, model=epoch-31.pt, data=https://storage.googleapis.com/ultralytics-hub.appspot.com/users/eFUHfcU7kgPqOknxAqN6z2bDjBw2/datasets/MV6QbQSZVe2iAAtRpDFD/Cars Test2.v3i.yolov8+severe ML.zip, epochs=100, patience=15, batch=17, imgsz=640, save=True, save_period=-1, cache=ram, device=, workers=8, project=None, name=None, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, vid_stride=1, stream_buffer=False, line_width=None, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.0, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, tracker=botsort.yaml, save_dir=runs/segment/train
Downloading https://storage.googleapis.com/ultralytics-hub.appspot.com/users/eFUHfcU7kgPqOknxAqN6z2bDjBw2/datasets/MV6QbQSZVe2iAAtRpDFD/Cars Test2.v3i.yolov8+severe ML.zip to 'Cars Test2.v3i.yolov8+severe ML.zip'...
100%|██████████| 773M/773M [00:36<00:00, 22.1MB/s]
Unzipping Cars Test2.v3i.yolov8+severe ML.zip to /content/datasets/Cars Test2.v3i.yolov8+severe ML...: 100%|██████████| 47345/47345 [00:12<00:00, 3751.65file/s]
Downloading https://ultralytics.com/assets/Arial.ttf to '/root/.config/Ultralytics/Arial.ttf'...
100%|██████████| 755k/755k [00:00<00:00, 18.1MB/s]
TensorBoard: Start with 'tensorboard --logdir runs/segment/train', view at http://localhost:6006/

             from  n    params  module                                arguments
  0            -1  1       928  ultralytics.nn.modules.conv.Conv      [3, 32, 3, 2]
  1            -1  1     18560  ultralytics.nn.modules.conv.Conv      [32, 64, 3, 2]
  2            -1  1     29056  ultralytics.nn.modules.block.C2f      [64, 64, 1, True]
  3            -1  1     73984  ultralytics.nn.modules.conv.Conv      [64, 128, 3, 2]
  4            -1  2    197632  ultralytics.nn.modules.block.C2f      [128, 128, 2, True]
  5            -1  1    295424  ultralytics.nn.modules.conv.Conv      [128, 256, 3, 2]
  6            -1  2    788480  ultralytics.nn.modules.block.C2f      [256, 256, 2, True]
  7            -1  1   1180672  ultralytics.nn.modules.conv.Conv      [256, 512, 3, 2]
  8            -1  1   1838080  ultralytics.nn.modules.block.C2f      [512, 512, 1, True]
  9            -1  1    656896  ultralytics.nn.modules.block.SPPF     [512, 512, 5]
 10            -1  1         0  torch.nn.modules.upsampling.Upsample  [None, 2, 'nearest']
 11       [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat    [1]
 12            -1  1    591360  ultralytics.nn.modules.block.C2f      [768, 256, 1]
 13            -1  1         0  torch.nn.modules.upsampling.Upsample  [None, 2, 'nearest']
 14       [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat    [1]
 15            -1  1    148224  ultralytics.nn.modules.block.C2f      [384, 128, 1]
 16            -1  1    147712  ultralytics.nn.modules.conv.Conv      [128, 128, 3, 2]
 17      [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat    [1]
 18            -1  1    493056  ultralytics.nn.modules.block.C2f      [384, 256, 1]
 19            -1  1    590336  ultralytics.nn.modules.conv.Conv      [256, 256, 3, 2]
 20       [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat    [1]
 21            -1  1   1969152  ultralytics.nn.modules.block.C2f      [768, 512, 1]
 22  [15, 18, 21]  1   2771705  ultralytics.nn.modules.head.Segment   [3, 32, 128, [128, 256, 512]]
YOLOv8s-seg summary: 261 layers, 11791257 parameters, 11791241 gradients

Transferred 417/417 items from pretrained weights
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
Downloading https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt to 'yolov8n.pt'...
100%|██████████| 6.23M/6.23M [00:00<00:00, 76.3MB/s]
AMP: checks passed ✅
train: Scanning /content/datasets/Cars Test2.v3i.yolov8+severe ML/train/labels... 21118 images, 214 backgrounds, 0 corrupt: 100%|██████████| 21118/21118 [00:14<00:00, 1443.36it/s]
train: New cache created: /content/datasets/Cars Test2.v3i.yolov8+severe ML/train/labels.cache
train: 27.5GB RAM required to cache images with 50% safety margin but only 8.2/12.7GB available, not caching images ⚠️
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
val: Scanning /content/datasets/Cars Test2.v3i.yolov8+severe ML/valid/labels... 2502 images, 21 backgrounds, 0 corrupt: 100%|██████████| 2502/2502 [00:03<00:00, 819.01it/s]
val: New cache created: /content/datasets/Cars Test2.v3i.yolov8+severe ML/valid/labels.cache
val: Caching images (2.2GB ram): 100%|██████████| 2502/2502 [00:11<00:00, 227.16it/s]
Plotting labels to runs/segment/train/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: SGD(lr=0.01, momentum=0.9) with parameter groups 66 weight(decay=0.0), 77 weight(decay=0.00053125), 76 bias(decay=0.0)
Resuming training from epoch-31.pt from epoch 33 to 100 total epochs
Ultralytics HUB: View model at https://hub.ultralytics.com/models/2TaEEEn6ncyUFLxCdWJ6 🚀
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to runs/segment/train
Starting training for 100 epochs...

  Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size

0%| | 0/1243 [00:00<?, ?it/s]


RuntimeError Traceback (most recent call last)

in <cell line: 5>()
3 model = YOLO('https://hub.ultralytics.com/models/2TaEEEn6ncyUFLxCdWJ6')
4
----> 5 model.train()

7 frames

/usr/local/lib/python3.10/dist-packages/ultralytics/utils/loss.py in (.0)
159 loss = torch.zeros(3, device=self.device) # box, cls, dfl
160 feats = preds[1] if isinstance(preds, tuple) else preds
--> 161 pred_distri, pred_scores = torch.cat([xi.view(feats[0].shape[0], self.no, -1) for xi in feats], 2).split(
162 (self.reg_max * 4, self.nc), 1)
163

RuntimeError: shape '[32, 67, -1]' is invalid for input of size 268800

Environment

Ultralytics HUB Version
v0.1.24
Client User Agent
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/117.0
Operating System
Win32
Browser Window Size
1920 x 927
Server Timestamp
1695274659

Minimal Reproducible Example

Setup in Google Colab:

%pip install ultralytics # install
from ultralytics import YOLO, checks, hub
checks() # checks

Login:

hub.login('xxxxxx')

model = YOLO('https://hub.ultralytics.com/models/2TaEEEn6ncyUFLxCdWJ6')

model.train()

Additional

No response

@alanacc92 alanacc92 added the bug Something isn't working label Sep 21, 2023
@github-actions

👋 Hello @alanacc92, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more:

  • Quickstart. Start training and deploying YOLO models with HUB in seconds.
  • Datasets: Preparing and Uploading. Learn how to prepare and upload your datasets to HUB in YOLO format.
  • Projects: Creating and Managing. Group your models into projects for improved organization.
  • Models: Training and Exporting. Train YOLOv5 and YOLOv8 models on your custom datasets and export them to various formats for deployment.
  • Integrations. Explore different integration options for your trained models, such as TensorFlow, ONNX, OpenVINO, CoreML, and PaddlePaddle.
  • Ultralytics HUB App. Learn about the Ultralytics App for iOS and Android, which allows you to run models directly on your mobile device.
    • iOS. Learn about YOLO CoreML models accelerated on Apple's Neural Engine on iPhones and iPads.
    • Android. Explore TFLite acceleration on mobile devices.
  • Inference API. Understand how to use the Inference API for running your trained models in the cloud to generate predictions.

If this is a 🐛 Bug Report, please provide screenshots and steps to reproduce your problem to help us get started working on a fix.

If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response.

We try to respond to all issues as promptly as possible. Thank you for your patience!

@github-actions

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

@github-actions github-actions bot added the Stale label Oct 22, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 2, 2023
@UltralyticsAssistant
Member

@alanacc92 hello! Thank you for providing detailed information about the error you're encountering while trying to resume training with instance segmentation using Google Colab.

From the log you've posted, it seems like the input shape for one of the loss computations is invalid. The RuntimeError: shape '[32, 67, -1]' is invalid for input of size 268800 indicates that there might be a mismatch in the number of classes or an issue with the format of the data being passed to the model. This kind of error can sometimes happen if there was a recent update or change in the model or training data that is incompatible with the existing cached weights file.
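
The numbers in the error message can be checked by hand. This is a minimal sketch, assuming the YOLOv8 defaults of `reg_max = 16` and strides 8, 16 and 32 (neither value appears in the log), and taking `nc = 3` and 32 mask coefficients from the `Segment` head arguments `[3, 32, 128, ...]` in the model summary. The helper `fits` is a made-up name for illustration.

```python
from math import prod

# Values read from the log above.
nc = 3              # classes: first argument of the Segment head
nm = 32             # mask coefficients: second argument of the Segment head
imgsz = 640         # "Image sizes 640 train, 640 val"
numel = 268800      # "input of size 268800" from the RuntimeError

# Assumptions (YOLOv8 defaults, not shown in the log).
reg_max = 16
strides = (8, 16, 32)

no = nc + 4 * reg_max                               # outputs per anchor expected by a detection loss
anchors = sum((imgsz // s) ** 2 for s in strides)   # grid cells summed over the three scales


def fits(numel: int, *dims: int) -> bool:
    """Return True if a tensor with `numel` elements can be viewed as (*dims, -1)."""
    return numel % prod(dims) == 0


print(no)                      # 67
print(anchors)                 # 8400
print(numel == nm * anchors)   # True
print(fits(numel, 32, no))     # False
```

Under these assumptions the failing tensor has exactly 32 × 8400 elements, which would match a (32, 8400) map of mask coefficients, while the `67` is what a detection loss expects per anchor. That is consistent with a detection loss being applied to the output of a segmentation model, and it would also fit the `assuming 'task=detect'` warning and the `box_loss cls_loss dfl_loss` header in the log. It is an interpretation of the log, though, and was not confirmed in this thread.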

Here are some steps that might resolve your issue:

  1. Clear any existing caches that might be causing the issue.
  2. Ensure that the number of classes in the dataset matches the number used when the model ckpt was created.
  3. Double-check the dataset and annotation files for correctness.
  4. Verify if there were any changes to the model architecture or input pipeline that need to be reconciled.
  5. Consider training a fresh model if the dataset or the task definition has significantly changed.
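
For step 3, one quick sanity check is whether the label files are in the format the chosen task expects. The sketch below assumes the usual YOLO text-label convention, where a detection line is `class x_center y_center width height` (5 fields) and a segmentation line is `class` followed by polygon `x y` pairs (at least three points, so an odd field count of 7 or more), with all coordinates normalized to [0, 1]. The function name `label_kind` and the sample lines are made up for illustration.

```python
def label_kind(line: str) -> str:
    """Classify one YOLO label line as 'detect', 'segment', or 'invalid'."""
    fields = line.split()
    if not fields:
        return "invalid"
    try:
        int(fields[0])                           # class index must be an integer
        coords = [float(v) for v in fields[1:]]
    except ValueError:
        return "invalid"
    if any(not 0.0 <= c <= 1.0 for c in coords):
        return "invalid"                         # coordinates are expected to be normalized
    if len(coords) == 4:
        return "detect"                          # x_center, y_center, width, height
    if len(coords) >= 6 and len(coords) % 2 == 0:
        return "segment"                         # polygon with at least three (x, y) points
    return "invalid"


# Hypothetical sample lines, not taken from the dataset in this issue.
print(label_kind("0 0.50 0.50 0.20 0.30"))              # detect
print(label_kind("2 0.10 0.10 0.40 0.10 0.40 0.35"))    # segment
print(label_kind("1 0.10 0.10 0.40"))                   # invalid
```

Running a check like this over a sample of files under `train/labels` and `valid/labels` shows whether the dataset is mixed or does not match the task in the training arguments.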

If the issue persists, please provide the full stack trace or any additional details on the GitHub issue tracker. This will help in diagnosing the root cause more effectively.

For further help with troubleshooting, you might want to refer to the documentation available at https://docs.ultralytics.com/hub, which covers common issues and best practices for using the Ultralytics HUB.

Also, remember that you can always reach out to the broader community and the Ultralytics team on the GitHub repository for assistance. They can provide valuable insights and suggestions based on their experience.

@wenruoxu

I ran into this problem too. It turned out I was using the wrong trainer: my task was detection, but I was using a segmentation trainer. Once I switched to the correct one, the error went away. The underlying issue is a lack of input validation. Your problem may have a different cause, but checking your config and YAML files is a good idea.
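
The kind of validation described here can be sketched without touching the library. The example assumes head module names as they appear in model summaries like the one in the log above (`ultralytics.nn.modules.head.Segment`). `HEAD_TO_TASK` and `check_task` are illustrative names, not part of the Ultralytics API.

```python
# Hypothetical mapping from the head class name in a model summary to a task string.
HEAD_TO_TASK = {
    "Detect": "detect",
    "Segment": "segment",
    "Classify": "classify",
    "Pose": "pose",
}


def check_task(head_module: str, trainer_task: str) -> None:
    """Raise ValueError if the trainer's task does not match the model head."""
    head_name = head_module.rsplit(".", 1)[-1]   # keep only the class name
    expected = HEAD_TO_TASK.get(head_name)
    if expected is None:
        raise ValueError(f"unrecognized head module: {head_module!r}")
    if expected != trainer_task:
        raise ValueError(
            f"model head {head_name!r} implies task={expected!r}, "
            f"but the trainer is running task={trainer_task!r}"
        )


check_task("ultralytics.nn.modules.head.Segment", "segment")   # passes silently
try:
    check_task("ultralytics.nn.modules.head.Segment", "detect")
except ValueError as err:
    print(err)
```

A check like this, run before training starts, turns a confusing tensor-shape error deep inside the loss into a message that names the actual mismatch.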

@pderrenger
Member

Hello @wenruoxu,

Thank you for sharing your experience and insights! It's great to hear that you were able to resolve the issue by ensuring you used the correct trainer for your task. Indeed, using the appropriate configuration and YAML files is crucial for successful training.

For anyone encountering similar issues, here are a few additional steps to consider:

  1. Verify Task and Trainer Alignment: Ensure that the task (e.g., detection, segmentation) matches the trainer you are using. Misalignment can lead to unexpected errors.

  2. Check Configuration Files: Double-check your configuration and YAML files to ensure they are correctly set up for your specific task. This includes verifying the number of classes, input shapes, and other parameters.

  3. Update to Latest Versions: Make sure you are using the latest versions of the Ultralytics packages. Sometimes, bugs are fixed in newer releases, and updating can resolve issues.

  4. Clear Caches: If you have made changes to your dataset or configuration, clearing any existing caches can help avoid conflicts.

Here is a small code snippet to help you ensure that you are using the correct task and trainer:

from ultralytics import YOLO

# Load the model
model = YOLO('path/to/your/model.pt')

# Ensure the correct task is set
model.task = 'detect'  # or 'segment', 'classify', etc.

# Train the model
model.train(data='path/to/your/data.yaml', epochs=100)

If you continue to experience issues, please provide more details on the GitHub issue tracker, and the community or Ultralytics team will be happy to assist you further.

Thank you for your contribution to the discussion, and happy training! 🚀
