
Can only train with square dimensions #111

Open

fhandke-fugro opened this issue Dec 12, 2024 · 2 comments

@fhandke-fugro

Hello,

Thanks for the repo. I achieved some good initial results!

I successfully changed the input size from 640, 640 to 1280, 1280 and was able to train (likewise with e.g. 800, 800).

So far so good. However, when I try to train on, say, 640, 1280 (h, w), I get this error:

Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/MagicTime/Training/DFINE/train.py", line 84, in <module>
    main(args)
  File "/home/ec2-user/SageMaker/MagicTime/Training/DFINE/train.py", line 54, in main
    solver.fit()
  File "/home/ec2-user/SageMaker/MagicTime/Training/DFINE/src/solver/det_solver.py", line 27, in fit
    n_parameters, model_stats = stats(self.cfg)
  File "/home/ec2-user/SageMaker/MagicTime/Training/DFINE/src/misc/profiler_utils.py", line 18, in stats
    flops, macs, _ = calculate_flops(model=model_for_info,
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/calflops/flops_counter.py", line 165, in calculate_flops
    _ = model(*args)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ec2-user/SageMaker/MagicTime/Training/DFINE/src/zoo/dfine/dfine.py", line 28, in forward
    x = self.encoder(x)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ec2-user/SageMaker/MagicTime/Training/DFINE/src/zoo/dfine/hybrid_encoder.py", line 424, in forward
    memory :torch.Tensor = self.encoder[i](src_flatten, pos_embed=pos_embed)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ec2-user/SageMaker/MagicTime/Training/DFINE/src/zoo/dfine/hybrid_encoder.py", line 293, in forward
    output = layer(output, src_mask=src_mask, pos_embed=pos_embed)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ec2-user/SageMaker/MagicTime/Training/DFINE/src/zoo/dfine/hybrid_encoder.py", line 266, in forward
    q = k = self.with_pos_embed(src, pos_embed)
  File "/home/ec2-user/SageMaker/MagicTime/Training/DFINE/src/zoo/dfine/hybrid_encoder.py", line 260, in with_pos_embed
    return tensor if pos_embed is None else tensor + pos_embed
RuntimeError: The size of tensor a (400) must match the size of tensor b (800) at non-singleton dimension 1

What would I have to do to train on non-square images?

When attempting to train on 640, 1280, I changed the following config files:
dfine/include/dataloader.yml:

train_dataloader:
  dataset:
    transforms:
      ops:
        - {type: RandomPhotometricDistort, p: 0.5}
        - {type: RandomZoomOut, fill: 0}
        - {type: RandomIoUCrop, p: 0.8}
        - {type: SanitizeBoundingBoxes, min_size: 1}
        - {type: RandomHorizontalFlip}
        - {type: Resize, size: [640, 1280], }
        - {type: SanitizeBoundingBoxes, min_size: 1}
        - {type: ConvertPILImage, dtype: 'float32', scale: True}
        - {type: ConvertBoxes, fmt: 'cxcywh', normalize: True}
      policy:
        name: stop_epoch
        epoch: 24 # epoch in [24, ~) stop `ops`
        ops: ['RandomPhotometricDistort', 'RandomZoomOut', 'RandomIoUCrop']

  collate_fn:
    type: BatchImageCollateFunction
    base_size: 640
    base_size_repeat: 3
    stop_epoch: 24 # epoch in [24, ~) stop `multiscales`

  shuffle: True
  total_batch_size: 28 # total batch size across all GPUs
  num_workers: 4


val_dataloader:
  dataset:
    transforms:
      ops:
        - {type: Resize, size: [640, 1280], }
        - {type: ConvertPILImage, dtype: 'float32', scale: True}
  shuffle: False
  total_batch_size: 56
  num_workers: 4

and

dfine/include/dfine_hgnetv2.yml:

task: detection

model: DFINE
criterion: DFINECriterion
postprocessor: DFINEPostProcessor

use_focal_loss: True
eval_spatial_size: [640, 1280] # h w

DFINE:
  backbone: HGNetv2
  encoder: HybridEncoder
  decoder: DFINETransformer

HGNetv2:
  pretrained: True
  local_model_dir: weight/hgnetv2/

HybridEncoder:
  in_channels: [512, 1024, 2048]
  feat_strides: [8, 16, 32]

  # intra
  hidden_dim: 256
  use_encoder_idx: [2]
  num_encoder_layers: 1
  nhead: 8
  dim_feedforward: 1024
  dropout: 0.
  enc_act: 'gelu'

  # cross
  expansion: 1.0
  depth_mult: 1
  act: 'silu'


DFINETransformer:
  feat_channels: [256, 256, 256]
  feat_strides: [8, 16, 32]
  hidden_dim: 256
  num_levels: 3

  num_layers: 6
  eval_idx: -1
  num_queries: 300

  num_denoising: 100
  label_noise_ratio: 0.5
  box_noise_scale: 1.0

  # NEW
  reg_max: 32
  reg_scale: 4

  # Auxiliary decoder layers dimension scaling
  # "eg. If num_layers: 6 eval_idx: -4,
  # then layer 3, 4, 5 are auxiliary decoder layers."
  layer_scale: 1  # 2


  num_points: [3, 6, 3] # [4, 4, 4] [3, 6, 3]
  cross_attn_method: default # default, discrete
  query_select_method: default # default, agnostic


DFINEPostProcessor:
  num_top_queries: 300


DFINECriterion:
  weight_dict: {loss_vfl: 1, loss_bbox: 5, loss_giou: 2, loss_fgl: 0.15, loss_ddf: 1.5}
  losses: ['vfl', 'boxes', 'local']
  alpha: 0.75
  gamma: 2.0
  reg_max: 32

  matcher:
    type: HungarianMatcher
    weight_dict: {cost_class: 2, cost_bbox: 5, cost_giou: 2}
    alpha: 0.25
    gamma: 2.0

I use the config configs/dfine/custom/dfine_hgnetv2_l_custom.yml to train.

Any help is appreciated!
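
A quick way to double-check what the dataloader actually emits (a minimal sketch, assuming the YAMLConfig plumbing used in train.py; the attribute names are inferred and may differ):

# sketch: confirm the Resize transform really produces 640x1280 batches
from src.core import YAMLConfig

cfg = YAMLConfig('configs/dfine/custom/dfine_hgnetv2_l_custom.yml')
images, targets = next(iter(cfg.val_dataloader))  # val path, no multiscale collate
print(images.shape)  # expected: torch.Size([56, 3, 640, 1280])

(For the train loader, the BatchImageCollateFunction multiscale step may rescale batches during early epochs, so the val loader gives a cleaner check.)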

@SebastianJanampa

Could you print the shape of the input image?
From your error:

RuntimeError: The size of tensor a (400) must match the size of tensor b (800) at non-singleton dimension 1

It looks like the image is being resized to (1x3x640x640) instead of (1x3x640x1280).
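
The two sizes in the error line up with the stride-32 level the encoder attends over (use_encoder_idx: [2] with feat_strides: [8, 16, 32]). A minimal arithmetic check:

# flattened token count of the stride-32 feature map fed to the transformer encoder
def tokens(h, w, stride=32):
    return (h // stride) * (w // stride)

print(tokens(640, 640))   # 400 -> "tensor a": the square input the profiler builds
print(tokens(640, 1280))  # 800 -> "tensor b": pos_embed precomputed for eval_spatial_size

So the positional embedding was built for the 640x1280 grid from eval_spatial_size, while the tensor reaching the encoder came from a square 640x640 input.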

@iangiu

iangiu commented Dec 13, 2024

(quotes fhandke-fugro's original issue in full)

You can comment out the following code in the det_solver.py file (lines 27, 28, and 145):

   # n_parameters, model_stats = stats(self.cfg)
   # print(model_stats)
   # 'n_parameters': n_parameters
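
Alternatively, rather than disabling the stats call, you could make the profiler build its dummy input with the configured non-square size. A minimal sketch for src/misc/profiler_utils.py, assuming (as the traceback suggests) that it currently profiles with a square input; the config attribute names are inferred and may need adjusting:

# sketch: feed calflops the configured (h, w) instead of a square shape
import copy

from calflops import calculate_flops

def stats(cfg_solver):
    model_for_info = copy.deepcopy(cfg_solver.model)
    h, w = cfg_solver.yaml_cfg['eval_spatial_size']  # [640, 1280] per the config above
    flops, macs, _ = calculate_flops(
        model=model_for_info,
        input_shape=(1, 3, h, w),  # matches the grid pos_embed was precomputed for
        output_as_string=True,
        print_results=False,
    )
    n_parameters = sum(p.numel() for p in model_for_info.parameters() if p.requires_grad)
    return n_parameters, f'FLOPs: {flops}  MACs: {macs}'

That way the profiler and the training loop see the same 640x1280 shape, and nothing has to be commented out.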
