
Can only train with square dimensions #111

Open

fhandke-fugro opened this issue Dec 12, 2024 · 2 comments

@fhandke-fugro

Hello,

Thanks for the repo. I achieved some good initial results!

I successfully changed the input size from 640, 640 to 1280, 1280 and was able to train (likewise with e.g. 800, 800).

So far so good. However, when I try to train on, say, 640, 1280 (h, w), I get this error:

Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/MagicTime/Training/DFINE/train.py", line 84, in <module>
    main(args)
  File "/home/ec2-user/SageMaker/MagicTime/Training/DFINE/train.py", line 54, in main
    solver.fit()
  File "/home/ec2-user/SageMaker/MagicTime/Training/DFINE/src/solver/det_solver.py", line 27, in fit
    n_parameters, model_stats = stats(self.cfg)
  File "/home/ec2-user/SageMaker/MagicTime/Training/DFINE/src/misc/profiler_utils.py", line 18, in stats
    flops, macs, _ = calculate_flops(model=model_for_info,
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/calflops/flops_counter.py", line 165, in calculate_flops
    _ = model(*args)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ec2-user/SageMaker/MagicTime/Training/DFINE/src/zoo/dfine/dfine.py", line 28, in forward
    x = self.encoder(x)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ec2-user/SageMaker/MagicTime/Training/DFINE/src/zoo/dfine/hybrid_encoder.py", line 424, in forward
    memory :torch.Tensor = self.encoder[i](src_flatten, pos_embed=pos_embed)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ec2-user/SageMaker/MagicTime/Training/DFINE/src/zoo/dfine/hybrid_encoder.py", line 293, in forward
    output = layer(output, src_mask=src_mask, pos_embed=pos_embed)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ec2-user/SageMaker/MagicTime/Training/DFINE/src/zoo/dfine/hybrid_encoder.py", line 266, in forward
    q = k = self.with_pos_embed(src, pos_embed)
  File "/home/ec2-user/SageMaker/MagicTime/Training/DFINE/src/zoo/dfine/hybrid_encoder.py", line 260, in with_pos_embed
    return tensor if pos_embed is None else tensor + pos_embed
RuntimeError: The size of tensor a (400) must match the size of tensor b (800) at non-singleton dimension 1

What would I have to do to train on non-square images?

When attempting to train on 640, 1280, I changed the following config files:
dfine/include/dataloader.yml:

train_dataloader:
  dataset:
    transforms:
      ops:
        - {type: RandomPhotometricDistort, p: 0.5}
        - {type: RandomZoomOut, fill: 0}
        - {type: RandomIoUCrop, p: 0.8}
        - {type: SanitizeBoundingBoxes, min_size: 1}
        - {type: RandomHorizontalFlip}
        - {type: Resize, size: [640, 1280], }
        - {type: SanitizeBoundingBoxes, min_size: 1}
        - {type: ConvertPILImage, dtype: 'float32', scale: True}
        - {type: ConvertBoxes, fmt: 'cxcywh', normalize: True}
      policy:
        name: stop_epoch
        epoch: 24 # epoch in [24, ~) stop `ops`
        ops: ['RandomPhotometricDistort', 'RandomZoomOut', 'RandomIoUCrop']

  collate_fn:
    type: BatchImageCollateFunction
    base_size: 640
    base_size_repeat: 3
    stop_epoch: 24 # epoch in [24, ~) stop `multiscales`

  shuffle: True
  total_batch_size: 28 # total batch size across all GPUs
  num_workers: 4


val_dataloader:
  dataset:
    transforms:
      ops:
        - {type: Resize, size: [640, 1280], }
        - {type: ConvertPILImage, dtype: 'float32', scale: True}
  shuffle: False
  total_batch_size: 56
  num_workers: 4

and

dfine/include/dfine_hgnetv2.yml:

task: detection

model: DFINE
criterion: DFINECriterion
postprocessor: DFINEPostProcessor

use_focal_loss: True
eval_spatial_size: [640, 1280] # h w

DFINE:
  backbone: HGNetv2
  encoder: HybridEncoder
  decoder: DFINETransformer

HGNetv2:
  pretrained: True
  local_model_dir: weight/hgnetv2/

HybridEncoder:
  in_channels: [512, 1024, 2048]
  feat_strides: [8, 16, 32]

  # intra
  hidden_dim: 256
  use_encoder_idx: [2]
  num_encoder_layers: 1
  nhead: 8
  dim_feedforward: 1024
  dropout: 0.
  enc_act: 'gelu'

  # cross
  expansion: 1.0
  depth_mult: 1
  act: 'silu'


DFINETransformer:
  feat_channels: [256, 256, 256]
  feat_strides: [8, 16, 32]
  hidden_dim: 256
  num_levels: 3

  num_layers: 6
  eval_idx: -1
  num_queries: 300

  num_denoising: 100
  label_noise_ratio: 0.5
  box_noise_scale: 1.0

  # NEW
  reg_max: 32
  reg_scale: 4

  # Auxiliary decoder layers dimension scaling
  # "eg. If num_layers: 6 eval_idx: -4,
  # then layer 3, 4, 5 are auxiliary decoder layers."
  layer_scale: 1  # 2


  num_points: [3, 6, 3] # [4, 4, 4] [3, 6, 3]
  cross_attn_method: default # default, discrete
  query_select_method: default # default, agnostic


DFINEPostProcessor:
  num_top_queries: 300


DFINECriterion:
  weight_dict: {loss_vfl: 1, loss_bbox: 5, loss_giou: 2, loss_fgl: 0.15, loss_ddf: 1.5}
  losses: ['vfl', 'boxes', 'local']
  alpha: 0.75
  gamma: 2.0
  reg_max: 32

  matcher:
    type: HungarianMatcher
    weight_dict: {cost_class: 2, cost_bbox: 5, cost_giou: 2}
    alpha: 0.25
    gamma: 2.0

I use the config configs/dfine/custom/dfine_hgnetv2_l_custom.yml to train.

Any help is appreciated!
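
A quick way to double-check what the dataloader actually emits (a minimal sketch, assuming the YAMLConfig plumbing used in train.py; the attribute names are inferred and may differ):

# sketch: confirm the Resize transform really produces 640x1280 batches
from src.core import YAMLConfig

cfg = YAMLConfig('configs/dfine/custom/dfine_hgnetv2_l_custom.yml')
images, targets = next(iter(cfg.val_dataloader))  # val path, no multiscale collate
print(images.shape)  # expected: torch.Size([56, 3, 640, 1280])

(For the train loader, the BatchImageCollateFunction multiscale step may rescale batches during early epochs, so the val loader gives a cleaner check.)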

@SebastianJanampa

Could you print the shape of the input image?
From your error:

RuntimeError: The size of tensor a (400) must match the size of tensor b (800) at non-singleton dimension 1

It looks like the image is being resized to (1x3x640x640) instead of (1x3x640x1280).
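
The two sizes in the error line up with the stride-32 level the encoder attends over (use_encoder_idx: [2] with feat_strides: [8, 16, 32]). A minimal arithmetic check:

# flattened token count of the stride-32 feature map fed to the transformer encoder
def tokens(h, w, stride=32):
    return (h // stride) * (w // stride)

print(tokens(640, 640))   # 400 -> "tensor a": the square input the profiler builds
print(tokens(640, 1280))  # 800 -> "tensor b": pos_embed precomputed for eval_spatial_size

So the positional embedding was built for the 640x1280 grid from eval_spatial_size, while the tensor reaching the encoder came from a square 640x640 input.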

@iangiu

iangiu commented Dec 13, 2024

(quotes fhandke-fugro's original issue in full)

You can comment out the following code in the det_solver.py file (lines 27, 28, and 145):

   # n_parameters, model_stats = stats(self.cfg)
   # print(model_stats)
   # 'n_parameters': n_parameters
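
Alternatively, rather than disabling the stats call, you could make the profiler build its dummy input with the configured non-square size. A minimal sketch for src/misc/profiler_utils.py, assuming (as the traceback suggests) that it currently profiles with a square input; the config attribute names are inferred and may need adjusting:

# sketch: feed calflops the configured (h, w) instead of a square shape
import copy

from calflops import calculate_flops

def stats(cfg_solver):
    model_for_info = copy.deepcopy(cfg_solver.model)
    h, w = cfg_solver.yaml_cfg['eval_spatial_size']  # [640, 1280] per the config above
    flops, macs, _ = calculate_flops(
        model=model_for_info,
        input_shape=(1, 3, h, w),  # matches the grid pos_embed was precomputed for
        output_as_string=True,
        print_results=False,
    )
    n_parameters = sum(p.numel() for p in model_for_info.parameters() if p.requires_grad)
    return n_parameters, f'FLOPs: {flops}  MACs: {macs}'

That way the profiler and the training loop see the same 640x1280 shape, and nothing has to be commented out.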
