Detectron2 multigpu and dataset splitting #4622

unrue · 2022-10-27T08:40:37Z

I'm using Detectron2 to train on a custom dataset of about 4000 images. After turning off augmentation, I run out of memory on the GPUs. Each node in my cluster has 4 NVIDIA Tesla V100 GPUs with 16 GB of memory per GPU.

First of all, I disabled augmentation:


    @classmethod
    def build_train_loader(cls, cfg):
        mapper = DatasetMapper(cfg, is_train=True, augmentations=[])
        return build_detection_train_loader(cfg, mapper=mapper)
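
One thing worth checking here: as far as I understand, passing `augmentations=[]` also drops the shortest-edge resize that the default mapper normally applies, so images would reach the model at their original resolution. The following is a minimal sketch of that resize arithmetic, not Detectron2's own code. The 800/1333 targets and the 3000x4000 image size are assumptions chosen for illustration, and `shortest_edge_resize` is a hypothetical helper.

```python
def shortest_edge_resize(h, w, short=800, max_size=1333):
    """Return (new_h, new_w) after scaling the shorter side to `short`,
    then shrinking further if the longer side exceeds `max_size`.
    The default values are assumptions, not read from any config."""
    scale = short / min(h, w)
    if h < w:
        new_h, new_w = short, scale * w
    else:
        new_h, new_w = scale * h, short
    longest = max(new_h, new_w)
    if longest > max_size:
        shrink = max_size / longest
        new_h, new_w = new_h * shrink, new_w * shrink
    return int(new_h + 0.5), int(new_w + 0.5)


# Hypothetical 3000x4000 image: compare pixel counts with and without resize.
orig_h, orig_w = 3000, 4000
new_h, new_w = shortest_edge_resize(orig_h, orig_w)
ratio = (orig_h * orig_w) / (new_h * new_w)
print(f"resized to {new_h}x{new_w}, original has {ratio:.1f}x more pixels")
```

If the images in the dataset are large, activation memory grows roughly with the pixel count, which would be consistent with training fitting in memory only when augmentation (and therefore resizing) is on.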

Part of the network configuration:

    from detectron2 import model_zoo
    from detectron2.config import get_cfg

    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml"))
    cfg.DATASETS.TRAIN = ("my_train",)
    cfg.DATASETS.TEST = ("my_val",)

    cfg.DATALOADER.NUM_WORKERS = 4
    # Let training initialize from the model zoo checkpoint
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml")
    cfg.SOLVER.IMS_PER_BATCH = 4
    cfg.SOLVER.BASE_LR = 0.001

    cfg.SOLVER.WARMUP_ITERS = 1000
    cfg.SOLVER.MAX_ITER = 3000
    cfg.SOLVER.GAMMA = 0.05

    cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 8
    cfg.TEST.EVAL_PERIOD = 500

    cfg.MODEL.ROI_HEADS.NUM_CLASSES = 50

I launched training on multiple GPUs across multiple nodes (this example uses 2 nodes with 4 GPUs per node, so 8 GPUs in total):

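    from detectron2.engine import launch

    # node_rank, master_ip and master_port (a string) are set elsewhere in my script
    launch(
        main,
        num_machines=2,
        num_gpus_per_machine=4,
        machine_rank=node_rank,
        dist_url='tcp://' + master_ip + ':' + master_port,
    )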
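
For context on how the batch relates to the GPU count, here is a minimal sketch of the arithmetic, assuming `cfg.SOLVER.IMS_PER_BATCH` is a global batch size that gets divided evenly over all processes. This is my reading of the behavior, not a copy of Detectron2's code, and `per_gpu_batch` is a hypothetical helper.

```python
def per_gpu_batch(ims_per_batch, num_machines, num_gpus_per_machine):
    """Images each GPU processes per iteration, assuming the global
    batch is split evenly across all processes (an assumption)."""
    world_size = num_machines * num_gpus_per_machine
    if ims_per_batch <= 0 or ims_per_batch % world_size != 0:
        raise ValueError(
            f"global batch {ims_per_batch} is not divisible by {world_size} GPUs"
        )
    return ims_per_batch // world_size


# 1 node x 4 GPUs with a global batch of 4: one image per GPU.
print(per_gpu_batch(4, num_machines=1, num_gpus_per_machine=4))

# 2 nodes x 4 GPUs with a global batch of 8: still one image per GPU.
print(per_gpu_batch(8, num_machines=2, num_gpus_per_machine=4))
```

Under this assumption, once each GPU is down to a single image per iteration, adding more GPUs cannot lower the per-GPU memory any further. The peak is then set by the model and by the size of that one image, not by the size of the dataset.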

With augmentation on, training works well. The problems start when it is off. Using 1 GPU I get:

RuntimeError: CUDA out of memory. Tried to allocate 154.00 MiB (GPU 0; 15.78 GiB total capacity; 14.34 GiB already allocated; 88.88 MiB free; 14.36 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Using 8 GPUs I get:

RuntimeError: CUDA out of memory. Tried to allocate 240.00 MiB (GPU 1; 15.78 GiB total capacity; 14.10 GiB already allocated; 204.88 MiB free; 14.12 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Using 16 GPUs I get:

RuntimeError: CUDA out of memory. Tried to allocate 348.00 MiB (GPU 2; 15.78 GiB total capacity; 13.36 GiB already allocated; 266.88 MiB free; 13.90 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I don't understand whether Detectron2 splits the dataset among the GPUs or simply replicates it on each GPU. Why doesn't the amount of memory requested per GPU decrease with more GPUs?

I also tried setting:

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

with no effect. Thanks.
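
To check whether fragmentation is even a plausible cause, the allocated and reserved figures can be pulled out of the error messages above. This is a minimal sketch using only the standard library. The regular expression assumes the message wording shown in this post, and `reserved_minus_allocated` is a hypothetical helper.

```python
import re

# Matches "... 14.34 GiB already allocated; ...; 14.36 GiB reserved in total ..."
PATTERN = re.compile(
    r"(\d+\.\d+) GiB already allocated; .*?(\d+\.\d+) GiB reserved in total"
)


def reserved_minus_allocated(message):
    """Return reserved minus allocated memory (GiB) from an OOM message."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("message does not match the expected OOM format")
    allocated, reserved = float(match.group(1)), float(match.group(2))
    return reserved - allocated


# Relevant fragments of the three errors quoted above.
messages = {
    "1 GPU": "14.34 GiB already allocated; 88.88 MiB free; 14.36 GiB reserved in total by PyTorch",
    "8 GPUs": "14.10 GiB already allocated; 204.88 MiB free; 14.12 GiB reserved in total by PyTorch",
    "16 GPUs": "13.36 GiB already allocated; 266.88 MiB free; 13.90 GiB reserved in total by PyTorch",
}

for label, msg in messages.items():
    print(f"{label}: reserved - allocated = {reserved_minus_allocated(msg):.2f} GiB")
```

In all three cases reserved memory is within about half a GiB of allocated memory, so the "reserved >> allocated" condition mentioned in the error hint does not seem to apply, which may be why `max_split_size_mb` changed nothing.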
