Error: Default process group is not initialized #23
Hi @rassabin
Yeah, that helps. But it's strange that we have to change the `norm_cfg` parameter for each head separately, as in the backbone.
A new error case arises when trying to use the mask version of binary cross-entropy:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-15-fec2661e1f4c> in <module>
16 mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
17 train_segmentor(model, datasets, cfg, distributed=False, validate=True,
---> 18 meta=dict())
~/mmsegmentation/mmseg/apis/train.py in train_segmentor(model, dataset, cfg, distributed, validate, timestamp, meta)
104 elif cfg.load_from:
105 runner.load_checkpoint(cfg.load_from)
--> 106 runner.run(data_loaders, cfg.workflow, cfg.total_iters)
~/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py in run(self, data_loaders, workflow, max_iters, **kwargs)
117 if mode == 'train' and self.iter >= max_iters:
118 return
--> 119 iter_runner(iter_loaders[i], **kwargs)
120
121 time.sleep(1) # wait for some hooks like loggers to finish
~/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py in train(self, data_loader, **kwargs)
53 self.call_hook('before_train_iter')
54 data_batch = next(data_loader)
---> 55 outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
56 if not isinstance(outputs, dict):
57 raise TypeError('model.train_step() must return a dict')
~/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py in train_step(self, *inputs, **kwargs)
29
30 inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
---> 31 return self.module.train_step(*inputs[0], **kwargs[0])
32
33 def val_step(self, *inputs, **kwargs):
~/mmsegmentation/mmseg/models/segmentors/base.py in train_step(self, data_batch, optimizer, **kwargs)
148 """
149 data_batch['gt_semantic_seg'] = data_batch['gt_semantic_seg'][:,0,:].permute(0, 3, 1, 2)
--> 150 losses = self.forward_train(**data_batch, **kwargs)
151 loss, log_vars = self._parse_losses(losses)
152
~/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py in forward_train(self, img, img_metas, gt_semantic_seg)
155
156 loss_decode = self._decode_head_forward_train(x, img_metas,
--> 157 gt_semantic_seg)
158 losses.update(loss_decode)
159
~/mmsegmentation/mmseg/models/segmentors/cascade_encoder_decoder.py in _decode_head_forward_train(self, x, img_metas, gt_semantic_seg)
84
85 loss_decode = self.decode_head[0].forward_train(
---> 86 x, img_metas, gt_semantic_seg, self.train_cfg)
87
88 losses.update(add_prefix(loss_decode, 'decode_0'))
~/mmsegmentation/mmseg/models/decode_heads/decode_head.py in forward_train(self, inputs, img_metas, gt_semantic_seg, train_cfg)
181 """
182 seg_logits = self.forward(inputs)
--> 183 losses = self.losses(seg_logits, gt_semantic_seg)
184 return losses
185
~/mmsegmentation/mmseg/models/decode_heads/decode_head.py in losses(self, seg_logit, seg_label)
225 seg_label,
226 weight=seg_weight,
--> 227 ignore_index=self.ignore_index)
228 loss['acc_seg'] = accuracy(seg_logit, seg_label)
229 return loss
~/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
530 result = self._slow_forward(*input, **kwargs)
531 else:
--> 532 result = self.forward(*input, **kwargs)
533 for hook in self._forward_hooks.values():
534 hook_result = hook(self, input, result)
~/mmsegmentation/mmseg/models/losses/cross_entropy_loss.py in forward(self, cls_score, label, weight, avg_factor, reduction_override, **kwargs)
175 class_weight=class_weight,
176 reduction=reduction,
--> 177 avg_factor=avg_factor)
178 return loss_cls
~/mmsegmentation/mmseg/models/losses/cross_entropy_loss.py in mask_cross_entropy(pred, target, label, reduction, avg_factor, class_weight)
114 pred_slice = pred[inds, label].squeeze(1)
115 return F.binary_cross_entropy_with_logits(
--> 116 pred_slice, target, weight=class_weight, reduction='mean')[None]
117
118
~/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/functional.py in binary_cross_entropy_with_logits(input, target, weight, size_average, reduce, reduction, pos_weight)
2122
2123 if not (target.size() == input.size()):
-> 2124 raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
2125
2126 return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)
ValueError: Target size (torch.Size([3, 3, 512, 384])) must be the same as input size (torch.Size([3, 8, 512, 384]))

I understand that the reason is the different number of channels in the model output versus the input annotations, but I cannot find a way to load an 8-channel mask. I have a script that converts the RGB representation into an 8-class dimension mask, but where should I put it? The default CustomDataset(Dataset) class takes only the path to the mask files, and reading them happens in the "build_from_cfg" module of mmcv. Thanks, any suggestions?
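A conversion script like the one described above can be sketched as a palette lookup that maps each RGB color to a class index. This is a minimal sketch, not mmseg's own loader; the palette below is a hypothetical 8-entry placeholder that would need to be replaced with the dataset's actual colors:

```python
import numpy as np
from PIL import Image

# Hypothetical 8-entry RGB palette; replace with the dataset's actual colors.
PALETTE = np.array([
    [0, 0, 0], [255, 0, 0], [0, 255, 0], [0, 0, 255],
    [255, 255, 0], [255, 0, 255], [0, 255, 255], [255, 255, 255],
], dtype=np.uint8)

def rgb_to_index(mask_rgb: np.ndarray, palette: np.ndarray) -> np.ndarray:
    """Map an (H, W, 3) RGB mask to an (H, W) class-index mask."""
    index = np.zeros(mask_rgb.shape[:2], dtype=np.uint8)
    for cls, color in enumerate(palette):
        # mark every pixel whose RGB triple equals this palette color
        index[np.all(mask_rgb == color, axis=-1)] = cls
    return index

# usage (paths are placeholders):
# mask = np.array(Image.open(mask_path).convert('RGB'))
# Image.fromarray(rgb_to_index(mask, PALETTE)).save(out_path)
```

The resulting single-channel index mask is the format mmseg's default loading pipeline expects, so the conversion can be run offline before training rather than inside CustomDataset.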
Hi @rassabin
Hi @rassabin

```python
from PIL import Image

# RGB color list of length 8
color_list = [[0, 255, 0], ..., [255, 0, 255]]
# quantize() expects a palette image in `P` mode, not a raw array
palette = Image.new('P', (1, 1))
palette.putpalette([c for rgb in color_list for c in rgb])

# open the image
image = Image.open(img_path)
# convert to `P` mode using the palette
new_image = image.quantize(palette=palette)
new_image.save(new_img_path)
```
It's not a problem to create a new representation of the image; the problem is that the network has 8 output channels, and in the mask binary cross-entropy an (8, H, W) output is compared with a (3, H, W) label.
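For binary cross-entropy with logits specifically, target and input must have identical shapes, so a (3, H, W) RGB label can never match 8-channel logits; a class-index label one-hot encoded to (8, H, W) does. A minimal sketch outside the mmseg pipeline, with the batch and spatial sizes from the traceback:

```python
import torch
import torch.nn.functional as F

num_classes = 8
# (N, H, W) class-index labels, e.g. produced by an RGB -> index conversion
labels = torch.randint(0, num_classes, (3, 512, 384))

# one-hot encode to (N, C, H, W) so the target matches the 8-channel logits
target = F.one_hot(labels, num_classes).permute(0, 3, 1, 2).float()

logits = torch.randn(3, num_classes, 512, 384)
loss = F.binary_cross_entropy_with_logits(logits, target)  # scalar loss
```

Passing the raw (3, 512, 384)-shaped RGB tensor as `target` here reproduces exactly the ValueError shown in the traceback.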
Hi @rassabin
OK, I got it, thank you. By the way, does that mean the mask binary cross-entropy loss cannot be applied at the moment?
Hi @rassabin
Can you please specify how to solve this problem? Thanks in advance.
Change "SyncBN" to "BN" in "configs/base".
For a single GPU, we removed this error by changing "SyncBN" to "BN".
Torch: 1.4.0
CUDA: 10.0
MMCV: 1.0.2
MMSEG: 0.5.0+1c3f547
small custom dataset
Config:
norm_cfg = dict(type='BN', requires_grad=True)
TRAIN MODEL:
Full error description: