#environment
sys.platform: linux
Python: 3.6.9 (default, Jan 26 2021, 15:33:00) [GCC 8.4.0]
CUDA available: True
GPU 0,1,2,3: NVIDIA A100-PCIE-80GB
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29190527_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.8.1+cu111
PyTorch compiling details: PyTorch built with:
TorchVision: 0.9.1+cu111
OpenCV: 4.5.2
MMCV: 1.3.10
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.11.0
MMDetection3D: 0.11.0+
#The following error occurred.
2023-04-17 03:23:03,050 - mmdet - INFO - workflow: [('train', 2)], max: 20 epochs
2023-04-17 03:23:04.516097: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-04-17 03:23:04.516235: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-04-17 03:23:04.516248: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
[W reducer.cpp:1050] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters, consider turning this flag off. Note that this warning may be a false positive your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1050] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters, consider turning this flag off. Note that this warning may be a false positive your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1050] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters, consider turning this flag off. Note that this warning may be a false positive your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1050] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters, consider turning this flag off. Note that this warning may be a false positive your model has flow control causing later iterations to have unused parameters. (function operator())
2023-04-17 03:25:20,186 - mmdet - INFO - Epoch [1][50/85] lr: 1.115e-04, eta: 1:14:05, time: 2.694, data_time: 0.135, memory: 12569, loss_heatmap: 215.9703, layer_-1_loss_cls: 4.6548, layer_-1_loss_bbox: 13.0959, matched_ious: 0.0027, loss: 233.7210, grad_norm: 1336.1663
2023-04-17 03:26:50,708 - mmdet - INFO - Saving checkpoint at 1 epochs
[ ] 0/81, elapsed: 0s, ETA:
/root/work/TransFusion/mmdet3d/core/bbox/coders/transfusion_bbox_coder.py:99: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  self.post_center_range, device=heatmap.device)
/root/work/TransFusion/mmdet3d/core/bbox/coders/transfusion_bbox_coder.py:99: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  self.post_center_range, device=heatmap.device)
/root/work/TransFusion/mmdet3d/core/bbox/coders/transfusion_bbox_coder.py:99: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  self.post_center_range, device=heatmap.device)
[>> ] 4/81, 1.8 task/s, elapsed: 2s, ETA: 42s
/root/work/TransFusion/mmdet3d/core/bbox/coders/transfusion_bbox_coder.py:99: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  self.post_center_range, device=heatmap.device)
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 84/81, 11.9 task/s, elapsed: 7s, ETA: 0s
Formating bboxes of pts_bbox
Start to convert detection format...
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 81/81, 34.6 task/s, elapsed: 2s, ETA: 0s
Results writes to /tmp/tmps7pn7cvi/results/pts_bbox/results_nusc.json
Evaluating bboxes of pts_bbox
aaaaaaaaaaaaaaaa mini_val /tmp/tmps7pn7cvi/results/pts_bbox
Traceback (most recent call last):
  File "tools/train.py", line 253, in <module>
    main()
  File "tools/train.py", line 249, in main
    meta=meta)
  File "/usr/local/lib/python3.6/dist-packages/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/usr/local/lib/python3.6/dist-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "/usr/local/lib/python3.6/dist-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/usr/local/lib/python3.6/dist-packages/mmdet/core/evaluation/eval_hooks.py", line 279, in after_train_epoch
    key_score = self.evaluate(runner, results)
  File "/usr/local/lib/python3.6/dist-packages/mmdet/core/evaluation/eval_hooks.py", line 177, in evaluate
    results, logger=runner.logger, **self.eval_kwargs)
  File "/root/work/TransFusion/mmdet3d/datasets/nuscenes_dataset.py", line 489, in evaluate
    ret_dict = self._evaluate_single(result_files[name])
  File "/root/work/TransFusion/mmdet3d/datasets/nuscenes_dataset.py", line 400, in _evaluate_single
    verbose=False)
  File "/usr/local/lib/python3.6/dist-packages/nuscenes/eval/detection/evaluate.py", line 94, in __init__
    self.pred_boxes = filter_eval_boxes(nusc, self.pred_boxes, self.cfg.class_range, verbose=verbose)
  File "/usr/local/lib/python3.6/dist-packages/nuscenes/eval/common/loaders.py", line 219, in filter_eval_boxes
    class_field = _get_box_class_field(eval_boxes)
  File "/usr/local/lib/python3.6/dist-packages/nuscenes/eval/common/loaders.py", line 283, in _get_box_class_field
    raise Exception('Error: Invalid box type: %s' % box)
Exception: Error: Invalid box type: None
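One plausible reading of `Invalid box type: None`, assuming the installed nuscenes-devkit matches the public source: `_get_box_class_field` takes the first box from the first non-empty prediction list, so `None` would mean that every sample in `results_nusc.json` has an empty list, i.e. the detector produced no boxes after one epoch. A minimal sketch for checking that on a saved results file is below. The helper name `summarize_nusc_results` is hypothetical, and the only format assumption is the standard nuScenes submission layout with a top-level `"results"` dict mapping sample tokens to lists of box dicts.

```python
import json
from collections import Counter


def summarize_nusc_results(path):
    """Count predicted boxes per sample in a nuScenes-format results file.

    Hypothetical helper for debugging; not part of mmdet3d or the devkit.
    """
    with open(path) as f:
        # Assumed layout: {"meta": {...}, "results": {sample_token: [box, ...]}}
        results = json.load(f)["results"]

    n_boxes = sum(len(boxes) for boxes in results.values())
    n_empty = sum(1 for boxes in results.values() if not boxes)
    # Tally class names so a wrong or missing 'detection_name' also shows up.
    names = Counter(
        box.get("detection_name")
        for boxes in results.values()
        for box in boxes
    )
    return {
        "samples": len(results),
        "boxes": n_boxes,
        "empty_samples": n_empty,
        "names": dict(names),
    }
```

If `boxes` comes back as 0, the exception is a symptom of empty predictions rather than a malformed file. Note that the `/tmp/tmps7pn7cvi/...` path in the log is a temporary directory and may no longer exist after the crash, so the file has to be written somewhere persistent first.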
Killing subprocess 88
Killing subprocess 89
Killing subprocess 90
Killing subprocess 91
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None) # not coming back
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python', '-u', 'tools/train.py', '--local_rank=3', 'configs/transfusion_nusc_voxel_L.py', '--launcher', 'pytorch']' returned non-zero exit status 1.