My running outputs remain the same for a long time #53

Closed
hosea7456 opened this issue Jul 21, 2021 · 6 comments

@hosea7456

Hi, when I run the code, the output stays the same for a long time and training doesn't continue.

2021-07-21 12:24:37 | INFO | yolox.core.trainer:130 - Model Summary: Params: 99.00M, Gflops: 281.52
2021-07-21 12:24:37 | INFO | apex.amp.frontend:328 - Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.
2021-07-21 12:24:37 | INFO | apex.amp.frontend:329 - Defaults for this optimization level are:
2021-07-21 12:24:37 | INFO | apex.amp.frontend:331 - enabled : True
2021-07-21 12:24:37 | INFO | apex.amp.frontend:331 - opt_level : O1
2021-07-21 12:24:37 | INFO | apex.amp.frontend:331 - cast_model_type : None
2021-07-21 12:24:37 | INFO | apex.amp.frontend:331 - patch_torch_functions : True
2021-07-21 12:24:37 | INFO | apex.amp.frontend:331 - keep_batchnorm_fp32 : None
2021-07-21 12:24:37 | INFO | apex.amp.frontend:331 - master_weights : None
2021-07-21 12:24:37 | INFO | apex.amp.frontend:331 - loss_scale : dynamic
2021-07-21 12:24:37 | INFO | apex.amp.frontend:336 - Processing user overrides (additional kwargs that are not None)...
2021-07-21 12:24:37 | INFO | apex.amp.frontend:354 - After processing overrides, optimization options are:
2021-07-21 12:24:37 | INFO | apex.amp.frontend:356 - enabled : True
2021-07-21 12:24:37 | INFO | apex.amp.frontend:356 - opt_level : O1
2021-07-21 12:24:37 | INFO | apex.amp.frontend:356 - cast_model_type : None
2021-07-21 12:24:37 | INFO | apex.amp.frontend:356 - patch_torch_functions : True
2021-07-21 12:24:37 | INFO | apex.amp.frontend:356 - keep_batchnorm_fp32 : None
2021-07-21 12:24:37 | INFO | apex.amp.frontend:356 - master_weights : None
2021-07-21 12:24:37 | INFO | apex.amp.frontend:356 - loss_scale : dynamic
2021-07-21 12:24:37 | INFO | apex.amp.scaler:69 - Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")
2021-07-21 12:24:37 | INFO | yolox.core.trainer:283 - loading checkpoint for fine tuning
2021-07-21 12:24:38 | WARNING | yolox.utils.checkpoint:26 - Shape of head.cls_preds.0.weight in checkpoint is torch.Size([80, 320, 1, 1]), while shape of head.cls_preds.0.weight in model is torch.Size([3, 320, 1, 1]).
2021-07-21 12:24:38 | WARNING | yolox.utils.checkpoint:26 - Shape of head.cls_preds.0.bias in checkpoint is torch.Size([80]), while shape of head.cls_preds.0.bias in model is torch.Size([3]).
2021-07-21 12:24:38 | WARNING | yolox.utils.checkpoint:26 - Shape of head.cls_preds.1.weight in checkpoint is torch.Size([80, 320, 1, 1]), while shape of head.cls_preds.1.weight in model is torch.Size([3, 320, 1, 1]).
2021-07-21 12:24:38 | WARNING | yolox.utils.checkpoint:26 - Shape of head.cls_preds.1.bias in checkpoint is torch.Size([80]), while shape of head.cls_preds.1.bias in model is torch.Size([3]).
2021-07-21 12:24:38 | WARNING | yolox.utils.checkpoint:26 - Shape of head.cls_preds.2.weight in checkpoint is torch.Size([80, 320, 1, 1]), while shape of head.cls_preds.2.weight in model is torch.Size([3, 320, 1, 1]).
2021-07-21 12:24:38 | WARNING | yolox.utils.checkpoint:26 - Shape of head.cls_preds.2.bias in checkpoint is torch.Size([80]), while shape of head.cls_preds.2.bias in model is torch.Size([3]).
2021-07-21 12:24:38 | INFO | yolox.data.datasets.coco:44 - loading annotations into memory...
2021-07-21 12:24:38 | INFO | yolox.data.datasets.coco:44 - Done (t=0.28s)
2021-07-21 12:24:38 | INFO | pycocotools.coco:92 - creating index...
2021-07-21 12:24:38 | INFO | pycocotools.coco:92 - index created!
2021-07-21 12:24:38 | INFO | yolox.core.trainer:149 - init prefetcher, this might take one minute or less...

What's the problem?

@Joker316701882
Member

How long does it stay stuck there?

@hosea7456
Author

@Joker316701882
More than half an hour, then I closed it. It doesn't print any error or warning, but it keeps occupying GPU memory.

@Joker316701882
Member

self.data_num_workers = 4

Please try to set this value to 0.
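For reference, this field lives in the custom exp file; a minimal sketch of the override is below (the class layout follows the upstream YOLOX custom-exp examples, and num_classes = 3 is only inferred from the shape warnings above, so adjust to your own setup):

```python
import os

from yolox.exp import Exp as MyExp


class Exp(MyExp):
    def __init__(self):
        super().__init__()
        # 3 classes matches the head shapes in the checkpoint warnings above; adjust as needed.
        self.num_classes = 3
        # Set dataloader workers to 0 to rule out a multiprocessing hang in the prefetcher.
        self.data_num_workers = 0
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
```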

@Joker316701882
Member

You may also need to try using a single GPU:
-d 1 -b 8
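
For context, a single-GPU launch would look something like `python tools/train.py -f your_exp.py -d 1 -b 8 -c your_pretrained.pth` (the exp file and checkpoint paths here are placeholders, not taken from this thread).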

@hosea7456
Author

@Joker316701882
Yeah, I have already set it to 0. I also tried a single GPU with -d 1 -b 2, but it still doesn't work.

@GOATmessi8
Member

See #103; please pull the latest updates and retry.
