training loss nan #9

Closed
Zhiyuan-R opened this issue Jan 25, 2023 · 6 comments

Comments

@Zhiyuan-R

Hi, I trained the VAE model following the README, but the training loss becomes NaN. I use 4 GPUs and a batch size of 40, and I kept everything else the same as in the repo.
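
Whether "40" is the per-GPU or the total batch size can be checked against the log posted further down in this thread, which reports 2458 training shapes, `world_size=4`, `train=40`, `drop-last=1` and `Niter/epo=15`. The snippet below is only a rough sanity check: the helper `iters_per_epoch` is hypothetical, and it assumes the loader shards the dataset evenly across ranks (in the manner of PyTorch's `DistributedSampler`), which this thread does not confirm.

```python
import math

# Values read from the training log in this thread.
NUM_TRAIN_SHAPES = 2458   # "[DATA] number of file [2458]"
WORLD_SIZE = 4            # "world_size=4"
LOGGED_ITERS = 15         # "Niter/epo=15"


def iters_per_epoch(num_samples, world_size, per_gpu_batch, drop_last=True):
    """Hypothetical helper: iterations per epoch on one rank.

    Assumption: every rank receives ceil(num_samples / world_size) samples,
    and an incomplete final batch is dropped when drop_last is True.
    """
    per_rank = math.ceil(num_samples / world_size)
    if drop_last:
        return per_rank // per_gpu_batch
    return math.ceil(per_rank / per_gpu_batch)


# Case A: 40 is the per-GPU batch size (effective batch 40 * 4 = 160).
iters_if_per_gpu = iters_per_epoch(NUM_TRAIN_SHAPES, WORLD_SIZE, 40)

# Case B: 40 is the total batch size, i.e. 10 samples per GPU.
iters_if_total = iters_per_epoch(NUM_TRAIN_SHAPES, WORLD_SIZE, 10)

print("per-GPU interpretation:", iters_if_per_gpu)
print("total interpretation:  ", iters_if_total)
print("matches log:", iters_if_per_gpu == LOGGED_ITERS)
```

Under these assumptions only the per-GPU reading reproduces the logged 15 iterations per epoch, so the effective batch size in this run would be 160.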

@ZENGXH
Collaborator

ZENGXH commented Jan 25, 2023

Are you using the ShapeNet dataset as well? Can you share the training log here?

@Zhiyuan-R
Author

Yes! I use ShapeNetCore.v2.PC15k (downloaded from PVD).

@Zhiyuan-R
Author

2023-01-25 00:17:56.473 | INFO | main:get_args:205 - EXP_ROOT: ./exp + exp name: 0125/car/3dbf3ah_hvae_lion_B40, save dir: ./exp/0125/car/3dbf3ah_hvae_lion_B40
2023-01-25 00:17:56.490 | INFO | main:get_args:210 - save config at ./exp/0125/car/3dbf3ah_hvae_lion_B40/cfg.yml
2023-01-25 00:17:56.491 | INFO | main:get_args:213 - log dir: ./exp/0125/car/3dbf3ah_hvae_lion_B40
2023-01-25 00:17:56.491 | INFO | main::227 - In Rank=0
2023-01-25 00:17:56.491 | INFO | main::233 - Node rank 0, local proc 0, global proc 0
2023-01-25 00:17:56.503 | INFO | main::227 - In Rank=1
2023-01-25 00:17:56.504 | INFO | main::233 - Node rank 0, local proc 1, global proc 1
2023-01-25 00:17:56.515 | INFO | main::227 - In Rank=2
2023-01-25 00:17:56.516 | INFO | main::233 - Node rank 0, local proc 2, global proc 2
2023-01-25 00:17:56.528 | INFO | main::227 - In Rank=3
2023-01-25 00:17:56.529 | INFO | main::233 - Node rank 0, local proc 3, global proc 3
2023-01-25 00:17:56.541 | INFO | main::241 - join 3
2023-01-25 00:17:56.651 | DEBUG | utils.utils:init_processes:1140 - set port as 6011
2023-01-25 00:17:56.652 | INFO | utils.utils:init_processes:1151 - init_process: rank=0, world_size=4
2023-01-25 00:17:56.663 | DEBUG | utils.utils:init_processes:1140 - set port as 6011
2023-01-25 00:17:56.664 | INFO | utils.utils:init_processes:1151 - init_process: rank=1, world_size=4
2023-01-25 00:17:56.679 | DEBUG | utils.utils:init_processes:1140 - set port as 6011
2023-01-25 00:17:56.680 | INFO | utils.utils:init_processes:1151 - init_process: rank=2, world_size=4
2023-01-25 00:17:56.715 | DEBUG | utils.utils:init_processes:1140 - set port as 6011
2023-01-25 00:17:56.716 | INFO | utils.utils:init_processes:1151 - init_process: rank=3, world_size=4
2023-01-25 00:17:57.827 | INFO | main:main:29 - use trainer: trainers.hvae_trainer
2023-01-25 00:17:57.831 | INFO | main:main:29 - use trainer: trainers.hvae_trainer
2023-01-25 00:17:57.832 | INFO | main:main:29 - use trainer: trainers.hvae_trainer
2023-01-25 00:17:57.836 | INFO | main:main:29 - use trainer: trainers.hvae_trainer
2023-01-25 00:18:01.625 | INFO | utils.utils:common_init:466 - [common-init] at rank=2, seed=1
2023-01-25 00:18:01.626 | INFO | utils.utils:init:339 - rank=2, init writer as a blackhole
2023-01-25 00:18:01.626 | INFO | utils.utils:common_init:510 - [common-init] DONE
2023-01-25 00:18:01.670 | INFO | utils.utils:common_init:466 - [common-init] at rank=3, seed=1
2023-01-25 00:18:01.671 | INFO | utils.utils:init:339 - rank=3, init writer as a blackhole
2023-01-25 00:18:01.671 | INFO | utils.utils:common_init:510 - [common-init] DONE
2023-01-25 00:18:01.691 | INFO | utils.utils:common_init:466 - [common-init] at rank=0, seed=1
2023-01-25 00:18:01.691 | INFO | utils.utils:common_init:466 - [common-init] at rank=1, seed=1
2023-01-25 00:18:01.692 | INFO | utils.utils:init:331 - Not init TFB
2023-01-25 00:18:01.692 | INFO | utils.utils:init:339 - rank=1, init writer as a blackhole
2023-01-25 00:18:01.692 | INFO | utils.utils:common_init:510 - [common-init] DONE
2023-01-25 00:18:01.693 | INFO | utils.utils:common_init:510 - [common-init] DONE
2023-01-25 00:18:06.292 | INFO | utils.model_helper:import_model:106 - import: models.shapelatent_modules.PointNetPlusEncoder
2023-01-25 00:18:06.292 | INFO | utils.model_helper:import_model:106 - import: models.shapelatent_modules.PointNetPlusEncoder
2023-01-25 00:18:06.293 | INFO | utils.model_helper:import_model:106 - import: models.shapelatent_modules.PointNetPlusEncoder
2023-01-25 00:18:06.298 | INFO | utils.model_helper:import_model:106 - import: models.shapelatent_modules.PointNetPlusEncoder
2023-01-25 00:18:06.308 | INFO | models.shapelatent_modules:init:29 - [Encoder] zdim=128, out_sigma=True; force_att: 0
2023-01-25 00:18:06.309 | INFO | utils.model_helper:import_model:106 - import: models.latent_points_ada.PointTransPVC
2023-01-25 00:18:06.311 | INFO | models.latent_points_ada:init:38 - [Build Unet] extra_feature_channels=0, input_dim=3
2023-01-25 00:18:06.313 | INFO | models.shapelatent_modules:init:29 - [Encoder] zdim=128, out_sigma=True; force_att: 0
2023-01-25 00:18:06.314 | INFO | utils.model_helper:import_model:106 - import: models.latent_points_ada.PointTransPVC
2023-01-25 00:18:06.317 | INFO | models.latent_points_ada:init:38 - [Build Unet] extra_feature_channels=0, input_dim=3
2023-01-25 00:18:06.318 | INFO | models.shapelatent_modules:init:29 - [Encoder] zdim=128, out_sigma=True; force_att: 0
2023-01-25 00:18:06.318 | INFO | utils.model_helper:import_model:106 - import: models.latent_points_ada.PointTransPVC
2023-01-25 00:18:06.321 | INFO | models.latent_points_ada:init:38 - [Build Unet] extra_feature_channels=0, input_dim=3
2023-01-25 00:18:06.329 | INFO | models.shapelatent_modules:init:29 - [Encoder] zdim=128, out_sigma=True; force_att: 0
2023-01-25 00:18:06.329 | INFO | utils.model_helper:import_model:106 - import: models.latent_points_ada.PointTransPVC
2023-01-25 00:18:06.332 | INFO | models.latent_points_ada:init:38 - [Build Unet] extra_feature_channels=0, input_dim=3
2023-01-25 00:18:06.457 | INFO | utils.model_helper:import_model:106 - import: models.latent_points_ada.LatentPointDecPVC
2023-01-25 00:18:06.458 | INFO | models.latent_points_ada:init:241 - [Build Dec] point_dim=3, context_dim=1
2023-01-25 00:18:06.458 | INFO | models.latent_points_ada:init:38 - [Build Unet] extra_feature_channels=1, input_dim=3
2023-01-25 00:18:06.473 | INFO | utils.model_helper:import_model:106 - import: models.latent_points_ada.LatentPointDecPVC
2023-01-25 00:18:06.474 | INFO | models.latent_points_ada:init:241 - [Build Dec] point_dim=3, context_dim=1
2023-01-25 00:18:06.474 | INFO | models.latent_points_ada:init:38 - [Build Unet] extra_feature_channels=1, input_dim=3
2023-01-25 00:18:06.478 | INFO | utils.model_helper:import_model:106 - import: models.latent_points_ada.LatentPointDecPVC
2023-01-25 00:18:06.479 | INFO | models.latent_points_ada:init:241 - [Build Dec] point_dim=3, context_dim=1
2023-01-25 00:18:06.479 | INFO | models.latent_points_ada:init:38 - [Build Unet] extra_feature_channels=1, input_dim=3
2023-01-25 00:18:06.505 | INFO | utils.model_helper:import_model:106 - import: models.latent_points_ada.LatentPointDecPVC
2023-01-25 00:18:06.505 | INFO | models.latent_points_ada:init:241 - [Build Dec] point_dim=3, context_dim=1
2023-01-25 00:18:06.505 | INFO | models.latent_points_ada:init:38 - [Build Unet] extra_feature_channels=1, input_dim=3
2023-01-25 00:18:06.594 | INFO | models.vae_adain:init:50 - [Build Model] style_encoder: models.shapelatent_modules.PointNetPlusEncoder, encoder: models.latent_points_ada.PointTransPVC, decoder: models.latent_points_ada.LatentPointDecPVC
2023-01-25 00:18:06.610 | INFO | models.vae_adain:init:50 - [Build Model] style_encoder: models.shapelatent_modules.PointNetPlusEncoder, encoder: models.latent_points_ada.PointTransPVC, decoder: models.latent_points_ada.LatentPointDecPVC
2023-01-25 00:18:06.613 | INFO | models.vae_adain:init:50 - [Build Model] style_encoder: models.shapelatent_modules.PointNetPlusEncoder, encoder: models.latent_points_ada.PointTransPVC, decoder: models.latent_points_ada.LatentPointDecPVC
2023-01-25 00:18:06.640 | INFO | models.vae_adain:init:50 - [Build Model] style_encoder: models.shapelatent_modules.PointNetPlusEncoder, encoder: models.latent_points_ada.PointTransPVC, decoder: models.latent_points_ada.LatentPointDecPVC
2023-01-25 00:18:06.655 | INFO | trainers.hvae_trainer:init:53 - broadcast_params: device=cuda:2
2023-01-25 00:18:06.663 | INFO | trainers.hvae_trainer:init:53 - broadcast_params: device=cuda:1
2023-01-25 00:18:06.669 | INFO | trainers.hvae_trainer:init:53 - broadcast_params: device=cuda:3
2023-01-25 00:18:06.689 | INFO | trainers.base_trainer:build_other_module:712 - no other module to build
2023-01-25 00:18:06.689 | INFO | trainers.hvae_trainer:init:58 - waitting for barrier, device=cuda:2
2023-01-25 00:18:06.696 | INFO | trainers.hvae_trainer:init:53 - broadcast_params: device=cuda:0
2023-01-25 00:18:06.704 | INFO | trainers.base_trainer:build_other_module:712 - no other module to build
2023-01-25 00:18:06.704 | INFO | trainers.hvae_trainer:init:58 - waitting for barrier, device=cuda:3
2023-01-25 00:18:06.705 | INFO | trainers.base_trainer:build_other_module:712 - no other module to build
2023-01-25 00:18:06.705 | INFO | trainers.hvae_trainer:init:58 - waitting for barrier, device=cuda:1
2023-01-25 00:18:06.728 | INFO | trainers.base_trainer:build_other_module:712 - no other module to build
2023-01-25 00:18:06.729 | INFO | trainers.hvae_trainer:init:58 - waitting for barrier, device=cuda:0
2023-01-25 00:18:06.729 | INFO | trainers.hvae_trainer:init:60 - pass barrier, device=cuda:0
2023-01-25 00:18:06.729 | INFO | trainers.hvae_trainer:init:60 - pass barrier, device=cuda:2
2023-01-25 00:18:06.729 | INFO | trainers.hvae_trainer:init:60 - pass barrier, device=cuda:1
2023-01-25 00:18:06.729 | INFO | trainers.hvae_trainer:init:60 - pass barrier, device=cuda:3
2023-01-25 00:18:06.729 | INFO | trainers.base_trainer:build_data:152 - start build_data
2023-01-25 00:18:06.729 | INFO | trainers.base_trainer:build_data:152 - start build_data
2023-01-25 00:18:06.729 | INFO | trainers.base_trainer:build_data:152 - start build_data
2023-01-25 00:18:06.729 | INFO | trainers.base_trainer:build_data:152 - start build_data
2023-01-25 00:18:09.476 | INFO | datasets.pointflow_datasets:get_datasets:333 - get_datasets: tr_sample_size=2048, te_sample_size=2048; random_subsample=1 normalize_global=True normalize_std_per_axix=False normalize_per_shape=False recenter_per_shape=False
2023-01-25 00:18:09.477 | INFO | datasets.pointflow_datasets:init:108 - [DATA] cat: car, split: train, full path: ./data/ShapeNetCore.v2.PC15k/; norm global=True, norm-box=False
2023-01-25 00:18:09.478 | INFO | datasets.pointflow_datasets:get_datasets:333 - get_datasets: tr_sample_size=2048, te_sample_size=2048; random_subsample=1 normalize_global=True normalize_std_per_axix=False normalize_per_shape=False recenter_per_shape=False
2023-01-25 00:18:09.478 | INFO | datasets.pointflow_datasets:init:108 - [DATA] cat: car, split: train, full path: ./data/ShapeNetCore.v2.PC15k/; norm global=True, norm-box=False
2023-01-25 00:18:09.487 | INFO | datasets.pointflow_datasets:init:157 - [DATA] number of file [2458] under: ./data/ShapeNetCore.v2.PC15k/02958343/train
2023-01-25 00:18:09.487 | INFO | datasets.pointflow_datasets:init:157 - [DATA] number of file [2458] under: ./data/ShapeNetCore.v2.PC15k/02958343/train
2023-01-25 00:18:09.619 | INFO | datasets.pointflow_datasets:get_datasets:333 - get_datasets: tr_sample_size=2048, te_sample_size=2048; random_subsample=1 normalize_global=True normalize_std_per_axix=False normalize_per_shape=False recenter_per_shape=False
2023-01-25 00:18:09.619 | INFO | datasets.pointflow_datasets:init:108 - [DATA] cat: car, split: train, full path: ./data/ShapeNetCore.v2.PC15k/; norm global=True, norm-box=False
2023-01-25 00:18:09.626 | INFO | datasets.pointflow_datasets:init:157 - [DATA] number of file [2458] under: ./data/ShapeNetCore.v2.PC15k/02958343/train
2023-01-25 00:18:09.781 | INFO | datasets.pointflow_datasets:get_datasets:333 - get_datasets: tr_sample_size=2048, te_sample_size=2048; random_subsample=1 normalize_global=True normalize_std_per_axix=False normalize_per_shape=False recenter_per_shape=False
2023-01-25 00:18:09.781 | INFO | datasets.pointflow_datasets:init:108 - [DATA] cat: car, split: train, full path: ./data/ShapeNetCore.v2.PC15k/; norm global=True, norm-box=False
2023-01-25 00:18:09.787 | INFO | datasets.pointflow_datasets:init:157 - [DATA] number of file [2458] under: ./data/ShapeNetCore.v2.PC15k/02958343/train
2023-01-25 00:18:11.014 | INFO | datasets.pointflow_datasets:init:170 - [DATA] Load data time: 1.5s | dir: ['02958343'] | sample_with_replacement: 1; num points: 2458
2023-01-25 00:18:11.125 | INFO | datasets.pointflow_datasets:init:170 - [DATA] Load data time: 1.5s | dir: ['02958343'] | sample_with_replacement: 1; num points: 2458
2023-01-25 00:18:11.149 | INFO | datasets.pointflow_datasets:init:170 - [DATA] Load data time: 1.7s | dir: ['02958343'] | sample_with_replacement: 1; num points: 2458
2023-01-25 00:18:11.199 | INFO | datasets.pointflow_datasets:init:170 - [DATA] Load data time: 1.4s | dir: ['02958343'] | sample_with_replacement: 1; num points: 2458
2023-01-25 00:18:12.484 | INFO | datasets.pointflow_datasets:init:234 - [DATA] normalize_global: mean=[0.00131747 0.00735971 0.02350355], std=[0.1634924]
2023-01-25 00:18:12.618 | INFO | datasets.pointflow_datasets:init:234 - [DATA] normalize_global: mean=[0.00131747 0.00735971 0.02350355], std=[0.1634924]
2023-01-25 00:18:12.801 | INFO | datasets.pointflow_datasets:init:234 - [DATA] normalize_global: mean=[0.00131747 0.00735971 0.02350355], std=[0.1634924]
2023-01-25 00:18:12.810 | INFO | datasets.pointflow_datasets:init:234 - [DATA] normalize_global: mean=[0.00131747 0.00735971 0.02350355], std=[0.1634924]
2023-01-25 00:18:13.351 | INFO | datasets.pointflow_datasets:init:241 - [DATA] shape=(2458, 15000, 3), all_points_mean:=(1, 1, 3), std=(1, 1, 1), max=4.166, min=-4.333; num-pts=2048
2023-01-25 00:18:13.375 | INFO | datasets.pointflow_datasets:init:108 - [DATA] cat: car, split: val, full path: ./data/ShapeNetCore.v2.PC15k/; norm global=True, norm-box=False
2023-01-25 00:18:13.376 | INFO | datasets.pointflow_datasets:init:157 - [DATA] number of file [352] under: ./data/ShapeNetCore.v2.PC15k/02958343/val
2023-01-25 00:18:13.450 | INFO | datasets.pointflow_datasets:init:241 - [DATA] shape=(2458, 15000, 3), all_points_mean:=(1, 1, 3), std=(1, 1, 1), max=4.166, min=-4.333; num-pts=2048
2023-01-25 00:18:13.479 | INFO | datasets.pointflow_datasets:init:108 - [DATA] cat: car, split: val, full path: ./data/ShapeNetCore.v2.PC15k/; norm global=True, norm-box=False
2023-01-25 00:18:13.480 | INFO | datasets.pointflow_datasets:init:157 - [DATA] number of file [352] under: ./data/ShapeNetCore.v2.PC15k/02958343/val
2023-01-25 00:18:13.534 | INFO | datasets.pointflow_datasets:init:170 - [DATA] Load data time: 0.2s | dir: ['02958343'] | sample_with_replacement: 1; num points: 352
2023-01-25 00:18:13.615 | INFO | datasets.pointflow_datasets:init:241 - [DATA] shape=(2458, 15000, 3), all_points_mean:=(1, 1, 3), std=(1, 1, 1), max=4.166, min=-4.333; num-pts=2048
2023-01-25 00:18:13.623 | INFO | datasets.pointflow_datasets:init:241 - [DATA] shape=(2458, 15000, 3), all_points_mean:=(1, 1, 3), std=(1, 1, 1), max=4.166, min=-4.333; num-pts=2048
2023-01-25 00:18:13.639 | INFO | datasets.pointflow_datasets:init:108 - [DATA] cat: car, split: val, full path: ./data/ShapeNetCore.v2.PC15k/; norm global=True, norm-box=False
2023-01-25 00:18:13.640 | INFO | datasets.pointflow_datasets:init:157 - [DATA] number of file [352] under: ./data/ShapeNetCore.v2.PC15k/02958343/val
2023-01-25 00:18:13.646 | INFO | datasets.pointflow_datasets:init:108 - [DATA] cat: car, split: val, full path: ./data/ShapeNetCore.v2.PC15k/; norm global=True, norm-box=False
2023-01-25 00:18:13.647 | INFO | datasets.pointflow_datasets:init:157 - [DATA] number of file [352] under: ./data/ShapeNetCore.v2.PC15k/02958343/val
2023-01-25 00:18:13.648 | INFO | datasets.pointflow_datasets:init:170 - [DATA] Load data time: 0.2s | dir: ['02958343'] | sample_with_replacement: 1; num points: 352
2023-01-25 00:18:13.676 | INFO | datasets.pointflow_datasets:init:241 - [DATA] shape=(352, 15000, 3), all_points_mean:=(1, 1, 3), std=(1, 1, 1), max=4.002, min=-4.059; num-pts=2048
2023-01-25 00:18:13.677 | INFO | datasets.pointflow_datasets:get_data_loaders:398 - [Batch Size] train=40, test=10; drop-last=1
2023-01-25 00:18:13.683 | INFO | trainers.hvae_trainer:init:75 - done init trainer @cuda:2
2023-01-25 00:18:13.794 | INFO | datasets.pointflow_datasets:init:241 - [DATA] shape=(352, 15000, 3), all_points_mean:=(1, 1, 3), std=(1, 1, 1), max=4.002, min=-4.059; num-pts=2048
2023-01-25 00:18:13.795 | INFO | datasets.pointflow_datasets:get_data_loaders:398 - [Batch Size] train=40, test=10; drop-last=1
2023-01-25 00:18:13.801 | INFO | trainers.hvae_trainer:init:75 - done init trainer @cuda:0
2023-01-25 00:18:13.842 | INFO | datasets.pointflow_datasets:init:170 - [DATA] Load data time: 0.2s | dir: ['02958343'] | sample_with_replacement: 1; num points: 352
2023-01-25 00:18:13.863 | INFO | datasets.pointflow_datasets:init:170 - [DATA] Load data time: 0.2s | dir: ['02958343'] | sample_with_replacement: 1; num points: 352
2023-01-25 00:18:14.004 | INFO | datasets.pointflow_datasets:init:241 - [DATA] shape=(352, 15000, 3), all_points_mean:=(1, 1, 3), std=(1, 1, 1), max=4.002, min=-4.059; num-pts=2048
2023-01-25 00:18:14.005 | INFO | datasets.pointflow_datasets:get_data_loaders:398 - [Batch Size] train=40, test=10; drop-last=1
2023-01-25 00:18:14.010 | INFO | trainers.hvae_trainer:init:75 - done init trainer @cuda:3
2023-01-25 00:18:14.040 | INFO | datasets.pointflow_datasets:init:241 - [DATA] shape=(352, 15000, 3), all_points_mean:=(1, 1, 3), std=(1, 1, 1), max=4.002, min=-4.059; num-pts=2048
2023-01-25 00:18:14.042 | INFO | datasets.pointflow_datasets:get_data_loaders:398 - [Batch Size] train=40, test=10; drop-last=1
2023-01-25 00:18:14.053 | INFO | trainers.hvae_trainer:init:75 - done init trainer @cuda:1
2023-01-25 00:18:14.394 | INFO | trainers.base_trainer:prepare_vis_data:676 - [prepare_vis_data] len of train_loader: 15
2023-01-25 00:18:14.655 | INFO | trainers.base_trainer:prepare_vis_data:676 - [prepare_vis_data] len of train_loader: 15
2023-01-25 00:18:14.924 | INFO | trainers.base_trainer:prepare_vis_data:676 - [prepare_vis_data] len of train_loader: 15
2023-01-25 00:18:14.959 | INFO | trainers.base_trainer:prepare_vis_data:676 - [prepare_vis_data] len of train_loader: 15
2023-01-25 00:18:15.220 | INFO | trainers.base_trainer:prepare_vis_data:691 - tr_x: torch.Size([16, 2048, 3]), m_pcs: torch.Size([16, 1, 3]), s_pcs: torch.Size([16, 1, 1]), val_x: torch.Size([16, 2048, 3])
2023-01-25 00:18:15.247 | INFO | main:main:46 - param size = 22.402731M
2023-01-25 00:18:15.249 | INFO | main:main:68 - not find any checkpoint: ./exp/0125/car/3dbf3ah_hvae_lion_B40/checkpoints, (exist=False), or snapshot ./exp/0125/car/3dbf3ah_hvae_lion_B40/checkpoints/snapshot, (exist=False)
2023-01-25 00:18:15.250 | INFO | trainers.base_trainer:train_epochs:173 - [rank=2] Start epoch: 0 End epoch: 800, batch-size=40 | Niter/epo=15 | log freq=15, viz freq 6000, val freq 200
2023-01-25 00:18:15.580 | INFO | trainers.base_trainer:prepare_vis_data:691 - tr_x: torch.Size([16, 2048, 3]), m_pcs: torch.Size([16, 1, 3]), s_pcs: torch.Size([16, 1, 1]), val_x: torch.Size([16, 2048, 3])
2023-01-25 00:18:15.614 | INFO | main:main:46 - param size = 22.402731M
2023-01-25 00:18:15.615 | INFO | trainers.base_trainer:set_writer:57 -

./exp/0125/car/3dbf3ah_hvae_lion_B40

2023-01-25 00:18:15.622 | INFO | main:main:68 - not find any checkpoint: ./exp/0125/car/3dbf3ah_hvae_lion_B40/checkpoints, (exist=False), or snapshot ./exp/0125/car/3dbf3ah_hvae_lion_B40/checkpoints/snapshot, (exist=False)
2023-01-25 00:18:15.637 | INFO | trainers.base_trainer:train_epochs:173 - [rank=0] Start epoch: 0 End epoch: 800, batch-size=40 | Niter/epo=15 | log freq=15, viz freq 6000, val freq 200
2023-01-25 00:18:15.845 | INFO | trainers.base_trainer:prepare_vis_data:691 - tr_x: torch.Size([16, 2048, 3]), m_pcs: torch.Size([16, 1, 3]), s_pcs: torch.Size([16, 1, 1]), val_x: torch.Size([16, 2048, 3])
2023-01-25 00:18:15.865 | INFO | main:main:46 - param size = 22.402731M
2023-01-25 00:18:15.865 | INFO | main:main:68 - not find any checkpoint: ./exp/0125/car/3dbf3ah_hvae_lion_B40/checkpoints, (exist=False), or snapshot ./exp/0125/car/3dbf3ah_hvae_lion_B40/checkpoints/snapshot, (exist=False)
2023-01-25 00:18:15.866 | INFO | trainers.base_trainer:train_epochs:173 - [rank=1] Start epoch: 0 End epoch: 800, batch-size=40 | Niter/epo=15 | log freq=15, viz freq 6000, val freq 200
2023-01-25 00:18:15.904 | INFO | trainers.base_trainer:prepare_vis_data:691 - tr_x: torch.Size([16, 2048, 3]), m_pcs: torch.Size([16, 1, 3]), s_pcs: torch.Size([16, 1, 1]), val_x: torch.Size([16, 2048, 3])
2023-01-25 00:18:15.947 | INFO | main:main:46 - param size = 22.402731M
2023-01-25 00:18:15.948 | INFO | main:main:68 - not find any checkpoint: ./exp/0125/car/3dbf3ah_hvae_lion_B40/checkpoints, (exist=False), or snapshot ./exp/0125/car/3dbf3ah_hvae_lion_B40/checkpoints/snapshot, (exist=False)
2023-01-25 00:18:15.949 | INFO | trainers.base_trainer:train_epochs:173 - [rank=3] Start epoch: 0 End epoch: 800, batch-size=40 | Niter/epo=15 | log freq=15, viz freq 6000, val freq 200
2023-01-25 00:18:38.808 | INFO | trainers.common_fun:validate_inspect_noprior:104 - writer: none
2023-01-25 00:19:01.551 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E0 iter[ 14/ 15] | [Loss] 1053511558071768433312137216.00 | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 14 | url none | [time] 0.8m (~10h) |[best] 0 -100.000x1e-2
2023-01-25 00:19:26.789 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E1 iter[ 14/ 15] | [Loss] 52998332140144.68 | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 29 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:19:52.065 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E2 iter[ 14/ 15] | [Loss] 2302480512959926.50 | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 44 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:20:17.565 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E3 iter[ 14/ 15] | [Loss] 2568395833090570240.00 | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 59 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:20:43.245 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E4 iter[ 14/ 15] | [Loss] 17809658881111334949003391401984.00 | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 74 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:21:09.074 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E5 iter[ 14/ 15] | [Loss] 51566519.44 | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 89 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:21:34.569 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E6 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 104 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:22:00.025 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E7 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 119 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:22:25.365 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E8 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 134 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:22:50.734 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E9 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 149 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:23:16.079 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E10 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 164 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:23:41.553 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E11 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 179 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:24:07.110 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E12 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 194 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:24:32.557 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E13 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 209 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:24:58.175 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E14 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 224 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:25:23.746 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E15 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 239 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:25:49.360 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E16 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 254 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:26:14.849 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E17 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 269 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:26:40.303 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E18 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 284 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:27:05.658 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E19 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 299 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:27:30.983 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E20 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 314 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:27:56.319 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E21 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 329 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:28:21.645 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E22 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 344 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:28:47.099 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E23 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 359 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:29:12.520 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E24 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 374 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:29:38.024 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E25 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 389 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:30:03.487 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E26 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 404 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:30:28.738 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E27 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 419 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:30:53.995 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E28 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 434 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:31:19.211 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E29 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 449 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:31:44.334 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E30 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 464 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:32:09.531 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E31 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 479 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:32:34.830 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E32 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 494 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:33:00.156 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E33 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 509 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:33:25.532 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E34 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 524 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:33:51.046 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E35 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 539 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:34:16.399 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E36 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 554 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:34:41.735 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E37 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 569 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:35:07.293 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E38 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 584 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:35:32.838 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E39 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 599 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:35:58.269 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E40 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 614 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:36:23.576 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E41 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 629 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:36:48.992 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E42 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 644 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:37:14.421 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E43 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 659 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:37:39.900 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E44 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 674 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:38:05.390 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E45 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 689 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:38:30.775 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E46 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 704 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:38:56.245 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E47 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 719 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:39:21.554 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E48 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 734 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:39:46.962 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E49 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 749 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:40:12.541 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E50 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 764 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:40:37.824 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E51 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 779 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:41:03.322 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E52 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 794 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:41:28.677 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E53 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 809 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:41:54.058 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E54 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 824 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:42:19.352 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E55 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 839 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:42:44.698 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E56 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 854 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:43:10.033 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E57 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 869 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:43:35.285 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E58 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 884 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:44:00.556 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E59 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 899 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:44:26.083 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E60 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 914 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:44:51.436 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E61 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 929 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:45:16.718 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E62 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 944 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:45:42.198 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E63 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 959 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:46:07.561 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E64 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 974 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:46:32.978 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E65 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 989 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:46:58.470 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E66 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1004 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:47:23.890 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E67 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1019 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:47:49.273 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E68 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1034 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:48:14.624 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E69 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1049 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:48:39.986 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E70 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1064 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:48:40.001 | INFO | trainers.base_trainer:save:106 - save model as : ./exp/0125/car/3dbf3ah_hvae_lion_B40/checkpoints/snapshot_bak
2023-01-25 00:49:06.715 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E71 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1079 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:49:31.999 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E72 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1094 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:49:57.406 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E73 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1109 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:50:22.712 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E74 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1124 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:50:48.098 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E75 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1139 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:51:13.579 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E76 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1154 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:51:39.065 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E77 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1169 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:52:04.374 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E78 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1184 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:52:29.969 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E79 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1199 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:52:55.488 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E80 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1214 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:53:20.759 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E81 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1229 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:53:46.118 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E82 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1244 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:54:11.518 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E83 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1259 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:54:36.914 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E84 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1274 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:55:02.136 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E85 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1289 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:55:27.793 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E86 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1304 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:55:53.190 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E87 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1319 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:56:18.534 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E88 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1334 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:56:44.018 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E89 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1349 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:57:09.309 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E90 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1364 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 00:57:34.684 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E91 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1379 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 00:57:59.997 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E92 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1394 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 00:58:25.479 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E93 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1409 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:58:50.932 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E94 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1424 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 00:59:16.326 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E95 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1439 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 00:59:41.795 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E96 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1454 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:00:07.162 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E97 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1469 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:00:32.569 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E98 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1484 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:00:58.136 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E99 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1499 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:01:23.533 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E100 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1514 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:01:48.939 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E101 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1529 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:02:14.562 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E102 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1544 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:02:39.900 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E103 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1559 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:03:05.674 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E104 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1574 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:03:31.050 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E105 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1589 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:03:56.486 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E106 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1604 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:04:21.979 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E107 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1619 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:04:47.400 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E108 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1634 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:05:12.816 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E109 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1649 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:05:38.353 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E110 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1664 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:06:03.822 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E111 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1679 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:06:29.280 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E112 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1694 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:06:54.803 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E113 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1709 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:07:20.158 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E114 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1724 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:07:45.551 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E115 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1739 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:08:11.027 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E116 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1754 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:08:36.365 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E117 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1769 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:09:01.709 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E118 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1784 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:09:27.067 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E119 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1799 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:09:52.533 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E120 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1814 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:10:18.148 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E121 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1829 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:10:43.401 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E122 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1844 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:11:08.755 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E123 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1859 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:11:34.165 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E124 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1874 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:11:59.572 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E125 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1889 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:12:24.968 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E126 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1904 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:12:50.169 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E127 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1919 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:13:15.662 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E128 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1934 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
2023-01-25 01:13:41.159 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E129 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1949 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
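
With a log this long, a small script can locate the first line where the loss stopped being finite. The following is a minimal sketch, assuming the line layout shown above (`E<epoch> iter[...] | [Loss] <value> | ... | [step] <n>`). The regular expression and the function name `first_nonfinite` are illustrative and not part of the repository.

```python
import math
import re

# Pulls the epoch, loss, and step fields out of a train_epochs progress line.
# The layout is inferred from the log in this thread; adjust it if yours differs.
LINE_RE = re.compile(
    r"\|\s*E(\d+)\s+iter\[[^\]]*\]\s*\|\s*\[Loss\]\s*(\S+)\s*\|.*?\[step\]\s*(\d+)"
)


def first_nonfinite(lines):
    """Return (epoch, step) of the first line whose loss is NaN or inf, else None."""
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a progress line, e.g. a "save model as" message
        try:
            loss = float(match.group(2))  # float("nan") parses to NaN
        except ValueError:
            continue  # loss field is not a number; skip the line
        if not math.isfinite(loss):
            return int(match.group(1)), int(match.group(3))
    return None
```

Passing an open log file (or any iterable of lines) to `first_nonfinite` gives the epoch and step to inspect, which narrows down where the divergence began.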

@Zhiyuan-R
Author

And below is my config:

bash_name: ''
clipforge:
  clip_model: ViT-B/32
  enable: 0
  feat_dim: 512
cmt: lion
comet_key: ''
data:
  batch_size: 40
  batch_size_test: 10
  cates: car
  clip_forge_enable: 0
  clip_model: ViT-B/32
  cond_on_cat: 0
  cond_on_voxel: 0
  data_dir: data/ShapeNetCore.v2.PC15k
  dataset_scale: 1
  dataset_type: shapenet15k
  eval_test_split: 0
  input_dim: -1
  is_encode_whole_dataset_trainer: 0
  nclass: 55
  noise_std: 0.1
  noise_std_min: -1.0
  noise_type: normal
  normalize_global: true
  normalize_per_shape: false
  normalize_range: false
  normalize_shape_box: false
  normalize_std_per_axis: false
  num_workers: 4
  random_subsample: 1
  recenter_per_shape: false
  sample_with_replacement: 1
  te_max_sample_points: 2048
  tr_max_sample_points: 2048
  train_drop_last: 1
  type: datasets.pointflow_datasets
  voxel_size: 0.1
ddpm:
  add_point_feat: true
  attn:
  - 0
  - 1
  - 0
  - 0
  beta_1: 0.0001
  beta_T: 0.02
  clip_denoised: 0
  ddim_step: 200
  dropout: 0.1
  ema: 0
  input_dim: 3
  loss_type: l1_sum
  loss_type_0: ''
  loss_weight_cdnorm: 1.0
  loss_weight_emd: 1.0
  model_mean_type: eps
  model_var_type: fixedlarge
  ncenter:
  - 1024
  - 256
  - 64
  - 16
  num_layers_classifier: 3
  num_steps: 1
  p2_gamma: 1.0
  p2_k: 1.0
  sched_mode: linear
  time_dim: 64
  use_bn: true
  use_global_attn: 0
  use_gn: false
  use_new_timeemb: 0
  use_p2_weight: 0
  with_se: 0
dpm:
  train_encoder_only: 0
dpm_ckpt: ''
eval:
  load_other_vae_ckpt: 0
  need_denoise: 0
eval_ddim_step: 0
eval_trainnll: 0
exp_name: ''
has_shapelatent: 1
hash: 3dbf3ah
latent_pts:
  ada_mlp_init_scale: 0.1
  decoder_layer_out_dim: 32
  encoder_layer_out_dim: 32
  hid: 64
  latent_dim_ext:
  - 64
  mask_out_extra_latent: 0
  normalization: bn
  pts_sigma_offset: 0.0
  pvd_mse_loss: 0
  skip_weight: 0.01
  style_dim: 128
  style_encoder: models.shapelatent_modules.PointNetPlusEncoder
  style_mlp: ''
  style_prior: models.score_sde.resnet.PriorSEDrop
  use_linear_for_adagn: 0
  weight_kl_feat: 1.0
  weight_kl_glb: 1.0
  weight_kl_pt: 1.0
log_dir: ./exp/0125/car/3dbf3ah_hvae_lion_B40
log_name: ./exp/0125/car/3dbf3ah_hvae_lion_B40
model_config: default
ngpu: 1
num_ref: 0
num_val_samples: 16
save_dir: ./exp/0125/car/3dbf3ah_hvae_lion_B40
sde:
  attn_mhead: 0
  attn_mhead_local: -1
  autocast_train: false
  beta_end: 20.0
  beta_start: 0.1
  bound_mlogit: 0
  bound_mlogit_value: -5.42
  condition_add: 1
  condition_cat: 0
  cont_kl_anneal: true
  dae_checkpoint: ''
  dataset: shape
  ddim_kappa: 1.0
  ddim_skip_type: uniform
  denoising_stddevs: beta
  diffusion_steps: 1000
  drop_inactive_var: 0
  dropout: 0.2
  ema_decay: 0.9999
  embedding_dim: 128
  embedding_scale: 1.0
  embedding_type: positional
  epochs: 800
  fir: false
  global_prior_ckpt: ''
  grad_clip_max_norm: 0.0
  hier_prior: 0
  hypara_mixing_logit: 0
  init_t: 1.0
  is_continues: 0
  iw_sample_p: ll_iw
  iw_sample_q: reweight_p_samples
  iw_subvp_like_vp_sde: false
  jac_reg_coeff: 0
  jac_reg_freq: 1
  kin_reg_coeff: 0
  kl_anneal_portion_vada: 0.5
  kl_balance_vada: false
  kl_const_coeff_vada: 1.0e-07
  kl_const_portion_vada: 0.0
  kl_max_coeff_vada: 0.5
  learn_mixing_logit: 1
  learning_rate_dae: 0.0003
  learning_rate_dae_local: 0.0003
  learning_rate_min_dae: 0.0003
  learning_rate_min_dae_local: 0.0003
  learning_rate_min_vae: 1.0e-05
  learning_rate_mlogit: -1.0
  learning_rate_vae: 0.0001
  local_prior: same_as_global
  mixed_prediction: false
  mixing_logit_init: -6
  nhead: 4
  num_cell_per_scale_dae: 8
  num_cell_per_scale_dae_local: 0
  num_channels_dae: 256
  num_latent_scales: 1
  num_preprocess_blocks: 2
  num_scales_dae: 2
  ode_eps: 1.0e-05
  ode_sample: 0
  pool_feat_cat: 0
  pos_embed: none
  prior_model: models.latent_points_ada_localprior.PVCNN2Prior
  progressive: none
  progressive_combine: sum
  progressive_input: none
  regularize_mlogit: 0
  regularize_mlogit_margin: 0.0
  sde_type: vpsde
  share_mlogit: 0
  sigma2_0: 0.0
  sigma2_max: 0.99
  sigma2_min: 0.0001
  time_emb_scales: 1.0
  time_eps: 0.01
  train_dae: 1
  train_ode_solver_tol: 1.0e-05
  train_vae: true
  update_q_ema: false
  use_adam: true
  use_adamax: false
  vae_checkpoint: ''
  warmup_epochs: 20
  weight_decay: 0.0003
  weight_decay_norm_dae: 0.0
  weight_decay_norm_vae: 0.0
set_detect_anomaly: 0
shapelatent:
  decoder_num_points: 2048
  decoder_type: models.latent_points_ada.LatentPointDecPVC
  encoder_type: models.latent_points_ada.PointTransPVC
  eps_z_global_only: 1
  freeze_vae: 0
  kl_weight: 0.5
  latent_dim: 1
  local_emb_agg: mean
  log_sigma_offset: 6.0
  loss0_weight: 1.0
  model: models.vae_adain
  prior_type: normal
  residual: 1
snapshot_min: 30
test_size: 660
trainer:
  anneal_kl: 1
  apply_loss_weight_1_kl: 0
  epochs: 800
  kl_balance: 0
  kl_free:
  - 0
  - 0
  kl_ratio:
  - 1.0
  - 1.0
  kl_ratio_apply: 0
  loss1_weight_anneal_v: quad
  opt:
    beta1: 0.9
    beta2: 0.99
    ema_decay: 0.9999
    grad_clip: -1.0
    lr: 0.001
    lr_min: 0.0001
    momentum: 0.9
    scheduler: ''
    start_ratio: 0.6
    step_decay: 0.998
    type: adam
    vae_lr_warmup_epochs: 0
    weight_decay: 0.0
  rec_balance: 0
  seed: 1
  sn_reg_vae: 0
  sn_reg_vae_weight: 0.0
  type: trainers.hvae_trainer
  use_grad_scalar: 0
  use_kl_free: 0
  warmup_epochs: 0
use_checkpoint: 0
vis_latent_point: 0
viz:
  log_freq: -1
  save_freq: 2000
  val_freq: 200
  vis_sample_ddim_step: 0
  viz_freq: -400
  viz_order:
  - 2
  - 0
  - 1
voxel2pts:
  diffusion_steps:
  - 0
  init_weight: ''
weight_recont: 1.0

@ZENGXH
Collaborator

ZENGXH commented Jan 25, 2023

Hi, I tried VAE training with batch size 40 on 4 GPUs and got a similar NaN issue. However, the same training code works with batch size 32. The reason is not clear to me; it seems that training somehow does not work with batch size >= 40.
While I look into this, perhaps you can use batch size 32 for now? Sorry about that!
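
Until the cause is found, it may also help to stop a run as soon as the loss goes non-finite, instead of letting it continue for many epochs as in the log above. The following is a minimal sketch of such a guard. `NonFiniteLossError` and `ensure_finite` are hypothetical names, not functions from this repository, and where to call the guard inside the trainer is left as an assumption.

```python
import math


class NonFiniteLossError(RuntimeError):
    """Raised when a training loss is NaN or infinite."""


def ensure_finite(loss_value: float, epoch: int, step: int) -> float:
    """Return loss_value unchanged, or raise if it is NaN or infinite."""
    if not math.isfinite(loss_value):
        raise NonFiniteLossError(
            f"loss became {loss_value} at epoch {epoch}, step {step}"
        )
    return loss_value
```

In a PyTorch loop, the scalar would typically come from `loss.item()` before the backward pass, so the run halts at the first bad step and the last good checkpoint stays usable.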

@Zhiyuan-R
Author

Thanks for your hard work! I can't believe you ran it yourself! That is so nice of you! Have a good night!
