By default, the training script takes about 60-70 GB of GPU memory per GPU at a batch size of 2. Training is performed on 24 NVIDIA A800 GPUs (80 GB each) and takes about 41 hours for 1000 epochs:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python run_training.py \
  py_func=train +training=train_planTF \
  worker=single_machine_thread_pool worker.max_workers=32 \
  scenario_builder=nuplan cache.cache_path=/nuplan/exp/cache_plantf_1M cache.use_cache_without_dataset=true \
  data_loader.params.batch_size=32 data_loader.params.num_workers=32 \
  lr=1e-3 epochs=25 warmup_epochs=3 weight_decay=0.0001 \
  lightning.trainer.params.val_check_interval=0.5 \
  wandb.mode=online wandb.project=nuplan wandb.name=plantf
```
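
If you have less GPU memory available, the same Hydra overrides can be used to shrink the run. The sketch below is illustrative only: the single visible GPU, reduced batch size, worker counts, and run name are assumptions for a small debug run, not tuned settings from this repo.

```bash
# Minimal low-memory sketch (assumed values, not tuned):
# - a smaller per-GPU batch and fewer dataloader workers reduce the memory footprint
# - wandb.mode=offline logs locally, so no network access is needed during training
CUDA_VISIBLE_DEVICES=0 python run_training.py \
  py_func=train +training=train_planTF \
  worker=single_machine_thread_pool worker.max_workers=8 \
  scenario_builder=nuplan cache.cache_path=/nuplan/exp/cache_plantf_1M cache.use_cache_without_dataset=true \
  data_loader.params.batch_size=8 data_loader.params.num_workers=8 \
  lr=1e-3 epochs=25 warmup_epochs=3 weight_decay=0.0001 \
  lightning.trainer.params.val_check_interval=0.5 \
  wandb.mode=offline wandb.project=nuplan wandb.name=plantf-debug
```

With `wandb.mode=offline`, metrics are written to a local directory and can be uploaded to the wandb server afterwards with `wandb sync`.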