Description
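Problem: when running trinity run on the mix_chord example, world_size is computed as 0, which causes a modulo-by-zero (% 0) error.
Environment: Python 3.12, CUDA 12.8, 2x RTX 5090, AutoDL container. ray status reports GPU: 2.0, and torch.cuda.device_count() returns 2.
The error output is as follows: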
WARNING 08-26 15:06:05 [config.py:825] max_prompt_tokens is set to 11263.
INFO 08-26 15:06:05 [config.py:555] buffer.explorer_input.taskset.repeat_times is set to algorithm.repeat_times (=8).
INFO 08-26 15:06:05 [config.py:679] Auto set data_processor.experience_pipeline.input_save_path to /root/Trinity-RFT/examples/mix_chord/mix_chord/test_mix_chord/buffer/explorer_output.jsonl
Traceback (most recent call last):
  File "/root/miniconda3/bin/trinity", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/root/Trinity-RFT/trinity/cli/launcher.py", line 219, in main
    run(args.config, getattr(args, 'log_level', 'INFO'), args.dlc, args.plugin_dir)
  File "/root/Trinity-RFT/trinity/cli/launcher.py", line 127, in run
    config.check_and_update()
  File "/root/Trinity-RFT/trinity/common/config.py", line 898, in check_and_update
    self.trainer.trainer_config.synchronize_config(self)
  File "/root/Trinity-RFT/trinity/common/verl_config.py", line 302, in synchronize_config
    if config.buffer.train_batch_size % world_size != 0:
ZeroDivisionError: integer modulo by zero
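
For illustration, here is a minimal sketch of the kind of guard that would turn this crash into a readable configuration error. It is not the Trinity-RFT code: the function name check_batch_divisible and its error messages are hypothetical, and the traceback does not show how world_size is actually derived, so the sketch only validates the value before the modulo.

```python
def check_batch_divisible(train_batch_size: int, world_size: int) -> None:
    """Hypothetical helper: validate world_size before using it as a divisor."""
    # A non-positive world size means no GPUs were assigned to the trainer;
    # report that directly instead of letting the modulo raise ZeroDivisionError.
    if world_size <= 0:
        raise ValueError(
            f"world_size is {world_size}; expected a positive number of trainer GPUs. "
            "Check the cluster and GPU settings in the config."
        )
    # Same divisibility condition as the failing line in the traceback.
    if train_batch_size % world_size != 0:
        raise ValueError(
            f"train_batch_size ({train_batch_size}) must be divisible by "
            f"world_size ({world_size})."
        )


if __name__ == "__main__":
    check_batch_divisible(64, 2)  # passes silently
    try:
        check_batch_divisible(64, 0)
    except ValueError as err:
        print(err)
```

With a check like this, the run would stop with a message pointing at the GPU allocation rather than at the modulo operation.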