Questions about cluster.node_num and cluster.gpu_per_node:
- Do cluster.node_num and cluster.gpu_per_node declare the number of nodes and GPUs that the current job needs, or the total resources of the Ray cluster? My initial impression is the former, since the latter can already be obtained with a simple Ray API call (see the sketch after this list).
- If they declare the resources the job needs: I have 2 nodes with 8 GPUs each. When running the GRPO algorithm with cluster.node_num=1, cluster.gpu_per_node=8, explorer.engine_num=1, and explorer.tensor_parallel_size=4, my intuition is that the explorer and the trainer should be scheduled onto the same node (because cluster.node_num=1). In practice, however, each was given 4 GPUs and they were scheduled onto different nodes. Is this the intended behavior? (The second sketch below shows the single-node packing I expected.)
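As context for the first bullet: the cluster-wide totals are already known to Ray itself, so re-declaring them in the config would be redundant. A minimal sketch of the query I had in mind (standard Ray APIs, nothing framework-specific):

```python
import ray

ray.init(address="auto")  # attach to the running Ray cluster

# Cluster-wide totals and what is currently unclaimed:
print(ray.cluster_resources())    # e.g. {'CPU': 128.0, 'GPU': 16.0, ...}
print(ray.available_resources())

# Per-node breakdown, to see how many GPUs each node offers:
for node in ray.nodes():
    if node["Alive"]:
        print(node["NodeManagerAddress"], node["Resources"].get("GPU", 0))
```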
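For the second bullet, I don't know how the scheduler is implemented internally, but the single-node colocation I expected from cluster.node_num=1 corresponds to a STRICT_PACK placement group in Ray, which refuses to split its bundles across nodes. A standalone sketch under that assumption (the Worker actor is hypothetical, only there to illustrate the packing):

```python
import ray
from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

ray.init(address="auto")

# Two 4-GPU bundles; STRICT_PACK requires both on the SAME node,
# so this only becomes ready on a node with 8 free GPUs.
pg = placement_group([{"GPU": 4}, {"GPU": 4}], strategy="STRICT_PACK")
ray.get(pg.ready())

@ray.remote(num_gpus=4)
class Worker:
    """Stand-in for a 4-GPU explorer engine or trainer actor."""
    def node_ip(self):
        return ray.util.get_node_ip_address()

strategy = PlacementGroupSchedulingStrategy(placement_group=pg)
explorer = Worker.options(scheduling_strategy=strategy).remote()
trainer = Worker.options(scheduling_strategy=strategy).remote()

# With STRICT_PACK both actors report the same node IP.
print(ray.get([explorer.node_ip.remote(), trainer.node_ip.remote()]))
```

With the default PACK strategy, by contrast, Ray only tries to colocate bundles and silently spreads them across nodes when one node cannot fit them all, which would explain the behavior I observed.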
For reference, the GRPO parameter configuration:
project: "agent_grpo"
name: "train-1120"
checkpoint_root_dir: /root/checkpoint
algorithm:
  algorithm_type: grpo
  repeat_times: 8
  advantage_fn: grpo
model:
  model_path: /root/models/Qwen2.5-7B-Instruct
  max_response_tokens: 8192
  max_model_len: 131072
cluster:
  node_num: 1
  gpu_per_node: 8
buffer:
  total_epochs: 20480
  batch_size: 8
  train_batch_size: 18
  explorer_input:
    taskset:
      name: train
      storage_type: file
      path: '/root/tasks/train.jsonl'
      split: train
      format:
        prompt_key: 'question'
        response_key: 'answer'
      workflow_args:
        max_turns: 10
      reward_fn_args:
        llm_as_a_judge: true
      rollout_args:
        temperature: 0.6
      enable_progress_bar: true
    eval_tasksets:
      - name: eval
        storage_type: file
        path: '/root/tasks/eval.jsonl'
        split: test
        format:
          prompt_key: 'question'
          response_key: 'answer'
        enable_progress_bar: true
        workflow_args:
          max_turns: 10
        reward_fn_args:
          llm_as_a_judge: true
        rollout_args:
          temperature: 0
    default_workflow_type: 'agent_grpo_step'
  trainer_input:
    experience_buffer:
      name: experience_buffer
      storage_type: queue
      use_priority_queue: true
      max_read_timeout: 7200
explorer:
  eval_interval: 128
  max_repeat_times_per_runner: 1
  max_timeout: 3600
  runner_per_model: 8
  rollout_model:
    enable_thinking: true
    enable_history: true
    enable_openai_api: true
    enable_auto_tool_choice: true
    tool_call_parser: hermes
    engine_num: 1
    tensor_parallel_size: 2
    enable_prefix_caching: false
    enforce_eager: true
    dtype: bfloat16
    seed: 42
    gpu_memory_utilization: 0.9
    enable_chunked_prefill: true
synchronizer:
  sync_style: dynamic_by_explorer
  sync_method: 'nccl'
  sync_interval: 16
  sync_timeout: 3600
trainer:
  save_interval: 64
  trainer_config:
    trainer:
      max_actor_ckpt_to_keep: 5
      max_critic_ckpt_to_keep: 5
    actor_rollout_ref:
      model:
        use_remove_padding: true
      actor:
        use_dynamic_bsz: true
        ppo_max_token_len_per_gpu: 50000
        ulysses_sequence_parallel_size: 2
        entropy_from_logits_with_chunking: true
        optim:
          lr: 1e-6
      ref:
        log_prob_use_dynamic_bsz: ${trainer.trainer_config.actor_rollout_ref.actor.use_dynamic_bsz}
        log_prob_max_token_len_per_gpu: ${trainer.trainer_config.actor_rollout_ref.actor.ppo_max_token_len_per_gpu}
        ulysses_sequence_parallel_size: ${trainer.trainer_config.actor_rollout_ref.actor.ulysses_sequence_parallel_size}  # sp size