Support Agent GRPO #3170

tastelikefeet · 2025-02-18T14:45:31Z

PR type

Bug Fix
New Feature
Document Updates
More Models or Datasets Support

PR information

Write the detailed information that belongs to this PR.

Experiment results

Paste your experiment results here (if needed).

…m_ds

* commit '0d3270d5b356a16853a43653cfd54d522445e281':
  Support Agent GRPO (modelscope#3170)
  Fix ovis2 (modelscope#3169)
  support grpo metric_for_best_model (modelscope#3155)
  Support Ovis2 models (modelscope#3163)
  docs: report_to add swanlab (modelscope#3158)

# Conflicts:
#	examples/train/grpo/multi_node/multi_gpu_agent.sh
#	swift/plugin/orm.py

…soth_fast_grpo

* commit '8921d9b98310d93f9f111af8859358ee32dce687': (46 commits)
  Support multiple vllms (modelscope#3202)
  update dataset & fix bugs (modelscope#3203)
  support vllm dp (modelscope#3201)
  fix setup.py (modelscope#3198)
  add links (modelscope#3193)
  Refactor grpo dataset (modelscope#3192)
  support r1 dataset (modelscope#3191)
  compat vllm==0.7.2 (modelscope#3083)
  support Knowledge Distillation sampling (modelscope#3185)
  Support GOT_OCR2_hf (modelscope#3182)
  Fix prm in sampler (modelscope#3184)
  fix sampler reaches max_length (modelscope#3180)
  refactor cosine orm (modelscope#3179)
  fix internvl-4b (modelscope#3178)
  Fix lmdeploy branch (modelscope#3145)
  Fix/agent grpo (modelscope#3172)
  fix streaming (modelscope#3176)
  fix max_length error (modelscope#3173)
  Support Agent GRPO (modelscope#3170)
  Fix ovis2 (modelscope#3169)
  ...

# Conflicts:
#	swift/llm/train/tuner.py

tastelikefeet added 8 commits February 18, 2025 16:05

add new ds

0e0cd61

wip

5ab87cf

fix

8be7711

lint code

8c1e18d

fix

7396c0d

add more scripts

255b7cf

lint

09b3cf6

fix

91b2f6f

Jintao-Huang approved these changes Feb 18, 2025

tastelikefeet added 2 commits February 18, 2025 23:13

fix

4ebde92

fix

741ff42

tastelikefeet merged commit 0d3270d into modelscope:main Feb 18, 2025
1 of 2 checks passed
