
Fix: RL base feature parity #2133

Open

NanoCode012 wants to merge 17 commits into main from fix/orpo_feature_parity
Conversation

@NanoCode012 (Collaborator) commented Dec 6, 2024

Description

The RL trainers were not loading some basic configs such as logging_steps. This PR consolidates the setting of these params and cleans them up.

This PR also fixes a case where .map was called without num_proc (see the sketch below).
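For illustration, a minimal sketch of what passing num_proc to datasets' .map changes; the dataset and tokenize_fn here are hypothetical, not taken from this PR:

    from datasets import Dataset

    # hypothetical example data and mapping function, for illustration only
    dataset = Dataset.from_dict({"text": ["a", "b", "c", "d"]})

    def tokenize_fn(example):
        return {"n_chars": len(example["text"])}

    # without num_proc, .map preprocesses in a single process;
    # passing num_proc=N parallelizes the work across N worker processes
    dataset = dataset.map(tokenize_fn, num_proc=4)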

To discuss (a sketch of where these options land in trainer args follows the list):

  • Handling of bf16/bfloat16
  • Handling of fp16 in RL
  • Handling of tf32 in RL
  • Default of use_reentrant
  • Default of remove_unused_columns
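As context for the discussion items above, a minimal sketch of where these options map in transformers' TrainingArguments; the values shown are placeholders, not the defaults this PR chooses:

    from transformers import TrainingArguments

    # placeholder values for discussion, not the PR's actual defaults
    args = TrainingArguments(
        output_dir="./out",
        bf16=True,    # cfg.bf16 / cfg.bfloat16 handling
        fp16=False,   # cfg.fp16 handling in RL
        tf32=True,    # cfg.tf32 handling in RL
        gradient_checkpointing=True,
        gradient_checkpointing_kwargs={"use_reentrant": False},  # default under discussion
        remove_unused_columns=False,  # RL trainers often need extra columns kept
    )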

Addresses many of the points raised in #2105.

How has this been tested?

Untested!

@NanoCode012 force-pushed the fix/orpo_feature_parity branch 2 times, most recently from 0719188 to 4b8f65b on February 3, 2025
@NanoCode012 force-pushed the fix/orpo_feature_parity branch from 1b15a11 to 62d04e4 on February 14, 2025
@NanoCode012 marked this pull request as ready for review on February 14, 2025
@bursteratom force-pushed the fix/orpo_feature_parity branch from 65a83b7 to 93a2ecc on February 18, 2025
@bursteratom (Collaborator) left a comment:

LGTM

@winglian (Collaborator) commented:

Thanks @NanoCode012! This should be good to go once the multi-GPU tests pass too: https://github.com/axolotl-ai-cloud/axolotl/actions/runs/13457763772

@winglian force-pushed the fix/orpo_feature_parity branch from fc04dcf to 4321607 on February 22, 2025
@winglian (Collaborator) commented on lines 234 to 252:
            warmup_steps = max(int(self.cfg.warmup_ratio * total_num_steps), 0)
        else:
            warmup_steps = min(int(0.03 * total_num_steps), 100)
            if warmup_steps == 1:
                warmup_steps = 2

        logging_steps = (
            self.cfg.logging_steps
            if self.cfg.logging_steps is not None
            else max(min(int(0.005 * total_num_steps), 10), 1)
        )
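(As a worked example of the fallback above: with total_num_steps = 2000 and no explicit logging_steps configured, this resolves to max(min(int(0.005 * 2000), 10), 1) = 10, i.e. a log line every 10 steps.)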
For many of the RL trainers, we can't rely on total_num_steps (it's set to None); instead, we pass -1 as max_steps to let the trainer figure out the total number of steps.
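A minimal sketch of that fallback, assuming the names from the comment above; in transformers, max_steps=-1 is the "unset" default, so the trainer derives the step count from num_train_epochs and the dataloader length:

    from transformers import TrainingArguments

    total_num_steps = None  # unknown ahead of time for many RL trainers

    # fall back to -1 so the trainer computes the total steps itself
    max_steps = total_num_steps if total_num_steps is not None else -1

    training_args = TrainingArguments(output_dir="./out", max_steps=max_steps)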

Comment on lines +237 to +245:

            if warmup_steps == 1:
                warmup_steps = 2

A collaborator asked: what is this for?

@winglian force-pushed the fix/orpo_feature_parity branch from 4321607 to 8331312 on February 23, 2025