Fix: RL base feature parity #2133
base: main
LGTM
Thanks @NanoCode012! This should be good to go once the multi-GPU tests pass too: https://github.com/axolotl-ai-cloud/axolotl/actions/runs/13457763772
Looks like the multi-GPU GRPO tests are failing: https://github.com/axolotl-ai-cloud/axolotl/actions/runs/13469023440/job/37640091555
src/axolotl/core/trainer_builder.py
Outdated
    warmup_steps = max(int(self.cfg.warmup_ratio * total_num_steps), 0)
else:
    warmup_steps = min(int(0.03 * total_num_steps), 100)
    if warmup_steps == 1:
        warmup_steps = 2

logging_steps = (
    self.cfg.logging_steps
    if self.cfg.logging_steps is not None
    else max(min(int(0.005 * total_num_steps), 10), 1)
For many of the RL trainers, we can't rely on total_num_steps (it is set to None), so we pass -1 as max_steps and let the trainer figure out the total number of steps. The arithmetic above would fail on a None total.
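One way to guard the step arithmetic against an unknown total is sketched below. This is only an illustration, not the code in this PR: `resolve_step_settings` and the shape of its return value are hypothetical, and the sketch deliberately leaves out the `warmup_steps == 1` bump from the diff. It assumes that, when the total is unknown, a configured warmup ratio can simply be passed through for the trainer to resolve.

```python
from typing import Optional


def resolve_step_settings(
    total_num_steps: Optional[int],
    warmup_steps: Optional[int] = None,
    warmup_ratio: Optional[float] = None,
    logging_steps: Optional[int] = None,
) -> dict:
    """Hypothetical helper: derive step settings, tolerating an unknown total."""
    settings = {}

    if warmup_steps is not None:
        # An explicit value always wins.
        settings["warmup_steps"] = warmup_steps
    elif total_num_steps is None:
        # Unknown total (e.g. RL trainers): skip the arithmetic and
        # pass the ratio through, if one was configured.
        if warmup_ratio is not None:
            settings["warmup_ratio"] = warmup_ratio
    elif warmup_ratio is not None:
        settings["warmup_steps"] = max(int(warmup_ratio * total_num_steps), 0)
    else:
        # Same default as in the diff: 3% of the run, capped at 100 steps.
        settings["warmup_steps"] = min(int(0.03 * total_num_steps), 100)

    if logging_steps is not None:
        settings["logging_steps"] = logging_steps
    elif total_num_steps is not None:
        # Same default as in the diff: 0.5% of the run, clamped to [1, 10].
        settings["logging_steps"] = max(min(int(0.005 * total_num_steps), 10), 1)

    return settings
```

With `total_num_steps=None` and nothing configured, the helper returns an empty dict, so the trainer's own defaults apply.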
if warmup_steps == 1:
    warmup_steps = 2
what is this for?
Description
The RL trainer was not loading some basic configs such as `logging_steps`. This PR consolidates the setting of these params and cleans them up.
This PR also fixes a case where we did not call `.map` with `num_proc`.
To discuss:
- `use_reentrant`
- `remove_unused_columns`
Addresses a lot of points in #2105
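The `.map` fix mentioned above could look roughly like the sketch below. `map_with_num_proc` is a hypothetical wrapper, not a function from this PR. It assumes only that the dataset object exposes a `.map(fn, **kwargs)` method that accepts a `num_proc` keyword, as Hugging Face `datasets.Dataset.map` does.

```python
from typing import Any, Callable, Optional


def map_with_num_proc(
    dataset: Any,
    fn: Callable,
    num_proc: Optional[int] = None,
    **kwargs: Any,
) -> Any:
    """Hypothetical wrapper: forward num_proc to .map only when it is set."""
    if num_proc is not None and num_proc > 1:
        # Only request multiprocessing when more than one worker is configured.
        kwargs["num_proc"] = num_proc
    return dataset.map(fn, **kwargs)
```

Routing every preprocessing call through one wrapper like this makes it harder to forget the worker count at an individual call site.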
Motivation and Context
How has this been tested?
Untested!
Screenshots (if appropriate)
Types of changes
Social Handles (Optional)