
Fix: RL base feature parity #2133

Open

NanoCode012 wants to merge 17 commits into main from fix/orpo_feature_parity
Conversation

@NanoCode012 (Collaborator) commented Dec 6, 2024

Description

The RL trainers were not loading some basic configs such as logging_steps. This PR consolidates the setting of these params and cleans them up.

This PR also fixes a case where .map was called without num_proc (see the sketch below).
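For illustration, a minimal sketch of what passing num_proc to datasets' .map changes; the dataset and tokenize_fn here are hypothetical, not taken from this PR:

    from datasets import Dataset

    # hypothetical example data and mapping function, for illustration only
    dataset = Dataset.from_dict({"text": ["a", "b", "c", "d"]})

    def tokenize_fn(example):
        return {"n_chars": len(example["text"])}

    # without num_proc, .map preprocesses in a single process;
    # passing num_proc=N parallelizes the work across N worker processes
    dataset = dataset.map(tokenize_fn, num_proc=4)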

To discuss (a sketch of where these options land in trainer args follows the list):

  • Handling of bf16/bfloat16
  • Handling of fp16 in RL
  • Handling of tf32 in RL
  • Default of use_reentrant
  • Default of remove_unused_columns
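As context for the discussion items above, a minimal sketch of where these options map in transformers' TrainingArguments; the values shown are placeholders, not the defaults this PR chooses:

    from transformers import TrainingArguments

    # placeholder values for discussion, not the PR's actual defaults
    args = TrainingArguments(
        output_dir="./out",
        bf16=True,    # cfg.bf16 / cfg.bfloat16 handling
        fp16=False,   # cfg.fp16 handling in RL
        tf32=True,    # cfg.tf32 handling in RL
        gradient_checkpointing=True,
        gradient_checkpointing_kwargs={"use_reentrant": False},  # default under discussion
        remove_unused_columns=False,  # RL trainers often need extra columns kept
    )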

Addresses many of the points raised in #2105.

How has this been tested?

Untested!

@NanoCode012 force-pushed the fix/orpo_feature_parity branch 2 times, most recently from 0719188 to 4b8f65b on February 3, 2025
@NanoCode012 force-pushed the fix/orpo_feature_parity branch from 1b15a11 to 62d04e4 on February 14, 2025
@NanoCode012 marked this pull request as ready for review on February 14, 2025
@bursteratom force-pushed the fix/orpo_feature_parity branch from 65a83b7 to 93a2ecc on February 18, 2025
@bursteratom (Collaborator) left a comment:

LGTM

@winglian (Collaborator) commented:

Thanks @NanoCode012! This should be good to go once the multi-GPU tests pass too: https://github.com/axolotl-ai-cloud/axolotl/actions/runs/13457763772

@winglian force-pushed the fix/orpo_feature_parity branch from fc04dcf to 4321607 on February 22, 2025
@winglian (Collaborator) commented on lines 234 to 252:
            warmup_steps = max(int(self.cfg.warmup_ratio * total_num_steps), 0)
        else:
            warmup_steps = min(int(0.03 * total_num_steps), 100)
            if warmup_steps == 1:
                warmup_steps = 2

        logging_steps = (
            self.cfg.logging_steps
            if self.cfg.logging_steps is not None
            else max(min(int(0.005 * total_num_steps), 10), 1)
        )
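(As a worked example of the fallback above: with total_num_steps = 2000 and no explicit logging_steps configured, this resolves to max(min(int(0.005 * 2000), 10), 1) = 10, i.e. a log line every 10 steps.)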
For many of the RL trainers, we can't rely on total_num_steps (it's set to None); instead, we pass -1 as max_steps to let the trainer figure out the total number of steps.
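A minimal sketch of that fallback, assuming the names from the comment above; in transformers, max_steps=-1 is the "unset" default, so the trainer derives the step count from num_train_epochs and the dataloader length:

    from transformers import TrainingArguments

    total_num_steps = None  # unknown ahead of time for many RL trainers

    # fall back to -1 so the trainer computes the total steps itself
    max_steps = total_num_steps if total_num_steps is not None else -1

    training_args = TrainingArguments(output_dir="./out", max_steps=max_steps)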

Comment on lines +237 to +245:

            if warmup_steps == 1:
                warmup_steps = 2

A collaborator asked: what is this for?

@winglian force-pushed the fix/orpo_feature_parity branch from 4321607 to 8331312 on February 23, 2025