Refactored convergence tests to be portable #41
Conversation
Great work!! Can we paste the testing screenshot in the PR, as in #21? Thanks
test/convergence/test_mini_models.py
Outdated
```diff
@@ -210,7 +172,7 @@ def run_mini_model(
 ("mini_llama3", 32, 1e-4, torch.float32, 1e-8, 1e-5, 1e-4, 1e-5, 2e-3, 1e-5),
 ("mini_llama3", 32, 1e-4, torch.bfloat16, 1e-8, 1e-5, 1e-1, 1e-5, 1e-2, 1e-5),
 # TODO: torch 2.5.0 nightly breaks mixtral test, but torch 2.3.0 works fine
-("mini_mixtral", 32, 1e-4, torch.float32, 1e-8, 1e-5, 1e-3, 1e-5, 8e-3, 1e-5),
+("mini_mixtral", 32, 1e-4, torch.float32, 1e-8, 1e-4, 1e-3, 3e-2, 8e-3, 1e-5),
```
why do we need to relax the bound? test failed? 1e-5 -> 3e-2 seems too much?
Yes, the test failed for the previous tolerances. I'm not sure how to account for this; we should probably investigate further how the dataset and other parameters affect the expected tolerances. Thoughts @ByronHsu ?
can we try ("mini_mixtral", 32, 1e-4, torch.float32, 1e-8, 1e-4, 2e-3, 1e-5, 8e-3, 1e-5)?
@lancerts Loss had a few errors:
```
>       raise AssertionError("\n".join(mismatch_details))
E       AssertionError: Number of mismatched elements: 4
E       Mismatch at index (0, 23): tensor1[(0, 23)] = 0.46933501958847046, tensor2[(0, 23)] = 0.4692351222038269
E       Mismatch at index (0, 24): tensor1[(0, 24)] = 0.4860617518424988, tensor2[(0, 24)] = 0.48613235354423523
E       Mismatch at index (0, 25): tensor1[(0, 25)] = 0.43753352761268616, tensor2[(0, 25)] = 0.4377014636993408
E       Mismatch at index (0, 26): tensor1[(0, 26)] = 0.36302775144577026, tensor2[(0, 26)] = 0.3631027042865753

test/utils.py:83: AssertionError
```
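The mismatch report above comes from a verbose elementwise comparison in test/utils.py. A minimal sketch of that logic (hypothetical reconstruction, operating on plain nested lists rather than torch tensors so it is self-contained, and using the same closeness criterion as `torch.allclose`) might look like:

```python
def assert_verbose_allclose(tensor1, tensor2, rtol=1e-5, atol=1e-8):
    """Compare two 2-D nested lists elementwise and report every mismatch.

    Hypothetical sketch: the real helper in test/utils.py operates on
    torch tensors, but the reporting logic is the same.
    """
    mismatch_details = []
    for i, (row1, row2) in enumerate(zip(tensor1, tensor2)):
        for j, (a, b) in enumerate(zip(row1, row2)):
            # Closeness criterion used by torch.allclose:
            #   |a - b| <= atol + rtol * |b|
            if abs(a - b) > atol + rtol * abs(b):
                mismatch_details.append(
                    f"Mismatch at index ({i}, {j}): "
                    f"tensor1[({i}, {j})] = {a}, tensor2[({i}, {j})] = {b}"
                )
    if mismatch_details:
        header = f"Number of mismatched elements: {len(mismatch_details)}"
        raise AssertionError("\n".join([header] + mismatch_details))
```

Listing every offending index, as in the trace above, makes it much easier to judge whether a failure is a broad numerical drift or a handful of borderline elements.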
This works: ("mini_mixtral", 32, 1e-4, torch.float32, 1e-8, 1e-4, 5e-3, 1e-5, 8e-3, 1e-5),
cool, let's use ("mini_mixtral", 32, 1e-4, torch.float32, 1e-8, 1e-4, 5e-3, 1e-5, 8e-3, 1e-5).
I wonder if the test tolerance should be refactored to use a single value instead of two degrees of freedom, or alternatively keep the absolute tolerance fixed and have each test define only the relative tolerance.
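For context on the two degrees of freedom: under the `torch.allclose` criterion an element passes when `|a - b| <= atol + rtol * |b|`, so `atol` governs values near zero while `rtol` governs large magnitudes. A small illustrative sketch (hypothetical helper, not part of the repo's test utilities):

```python
# Illustration of the torch.allclose criterion with two tolerance knobs:
#   |a - b| <= atol + rtol * |b|
# (hypothetical helper; not part of the repo's test utilities)

def within_tolerance(a, b, atol, rtol):
    return abs(a - b) <= atol + rtol * abs(b)

# Near zero the absolute term dominates: rtol alone would reject this pair,
# since 1e-5 * 2e-9 is far smaller than the difference of 1e-9.
near_zero_ok = within_tolerance(1e-9, 2e-9, atol=1e-8, rtol=1e-5)

# At large magnitudes the relative term dominates: atol alone would reject,
# since the difference of 5e-3 dwarfs atol=1e-8.
large_ok = within_tolerance(1000.0, 1000.005, atol=1e-8, rtol=1e-5)
```

Fixing `atol` and letting tests vary only `rtol`, as suggested, would keep the near-zero behavior uniform across models while still allowing per-model slack on large activations.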
```diff
@@ -145,7 +105,7 @@ def run_mini_model(
 @pytest.mark.parametrize(
     "model_name, num_steps, lr, dtype, loss_atol, loss_rtol, logits_atol, logits_rtol, param_atol, param_rtol",
     [
-        ("mini_llama3", 32, 1e-4, torch.float32, 1e-8, 1e-5, 1e-4, 1e-5, 5e-3, 1e-5),
+        ("mini_llama3", 32, 1e-4, torch.float32, 1e-8, 2e-5, 1e-4, 1e-5, 5e-3, 1e-5),
```
same comment as above
Yes, the test failed with the previous tolerance.
This looks awesome!! Can we also include the code for generating the tokenized dataset? Name it as
Let's ensure this is in before we go public!
Thanks, added the generation script
Summary
Alternatives:
Testing Done
Ran convergence tests successfully
- `make test` to ensure correctness
- `make checkstyle` to ensure code style
- `make test-convergence` to ensure convergence