Refactored convergence tests to be portable #41
Conversation
Great work!! Can we paste the testing screenshot in the PR, as in #21? Thanks
test/convergence/test_mini_models.py
Outdated
```diff
@@ -210,7 +172,7 @@ def run_mini_model(
 ("mini_llama3", 32, 1e-4, torch.float32, 1e-8, 1e-5, 1e-4, 1e-5, 2e-3, 1e-5),
 ("mini_llama3", 32, 1e-4, torch.bfloat16, 1e-8, 1e-5, 1e-1, 1e-5, 1e-2, 1e-5),
 # TODO: torch 2.5.0 nightly breaks mixtral test, but torch 2.3.0 works fine
-("mini_mixtral", 32, 1e-4, torch.float32, 1e-8, 1e-5, 1e-3, 1e-5, 8e-3, 1e-5),
+("mini_mixtral", 32, 1e-4, torch.float32, 1e-8, 1e-4, 1e-3, 3e-2, 8e-3, 1e-5),
```
why do we need to relax the bound? test failed? 1e-5 -> 3e-2 seems too much?
Yes, the test failed for the previous tolerances. I'm not sure how to account for this; we should probably investigate further how the dataset and other parameters affect the expected tolerances. Thoughts @ByronHsu ?
can we try ("mini_mixtral", 32, 1e-4, torch.float32, 1e-8, 1e-4, 2e-3, 1e-5, 8e-3, 1e-5)?
@lancerts Loss had a few errors:
```
>       raise AssertionError("\n".join(mismatch_details))
E       AssertionError: Number of mismatched elements: 4
E       Mismatch at index (0, 23): tensor1[(0, 23)] = 0.46933501958847046, tensor2[(0, 23)] = 0.4692351222038269
E       Mismatch at index (0, 24): tensor1[(0, 24)] = 0.4860617518424988, tensor2[(0, 24)] = 0.48613235354423523
E       Mismatch at index (0, 25): tensor1[(0, 25)] = 0.43753352761268616, tensor2[(0, 25)] = 0.4377014636993408
E       Mismatch at index (0, 26): tensor1[(0, 26)] = 0.36302775144577026, tensor2[(0, 26)] = 0.3631027042865753

test/utils.py:83: AssertionError
```
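The mismatch report above comes from a verbose elementwise comparison in test/utils.py. A minimal sketch of that logic (hypothetical reconstruction, operating on plain nested lists rather than torch tensors so it is self-contained, and using the same closeness criterion as `torch.allclose`) might look like:

```python
def assert_verbose_allclose(tensor1, tensor2, rtol=1e-5, atol=1e-8):
    """Compare two 2-D nested lists elementwise and report every mismatch.

    Hypothetical sketch: the real helper in test/utils.py operates on
    torch tensors, but the reporting logic is the same.
    """
    mismatch_details = []
    for i, (row1, row2) in enumerate(zip(tensor1, tensor2)):
        for j, (a, b) in enumerate(zip(row1, row2)):
            # Closeness criterion used by torch.allclose:
            #   |a - b| <= atol + rtol * |b|
            if abs(a - b) > atol + rtol * abs(b):
                mismatch_details.append(
                    f"Mismatch at index ({i}, {j}): "
                    f"tensor1[({i}, {j})] = {a}, tensor2[({i}, {j})] = {b}"
                )
    if mismatch_details:
        header = f"Number of mismatched elements: {len(mismatch_details)}"
        raise AssertionError("\n".join([header] + mismatch_details))
```

Listing every offending index, as in the trace above, makes it much easier to judge whether a failure is a broad numerical drift or a handful of borderline elements.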
This works: ("mini_mixtral", 32, 1e-4, torch.float32, 1e-8, 1e-4, 5e-3, 1e-5, 8e-3, 1e-5),
cool, let's use ("mini_mixtral", 32, 1e-4, torch.float32, 1e-8, 1e-4, 5e-3, 1e-5, 8e-3, 1e-5).
I wonder if the test tolerance should be refactored to use a single value instead of two degrees of freedom, or alternatively keep the absolute tolerance fixed and have each test define only the relative tolerance.
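For context on the two degrees of freedom: under the `torch.allclose` criterion an element passes when `|a - b| <= atol + rtol * |b|`, so `atol` governs values near zero while `rtol` governs large magnitudes. A small illustrative sketch (hypothetical helper, not part of the repo's test utilities):

```python
# Illustration of the torch.allclose criterion with two tolerance knobs:
#   |a - b| <= atol + rtol * |b|
# (hypothetical helper; not part of the repo's test utilities)

def within_tolerance(a, b, atol, rtol):
    return abs(a - b) <= atol + rtol * abs(b)

# Near zero the absolute term dominates: rtol alone would reject this pair,
# since 1e-5 * 2e-9 is far smaller than the difference of 1e-9.
near_zero_ok = within_tolerance(1e-9, 2e-9, atol=1e-8, rtol=1e-5)

# At large magnitudes the relative term dominates: atol alone would reject,
# since the difference of 5e-3 dwarfs atol=1e-8.
large_ok = within_tolerance(1000.0, 1000.005, atol=1e-8, rtol=1e-5)
```

Fixing `atol` and letting tests vary only `rtol`, as suggested, would keep the near-zero behavior uniform across models while still allowing per-model slack on large activations.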
```diff
@@ -145,7 +105,7 @@ def run_mini_model(
 @pytest.mark.parametrize(
     "model_name, num_steps, lr, dtype, loss_atol, loss_rtol, logits_atol, logits_rtol, param_atol, param_rtol",
     [
-        ("mini_llama3", 32, 1e-4, torch.float32, 1e-8, 1e-5, 1e-4, 1e-5, 5e-3, 1e-5),
+        ("mini_llama3", 32, 1e-4, torch.float32, 1e-8, 2e-5, 1e-4, 1e-5, 5e-3, 1e-5),
```
same comment as above
Yes, the test failed with the previous tolerance.
This looks awesome!! Can we also include the code for generating the tokenized dataset? Name it as
Let's ensure this is in before we go public!
Thanks, added the generation script
Summary
Alternatives:
Testing Done
Ran convergence tests successfully
- `make test` to ensure correctness
- `make checkstyle` to ensure code style
- `make test-convergence` to ensure convergence