chore(deps): upgrade trl and transformers #448
Conversation
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
Thanks for making a pull request! 😃
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
pyproject.toml (Outdated)
@@ -33,7 +33,7 @@ dependencies = [
     "sentencepiece>=0.1.99,<0.3",
     "tokenizers>=0.13.3,<1.0",
     "tqdm>=4.66.2,<5.0",
-    "trl>=0.9.3,<0.12",
+    "trl>=0.9.3,<1.0",
@willmj do we want to lock the version to something higher than the release that supports packing for pretokenized datasets, which we enabled along with this patch?
@dushyantbehl I thought the latest version was 0.13.0, so this would include that. Feel free to make a suggested change if I'm misunderstanding.
"trl>=0.9.3,<1.0", | |
"trl>=0.13.0,<1.0", |
@dushyantbehl is this your suggestion?
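For reference, a minimal sketch (not from this PR) of how packing is switched on through trl's SFTConfig; the concrete arguments used in fms-hf-tuning may differ, and `output_dir`/`max_seq_length` values below are purely illustrative:

```python
# Illustrative only: enabling packing via trl's SFTConfig (assumes a trl release
# that exposes the `packing` flag, i.e. >= 0.13.0 as discussed above).
from trl import SFTConfig

config = SFTConfig(
    output_dir="out",      # standard TrainingArguments field (illustrative path)
    packing=True,          # pack short (pretokenized) samples into full-length sequences
    max_seq_length=1024,   # target length of each packed sequence (illustrative value)
)
```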
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
I think if we want the TRL version to be greater than
Upgrade of transformers >=4.46 is dependent on fms-acceleration #98; we also have to update SFTConfig correctly.
Correct @Abhishek-TAMU, that's what I wanted to highlight too, thanks.
I see... good to note both here, and we can update this later once we get our repo working with the new transformers. Thanks.
@anhuong thanks for the quick fix on foundation-model-stack/fms-acceleration#123. Does this mean we can go ahead and move forward with this change? cc @willmj @fabianlim
As both
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
Signed-off-by: Dushyant Behl <dushyantbehl@in.ibm.com>
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
All the failed tests (41 failed tests) would be completely resolved by these 3 changes:

1. Setting ... As the size of the training dataset is 10 and batch_size is 4, hence due to ...
2. Same reasoning, to change ... (the batching arithmetic behind points 1 and 2 is sketched below)
3. Use _get_checkpoint_path to get the last checkpoint instead of ...
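A back-of-the-envelope check (illustrative, not taken from the comment above) of the batching math behind points 1 and 2, assuming incomplete final batches are kept:

```python
# Illustrative arithmetic only: with 10 training samples and a batch size of 4,
# an epoch yields ceil(10 / 4) = 3 steps, and the final batch holds only 2 samples.
import math

dataset_size = 10
batch_size = 4

steps_per_epoch = math.ceil(dataset_size / batch_size)               # 3
last_batch_size = dataset_size - (steps_per_epoch - 1) * batch_size  # 2
print(steps_per_epoch, last_batch_size)
```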
Co-authored-by: Abhishek <maurya.abhishek@ibm.com> Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
New release of fms-acceleration unblocking transformers 4.46. Should we go for merging?
@willmj sure, let's go ahead... do you want to merge the change which enables pretokenized datasets with packing as well, or do we keep it separate?
@willmj I applied the fixes to test cases suggested by @Abhishek-TAMU above and have opened a PR on reverting the restrictions on packing for pretokenized datasets #468
Running a few tests of tuning with the updated versions, but the changes look good to me. Once my tests run, I will note it in this PR.
Full fine-tuning and LoRA tuning ran fine on a minimal example. We will have to test to see how these changes affect train_runtime and quality in the next release though @Abhishek-TAMU
Hmm, but the changes from main broke unit tests, so I will investigate...
Signed-off-by: Anh Uong <anh.uong@ibm.com>
A new version of trl, v0.15.0, was released 3 hours ago and there are no release docs yet. This upgrade broke the unit tests, so I'm setting the upper limit to below 0.15.0 for now. Noting the error here: it failed in the training loop when running trainer.train()

0%| | 0/15 [00:00<?, ?it/s]ERROR:sft_trainer.py:Traceback (most recent call last):
File "/home/runner/work/fms-hf-tuning/fms-hf-tuning/tuning/sft_trainer.py", line 676, in main
trainer, additional_train_info = train(
^^^^^^
File "/home/runner/work/fms-hf-tuning/fms-hf-tuning/tuning/sft_trainer.py", line 420, in train
trainer.train(resume_from_checkpoint)
File "/home/runner/work/fms-hf-tuning/fms-hf-tuning/.tox/coverage/lib/python3.12/site-packages/transformers/trainer.py", line 2171, in train
return inner_training_loop(
^^^^^^^^^^^^^^^^^^^^
File "/home/runner/work/fms-hf-tuning/fms-hf-tuning/.tox/coverage/lib/python3.12/site-packages/transformers/trainer.py", line 2531, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/runner/work/fms-hf-tuning/fms-hf-tuning/.tox/coverage/lib/python3.12/site-packages/transformers/trainer.py", line 3675, in training_step
loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/runner/work/fms-hf-tuning/fms-hf-tuning/.tox/coverage/lib/python3.12/site-packages/trl/trainer/sft_trainer.py", line 453, in compute_loss
accuracy = compute_token_accuracy(shift_logits, shift_labels)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/runner/work/fms-hf-tuning/fms-hf-tuning/.tox/coverage/lib/python3.12/site-packages/trl/trainer/utils.py", line 1664, in compute_token_accuracy
correct_predictions = (predictions == labels) & mask
^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The size of tensor a (72) must match the size of tensor b (64) at non-singleton dimension 1
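As a side note, a minimal illustration (not from this PR) of the kind of shape mismatch that triggers this RuntimeError in trl's compute_token_accuracy; the tensor sizes below are hypothetical, chosen only to mirror the numbers in the traceback:

```python
# Illustration only: reproduce the broadcast failure the traceback reports when
# predictions and labels disagree on sequence length (72 vs 64).
import torch

predictions = torch.zeros(1, 72, dtype=torch.long)  # e.g. argmax over shifted logits
labels = torch.zeros(1, 64, dtype=torch.long)       # shifted labels of a different length
mask = labels != -100                                # ignore padding positions

try:
    correct_predictions = (predictions == labels) & mask
except RuntimeError as err:
    print(err)  # The size of tensor a (72) must match the size of tensor b (64) ...
```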
Description of the change
The requirements.txt in trl v0.13.0 no longer restricts the transformers version.
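For a quick local sanity check (not part of the PR itself), one might confirm that the installed trl/transformers pair falls inside the bounds discussed above; the exact limits are whatever this PR pins in pyproject.toml, and the values below are only assumptions drawn from the conversation (trl >=0.13.0,<0.15.0 and transformers >=4.46):

```python
# Illustrative check only: verify installed versions fall inside the assumed bounds.
from importlib.metadata import version
from packaging.version import Version

trl_v = Version(version("trl"))
tf_v = Version(version("transformers"))

assert Version("0.13.0") <= trl_v < Version("0.15.0"), trl_v
assert tf_v >= Version("4.46"), tf_v
print("trl", trl_v, "/ transformers", tf_v)
```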
Related issue number
More context
How to verify the PR
Was the PR tested