[Feature] Enable Intel XPU support #839
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Thanks a lot for your great efforts and for enabling TRL for Intel XPU devices!
My question is about backward compatibility with this PR: it seems you have removed optimize_cuda_cache, which makes the library not backward compatible for users that pass that argument. Would it be possible to add a new attribute in PPOConfig, optimize_device_cache, that is initialized with the same value as optimize_cuda_cache, and raise a warning telling users that optimize_cuda_cache is going to be deprecated in the future?
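For illustration, here is a minimal sketch of the deprecation shim suggested above. The attribute names come from the discussion; the dataclass layout and warning wording are assumptions, not TRL's actual implementation:

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class PPOConfig:
    # New, device-agnostic flag used going forward (CUDA and XPU alike).
    optimize_device_cache: bool = False
    # Legacy flag kept only for backward compatibility; None means "not passed".
    optimize_cuda_cache: Optional[bool] = None

    def __post_init__(self):
        if self.optimize_cuda_cache is not None:
            warnings.warn(
                "`optimize_cuda_cache` is deprecated and will be removed in a "
                "future release; use `optimize_device_cache` instead.",
                DeprecationWarning,
            )
            # Mirror the legacy value so existing user code keeps working unchanged.
            self.optimize_device_cache = self.optimize_cuda_cache
```

With a shim like this, PPOConfig(optimize_cuda_cache=True) would still behave as before while emitting a deprecation warning.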
Thanks @younesbelkada for the suggestion. Could you help re-trigger CI and re-review? Thanks
@younesbelkada could you help review and re-trigger CI? I see the previous CI failure was arising from mps. I ran the tests locally and they seem to pass.
Thanks again! I would like @lvwerra to have a second look if possible
Generally looks good to me. Have you had a chance to run the sentiment script with XPU to make sure everything works? Would you mind sharing the learning curves so we can compare to GPU runs?
Yes, we are running all the example scripts on our devices. Should I share the loss curves for the different tasks here? Some of the perf numbers are internal, but we can share the functionality aspects.
Could you for example run
* Standardise example scripts
* fix plotting script
* Rename run_xxx to xxx
* Fix doc

Co-authored-by: Costa Huang <costa.huang@outlook.com>
@lvwerra @younesbelkada I tried dpo.py as a sanity check for training on our GPUs; the wandb log: https://wandb.ai/abhilash-majumder/huggingface/runs/chj3jbgf?workspace=user-abhilash-majumder
Thanks very much for running the experiments @abhilash1910! Looking at the logs, it seems the wandb is private :/ Can you make the logs public so that we can have a look at them? 🙏 Thanks!
Oh, apologies, it is public now. Could you let me know if you are able to check? Thanks @younesbelkada
@younesbelkada could you please help re-trigger CI? Thanks.
@younesbelkada @lvwerra if the changes make sense and the internal slow tests pass, can this be merged? Also, is there a Slack channel where I can follow up further regarding the benchmarking perf stats from our devices (if required)? Thanks
Hi @abhilash1910, the PR is in good shape to be merged IMO. If you could run sentiment_tuning.py, that would be great.
Hi @lvwerra, there seems to be no "sentiment_tuning.py" script, but I think there is a "ppo.py" script, which I tried out. Just for testing purposes: https://wandb.ai/abhilash-majumder/trl/runs/k0kletcb/overview?workspace=user-abhilash-majumder (let me know if you can access). It contains the stdout (prints) as I was making sure all results were correct. I am in the process of taking a longer run. Thanks!
Oh yes, that's it - we recently renamed it. Looks like your W&B logs are private though :)
Oh, I thought it was public; it is now. Could you also re-trigger the CI? Thanks
@lvwerra @younesbelkada could you help re-trigger CI and re-review? If any further modules need to be supported from our side, that will come in a separate PR. Thanks
Could you merge
Looks good; if you can share the longer run, I think we can merge :)
@lvwerra I completed the ppo script run with 194 iterations; I am sharing the logs here:
Hi @abhilash1910, did you still save the training logs? All we want to check is that PPO also converges at the same rate as on the GPU. E.g. the mean reward vs. iteration plot would be helpful. Thank you!
@lvwerra Unfortunately I did not save the entire wandb run log for the previous run; I took another snapshot over 6 iterations showing the PPO mean reward and the returns:
Awesome thanks!
Thanks @lvwerra, @younesbelkada for your continued support.
* enable xpu support
* fix bug
* review commits
* fix style
* add xpu decorator
* refactor review commit
* fix test
* review commit
* fix test
* Update benchmark.yml (huggingface#856)
* Standardise example scripts (huggingface#842)
  * Standardise example scripts
  * fix plotting script
  * Rename run_xxx to xxx
  * Fix doc
  * Co-authored-by: Costa Huang <costa.huang@outlook.com>
* Fix version check in import_utils.py (huggingface#853)
* dont use get_peft_model if model is already peft (huggingface#857)
* merge conflict
* add xpu decorator
* resolve
* resolves
* upstream
* refactor and precommit
* fix new tests
* add device mapping for xpu

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Costa Huang <costa.huang@outlook.com>
Co-authored-by: Adam Pauls <adpauls@gmail.com>
Co-authored-by: abhishek thakur <1183441+abhishekkrthakur@users.noreply.github.com>
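The commit list above mentions adding an xpu decorator for the tests. As an illustration only, a skip-if-no-XPU test decorator could look roughly like the sketch below; the name require_torch_xpu and the implementation are assumptions, not TRL's actual testing utilities:

```python
import unittest

import torch


def require_torch_xpu(test_case):
    """Skip the decorated test when no Intel XPU device is available."""
    xpu_available = hasattr(torch, "xpu") and torch.xpu.is_available()
    return unittest.skipUnless(xpu_available, "test requires an Intel XPU device")(test_case)


class XPUSanityTest(unittest.TestCase):
    @require_torch_xpu
    def test_tensor_on_xpu(self):
        # Move a small tensor onto the XPU to confirm basic device functionality.
        x = torch.ones(2, 2).to("xpu")
        self.assertEqual(x.device.type, "xpu")
```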
Thanks for creating this library. In our effort to streamline the Hugging Face stack on Intel devices, this PR is an important step. With support already enabled in peft, accelerate, and transformers, this would be a good addition so that TRL runs out of the box on our devices.
cc @younesbelkada @pacman100
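As a rough illustration of the "device mapping for xpu" change listed in the merge commit, device selection could fall back from CUDA to XPU to CPU along these lines; the helper name and structure below are assumptions for illustration, not TRL's actual code:

```python
import torch


def get_default_device() -> torch.device:
    """Prefer CUDA, then an Intel XPU, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.xpu is only present when XPU support is available
    # (e.g. via intel_extension_for_pytorch or a recent PyTorch build).
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.device("xpu")
    return torch.device("cpu")


device = get_default_device()
# model.to(device) then works unchanged whether the backend is CUDA, XPU, or CPU.
```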