feat: enable xpu support for meta-reference stack #558
Conversation
Hi @dvrogozh! Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention. You currently have a record in our system, but the CLA is no longer valid and will need to be resubmitted.
Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations, and the pull request will be tagged accordingly.
If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
See: meta-llama/llama-stack#558 Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
@dvrogozh are you interested in moving forward with this change? If so, could you add a Test Plan?
@ashwinb, thank you for taking a look. Yes, I plan to move forward with this PR. The reason it's currently a draft is its dependency on a PR in llama-models: meta-llama/llama-models#233, which should be reviewed and merged first. I am waiting for its review. Can you help with that?
Yes, sure, I will help by adding tests covering different devices. If you need anything specific, please let me know.
Force-pushed from bf43ad2 to 4956907
@ashwinb: can you please give some insights on the existing tests? At the moment there is a lack of documentation regarding tests in llama-stack. Can you suggest whether there are existing tests for the meta-reference stack which should be extended to work in non-CUDA environments? I thought that https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/inline/agents/meta_reference/tests/test_chat_agent.py might provide such a test. However, it fails for me even on unmodified llama-stack on a CUDA system. Questions:
I am getting this failure when running on unmodified llama-stack on a CUDA system:
Will kill that test, it is stale. The test setup that is relevant for you is in `llama_stack/providers/tests/inference/`. On a CUDA system, you can run it like this:
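A plausible invocation (assumed here, mirroring the test plan given in the commit message below rather than the exact command from this reply):

```
torchrun pytest llama_stack/providers/tests/inference/test_text_inference.py -k meta_reference
```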
This commit adds support for XPU and CPU devices in the meta-reference stack for text models. On creation, the stack automatically identifies which device to use by checking available accelerator capabilities in the following order: CUDA, then XPU, finally CPU. This behaviour can be overridden with the `DEVICE` environment variable, in which case the explicitly specified device will be used.

Tested with:

```
torchrun pytest llama_stack/providers/tests/inference/test_text_inference.py -k meta_reference
```

Results:
* Tested on: a system with a single CUDA device, a system with a single XPU device, and a pure CPU system
* All tests pass except `test_completion_logprobs`
* `test_completion_logprobs` fails in the same way as on the baseline, i.e. unrelated to this change: `AssertionError: Unexpected top_k=3`

Requires: meta-llama/llama-models#233

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
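For reference, a minimal sketch of the device-selection order described in this commit message. It is illustrative only: `resolve_device` and its probing of `torch.xpu` are assumptions for this sketch, not the actual llama-stack code.

```python
import os

import torch


def resolve_device() -> torch.device:
    """Pick a device: explicit DEVICE override, else CUDA, then XPU, then CPU."""
    # An explicit override via the DEVICE environment variable wins.
    override = os.environ.get("DEVICE")
    if override:
        return torch.device(override)
    # Otherwise probe accelerators in order: CUDA, then XPU, finally CPU.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.device("xpu")
    return torch.device("cpu")
```

With a helper like this, running with `DEVICE=xpu` (or `DEVICE=cpu`) would force the corresponding device regardless of which accelerators are detected.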
Thank you, this is going to be quite useful.
@ashwinb: thank you for pointing out the relevant test. I have rebased the PR on the latest main. The test you pointed to works on non-CUDA devices without modifications, since I have now added automated device selection on model creation. FYI, I observe the following test failure on the main code without applying this PR: `test_completion_logprobs` fails with `AssertionError: Unexpected top_k=3`.
There seems to be a corresponding enhancement request.
@dvrogozh yes, the implementation only supports top_k but the test is using top_k=3; will fix.