
feat: enable xpu support for meta-reference stack #558

Merged: 1 commit into meta-llama:main on Jan 31, 2025

Conversation


@dvrogozh (Contributor) commented Dec 2, 2024

This commit adds support for XPU and CPU devices to the meta-reference stack for text models. On creation, the stack automatically identifies which device to use by checking the available accelerator capabilities in the following order: CUDA, then XPU, and finally CPU. This behaviour can be overridden with the `DEVICE` environment variable, in which case the explicitly specified device is used.
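A minimal sketch of that selection order (an illustration using PyTorch's standard availability probes, not the PR's actual code):

```python
import os

import torch


def resolve_device() -> torch.device:
    # An explicit override always wins, e.g. DEVICE=cpu or DEVICE=xpu.
    override = os.environ.get("DEVICE")
    if override:
        return torch.device(override)
    # Otherwise probe accelerators in order: CUDA, then XPU, finally CPU.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.device("xpu")
    return torch.device("cpu")
```

With such a hook, `DEVICE=xpu` forces the XPU path even on a machine that also has CUDA.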

Tested with:

```
torchrun `which pytest` llama_stack/providers/tests/inference/test_text_inference.py -k meta_reference
```

Results:

  • Tested on: a system with a single CUDA device, a system with a single XPU device, and a pure CPU system
  • Results: all tests pass except `test_completion_logprobs`
  • `test_completion_logprobs` fails in the same way as on the baseline, i.e. it is unrelated to this change: `AssertionError: Unexpected top_k=3`

Requires: meta-llama/llama-models#233

@facebook-github-bot

Hi @dvrogozh!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

dvrogozh added a commit to dvrogozh/notebook that referenced this pull request Dec 2, 2024
See: meta-llama/llama-stack#558
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
@facebook-github-bot

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@facebook-github-bot added the CLA Signed label on Dec 3, 2024

@ashwinb (Contributor) commented Jan 14, 2025

@dvrogozh are you interested in moving forward with this change? If so, could you add a Test Plan?

@dvrogozh (Author)

> @dvrogozh are you interested in moving forward with this change?

@ashwinb, thank you for taking a look. Yes, I plan to move forward with this PR. The reason it's currently a draft is its dependency on a PR in llama-models, meta-llama/llama-models#233, which should be reviewed and merged first. I am waiting for its review. Can you help with that?

> If so, could you add a Test Plan?

Yes, sure, I will help add tests covering different devices. If you need anything specific, please let me know.

@dvrogozh (Author)

@ashwinb: can you please give some insight into the existing tests? At the moment there is a lack of documentation regarding tests in llama-stack. Can you suggest whether there are existing tests for the meta-reference stack which should be extended to work in non-CUDA environments?

I thought that https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/inline/agents/meta_reference/tests/test_chat_agent.py might provide such a test. However, it fails for me even on unmodified llama-stack on a CUDA system. Questions:

  • Is this test up to date, and can it run and pass?
  • Does this test require a built and started meta-reference-gpu distribution?

I am getting this when running on unmodified llama-stack on a CUDA system:

```
pytest -rsf ./llama_stack/providers/inline/agents/meta_reference/tests
...
E           TypeError: MockToolGroupsAPI.list_tools() got an unexpected keyword argument 'toolgroup_id'
...
FAILED llama_stack/providers/inline/agents/meta_reference/tests/test_chat_agent.py::test_chat_agent_create_and_execute_turn - pydantic_core._pydantic_core.ValidationError: 1 validation error for ChatCompletionResponseEvent
FAILED llama_stack/providers/inline/agents/meta_reference/tests/test_chat_agent.py::test_chat_agent_complex_turn - pydantic_core._pydantic_core.ValidationError: 1 validation error for ChatCompletionResponseEvent
FAILED llama_stack/providers/inline/agents/meta_reference/tests/test_chat_agent.py::test_chat_agent_tools[toolgroups1-True-False] - TypeError: MockToolGroupsAPI.list_tools() got an unexpected keyword argument 'toolgroup_id'
FAILED llama_stack/providers/inline/agents/meta_reference/tests/test_chat_agent.py::test_chat_agent_tools[toolgroups2-False-True] - TypeError: MockToolGroupsAPI.list_tools() got an unexpected keyword argument 'toolgroup_id'
FAILED llama_stack/providers/inline/agents/meta_reference/tests/test_chat_agent.py::test_chat_agent_tools[toolgroups3-True-True] - TypeError: MockToolGroupsAPI.list_tools() got an unexpected keyword argument 'toolgroup_id'
===================================== 5 failed, 2 passed, 10 warnings in 1.35s =====================================
```


@ashwinb (Contributor) commented Jan 30, 2025

Will kill that test, it is stale. The test setup that is relevant for you is in providers/tests/inference. See https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/tests/README.md

On a CUDA system, you can run it like this:

```
pytest -s -v llama_stack/providers/tests/inference/test_text_inference.py \
  -m "meta-reference and llama_8b" \
  --env <...>
```
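With this PR's `DEVICE` override, the same suite could presumably be pinned to a specific device as well. A hypothetical invocation (the `--env` arguments stay elided as above; the variable name comes from this PR's description):

```
DEVICE=xpu pytest -s -v llama_stack/providers/tests/inference/test_text_inference.py \
  -m "meta-reference and llama_8b" \
  --env <...>
```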

@ashwinb left a comment


Thank you, this is going to be quite useful.

@ashwinb merged commit 7ea14ae into meta-llama:main on Jan 31, 2025
2 checks passed
@dvrogozh (Author)

@ashwinb: thank you for pointing out the relevant test. I have rebased the PR on the latest main. The test you pointed to works on non-CUDA devices without modification, since I added automatic device selection at model creation.

FYI, I observe the following test failure on the main code without applying this PR:

```
# torchrun `which pytest` llama_stack/providers/tests/inference/test_text_inference.py -k "meta_reference and test_completion_logprobs"
...
llama_stack/providers/tests/inference/test_text_inference.py:145: in test_completion_logprobs
    response = await inference_impl.completion(
llama_stack/providers/utils/telemetry/trace_protocol.py:101: in async_wrapper
    result = await method(self, *args, **kwargs)
llama_stack/distribution/routers/routers.py:194: in completion
    return await provider.completion(**params)
llama_stack/providers/utils/telemetry/trace_protocol.py:101: in async_wrapper
    result = await method(self, *args, **kwargs)
llama_stack/providers/inline/inference/meta_reference/inference.py:149: in completion
    assert logprobs.top_k == 1, f"Unexpected top_k={logprobs.top_k}"
E   AssertionError: Unexpected top_k=3
...
FAILED llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_logprobs[-meta_reference] - AssertionError: Unexpected top_k=3
```

There seems to be a corresponding enhancement request:


@ashwinb (Contributor) commented Feb 6, 2025

@dvrogozh yes, the implementation only supports top_k=1 but the test is using top_k=3; will fix.
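For reference, a minimal sketch of a call that stays within that limitation (the `LogProbConfig` import path and the `completion` keyword names are assumed here, inferred from the traceback above rather than verified against the repo):

```python
from llama_stack.apis.inference import LogProbConfig  # import path assumed


async def completion_with_logprobs(inference_impl, model_id: str):
    # The meta-reference provider asserts logprobs.top_k == 1
    # (meta_reference/inference.py:149 in the traceback above), so request
    # exactly the top-1 logprob; top_k=3, as the failing test does, trips
    # the AssertionError.
    return await inference_impl.completion(
        model_id=model_id,
        content="The capital of France is",
        logprobs=LogProbConfig(top_k=1),
    )
```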

srikanthbachala20 pushed a commit to srikanthbachala20/llama-stack that referenced this pull request Feb 27, 2025