
feat: enable xpu support for meta-reference stack #558

Merged: 1 commit into meta-llama:main on Jan 31, 2025

Conversation


@dvrogozh (Contributor) commented Dec 2, 2024

This commit adds support for XPU and CPU devices to the meta-reference stack for text models. On creation, the stack automatically identifies which device to use by checking the available accelerator capabilities in the following order: CUDA, then XPU, and finally CPU. This behaviour can be overridden with the `DEVICE` environment variable, in which case the explicitly specified device is used.
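A minimal sketch of that selection order (an illustration using PyTorch's standard availability probes, not the PR's actual code):

```python
import os

import torch


def resolve_device() -> torch.device:
    # An explicit override always wins, e.g. DEVICE=cpu or DEVICE=xpu.
    override = os.environ.get("DEVICE")
    if override:
        return torch.device(override)
    # Otherwise probe accelerators in order: CUDA, then XPU, finally CPU.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.device("xpu")
    return torch.device("cpu")
```

With such a hook, `DEVICE=xpu` forces the XPU path even on a machine that also has CUDA.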

Tested with:

```
torchrun `which pytest` llama_stack/providers/tests/inference/test_text_inference.py -k meta_reference
```

Results:

  • Tested on: a system with a single CUDA device, a system with a single XPU device, and a pure CPU system
  • Results: all tests pass except `test_completion_logprobs`
  • `test_completion_logprobs` fails in the same way as on the baseline, i.e. it is unrelated to this change: `AssertionError: Unexpected top_k=3`

Requires: meta-llama/llama-models#233

@facebook-github-bot

Hi @dvrogozh!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

dvrogozh added a commit to dvrogozh/notebook that referenced this pull request Dec 2, 2024
See: meta-llama/llama-stack#558
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
@facebook-github-bot

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@facebook-github-bot added the CLA Signed label on Dec 3, 2024

@ashwinb (Contributor) commented Jan 14, 2025

@dvrogozh are you interested in moving forward with this change? If so, could you add a Test Plan?

@dvrogozh (Author)

> @dvrogozh are you interested in moving forward with this change?

@ashwinb, thank you for taking a look. Yes, I plan to move forward with this PR. The reason it's currently a draft is its dependency on a PR in llama-models, meta-llama/llama-models#233, which should be reviewed and merged first. I am waiting for its review. Can you help with that?

> If so, could you add a Test Plan?

Yes, sure, I will help add tests covering different devices. If you need anything specific, please let me know.

@dvrogozh (Author)

@ashwinb: can you please give some insight into the existing tests? At the moment there is a lack of documentation regarding tests in llama-stack. Can you suggest whether there are existing tests for the meta-reference stack which should be extended to work in non-CUDA environments?

I thought that https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/inline/agents/meta_reference/tests/test_chat_agent.py might provide such a test. However, it fails for me even on unmodified llama-stack on a CUDA system. Questions:

  • Is this test up to date, and can it run and pass?
  • Does this test require a built and started meta-reference-gpu distribution?

I am getting this when running on unmodified llama-stack on a CUDA system:

```
pytest -rsf ./llama_stack/providers/inline/agents/meta_reference/tests
...
E           TypeError: MockToolGroupsAPI.list_tools() got an unexpected keyword argument 'toolgroup_id'
...
FAILED llama_stack/providers/inline/agents/meta_reference/tests/test_chat_agent.py::test_chat_agent_create_and_execute_turn - pydantic_core._pydantic_core.ValidationError: 1 validation error for ChatCompletionResponseEvent
FAILED llama_stack/providers/inline/agents/meta_reference/tests/test_chat_agent.py::test_chat_agent_complex_turn - pydantic_core._pydantic_core.ValidationError: 1 validation error for ChatCompletionResponseEvent
FAILED llama_stack/providers/inline/agents/meta_reference/tests/test_chat_agent.py::test_chat_agent_tools[toolgroups1-True-False] - TypeError: MockToolGroupsAPI.list_tools() got an unexpected keyword argument 'toolgroup_id'
FAILED llama_stack/providers/inline/agents/meta_reference/tests/test_chat_agent.py::test_chat_agent_tools[toolgroups2-False-True] - TypeError: MockToolGroupsAPI.list_tools() got an unexpected keyword argument 'toolgroup_id'
FAILED llama_stack/providers/inline/agents/meta_reference/tests/test_chat_agent.py::test_chat_agent_tools[toolgroups3-True-True] - TypeError: MockToolGroupsAPI.list_tools() got an unexpected keyword argument 'toolgroup_id'
===================================== 5 failed, 2 passed, 10 warnings in 1.35s =====================================
```


@ashwinb (Contributor) commented Jan 30, 2025

Will kill that test, it is stale. The test setup that is relevant for you is in providers/tests/inference. See https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/tests/README.md

On a CUDA system, you can run it like this:

```
pytest -s -v llama_stack/providers/tests/inference/test_text_inference.py \
  -m "meta-reference and llama_8b" \
  --env <...>
```
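With this PR's `DEVICE` override, the same suite could presumably be pinned to a specific device as well. A hypothetical invocation (the `--env` arguments stay elided as above; the variable name comes from this PR's description):

```
DEVICE=xpu pytest -s -v llama_stack/providers/tests/inference/test_text_inference.py \
  -m "meta-reference and llama_8b" \
  --env <...>
```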

@ashwinb left a comment


Thank you, this is going to be quite useful.

@ashwinb merged commit 7ea14ae into meta-llama:main on Jan 31, 2025
2 checks passed
@dvrogozh (Author)

@ashwinb: thank you for pointing out the relevant test. I have rebased the PR on the latest main. The test you pointed to works on non-CUDA devices without modification, since I added automatic device selection at model creation.

FYI, I observe the following test failure on the main code without applying this PR:

```
# torchrun `which pytest` llama_stack/providers/tests/inference/test_text_inference.py -k "meta_reference and test_completion_logprobs"
...
llama_stack/providers/tests/inference/test_text_inference.py:145: in test_completion_logprobs
    response = await inference_impl.completion(
llama_stack/providers/utils/telemetry/trace_protocol.py:101: in async_wrapper
    result = await method(self, *args, **kwargs)
llama_stack/distribution/routers/routers.py:194: in completion
    return await provider.completion(**params)
llama_stack/providers/utils/telemetry/trace_protocol.py:101: in async_wrapper
    result = await method(self, *args, **kwargs)
llama_stack/providers/inline/inference/meta_reference/inference.py:149: in completion
    assert logprobs.top_k == 1, f"Unexpected top_k={logprobs.top_k}"
E   AssertionError: Unexpected top_k=3
...
FAILED llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_logprobs[-meta_reference] - AssertionError: Unexpected top_k=3
```

There seems to be a corresponding enhancement request:


@ashwinb (Contributor) commented Feb 6, 2025

@dvrogozh yes, the implementation only supports top_k=1 but the test is using top_k=3; will fix.
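For reference, a minimal sketch of a call that stays within that limitation (the `LogProbConfig` import path and the `completion` keyword names are assumed here, inferred from the traceback above rather than verified against the repo):

```python
from llama_stack.apis.inference import LogProbConfig  # import path assumed


async def completion_with_logprobs(inference_impl, model_id: str):
    # The meta-reference provider asserts logprobs.top_k == 1
    # (meta_reference/inference.py:149 in the traceback above), so request
    # exactly the top-1 logprob; top_k=3, as the failing test does, trips
    # the AssertionError.
    return await inference_impl.completion(
        model_id=model_id,
        content="The capital of France is",
        logprobs=LogProbConfig(top_k=1),
    )
```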

srikanthbachala20 pushed a commit to srikanthbachala20/llama-stack that referenced this pull request Feb 27, 2025