[Bug] ChatQnA - compose.yaml for Gaudi - Habana devices #739

Closed
2 of 6 tasks
pallavijaini0525 opened this issue Sep 5, 2024 · 8 comments

@pallavijaini0525
Collaborator

Priority

Undecided

OS type

Ubuntu

Hardware type

Gaudi2

Installation method

  • Pull docker images from hub.docker.com
  • Build docker images from source

Deploy method

  • Docker compose
  • Docker
  • Kubernetes
  • Helm

Running nodes

Single Node

What's the version?

https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker/gaudi/compose.yaml

Description

For the ChatQnA application, https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker/gaudi/compose.yaml

compose.yaml defines two containers that both request HABANA_VISIBLE_DEVICES=all. For multi-tenancy, each container needs specific device IDs instead of all.

With the existing compose.yaml, the error is as follows:

Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: exposing interfaces: failed creating temporary link on host: invalid argument

Reproduce steps

Run the docker compose file - https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker/gaudi/compose.yaml after setting the env variables specified in https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker/gaudi#setup-environment-variables

Raw log

No response

@feng-intel
Collaborator

Gaudi docs page:
https://docs.habana.ai/en/latest/Orchestration/Multiple_Tenants_on_HPU/Multiple_Dockers_each_with_Single_Workload.html

You can set HABANA_VISIBLE_DEVICES=0,1,2,3 to specify the device IDs instead of all.
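
As a rough illustration of the suggestion above, the fragment below pins each container to its own device IDs rather than `all`. It is only a sketch: the service names, image names, and the particular ID split are placeholders invented for this example, not the values used in the project's compose.yaml.

```yaml
# Sketch only: service names, images, and the ID split are hypothetical.
services:
  embedding-example:                        # placeholder service name
    image: example/embedding-gaudi:latest   # placeholder image
    runtime: habana
    environment:
      HABANA_VISIBLE_DEVICES: "0"           # this container sees only device 0
  llm-example:                              # placeholder service name
    image: example/llm-gaudi:latest         # placeholder image
    runtime: habana
    environment:
      HABANA_VISIBLE_DEVICES: "1,2,3"       # non-overlapping with the container above
```

The ID lists are kept disjoint so that the two containers never request the same card.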

@feng-intel self-assigned this on Sep 5, 2024
@yinghu5 added the aitce label on Sep 5, 2024

@pallavijaini0525
Collaborator Author

Yes, I have made that change and can now run the example. I opened this issue to request a placeholder in compose.yaml or a note in the README so that users do not forget to update the devices.

@feng-intel
Collaborator

Note:
Gaudi doc -> Device Management ->

Sharing 1 device between multiple processes | No | No

That means llm_service and the TEI embedding service have to run on different Gaudi cards.

@lvliang-intel
In ChatQnA/docker_compose/intel/hpu/gaudi/compose.yaml, why were ${tei_embedding_devices} and ${llm_service_devices} replaced with all?
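
For context, a variable-based form along the lines of what is being asked about might look like the sketch below. The variable names `tei_embedding_devices` and `llm_service_devices` come from this thread; the service names, image names, and the `:-all` fallback (standard Compose default-value interpolation) are assumptions made for illustration, not the contents of the actual file.

```yaml
# Sketch only: service and image names are hypothetical.
services:
  embedding-example:                        # placeholder service name
    image: example/embedding-gaudi:latest   # placeholder image
    runtime: habana
    environment:
      # Uses the user's value if set, otherwise falls back to "all".
      HABANA_VISIBLE_DEVICES: ${tei_embedding_devices:-all}
  llm-example:                              # placeholder service name
    image: example/llm-gaudi:latest         # placeholder image
    runtime: habana
    environment:
      HABANA_VISIBLE_DEVICES: ${llm_service_devices:-all}
```

With this shape, a user who needs fixed cards could export, for example, `tei_embedding_devices=0` and `llm_service_devices=1,2,3` before running `docker compose up`, while users who set nothing would keep the `all` behavior.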

@lvliang-intel
Collaborator

"all" means the system will allocate the device automatically. Users don't need to set the device number.

@feng-intel
Collaborator

Are you sure the "system" allocates a different device to each container?

@lvliang-intel
Collaborator

Yes, the system will automatically allocate a Gaudi card. Allowing users to specify the card number may not be a good idea, because typical users do not have detailed knowledge of the Gaudi system.

@feng-intel
Collaborator

@pallavijaini0525 Can we close the issue?

@pallavijaini0525
Collaborator Author

Yes, please.


4 participants