[Bug] ChatQnA - compose.yaml for Gaudi - Habana devices #739

pallavijaini0525 · 2024-09-05T04:48:54Z

Priority

Undecided

OS type

Ubuntu

Hardware type

Gaudi2

Installation method

Pull docker images from hub.docker.com
Build docker images from source

Deploy method

Docker compose
Docker
Kubernetes
Helm

Running nodes

Single Node

What's the version?

https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker/gaudi/compose.yaml

Description

For the ChatQnA application: https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker/gaudi/compose.yaml

The compose.yaml defines two containers that both request HABANA_VISIBLE_DEVICES=all. For multi-tenancy, we need to specify the device IDs instead of all.

With the existing compose.yaml, the following error occurs:

Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: exposing interfaces: failed creating temporary link on host: invalid argument

Reproduce steps

Run the docker compose file https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker/gaudi/compose.yaml after setting the environment variables specified in https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker/gaudi#setup-environment-variables

Raw log

No response

feng-intel · 2024-09-05T08:16:35Z

Gaudi docs page:
https://docs.habana.ai/en/latest/Orchestration/Multiple_Tenants_on_HPU/Multiple_Dockers_each_with_Single_Workload.html

You can set HABANA_VISIBLE_DEVICES=0,1,2,3 to specify the device IDs instead of all.
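
One way to keep two containers from both asking for all is to give each its own explicit ID list. The following is a minimal sketch, not part of OPEA or the Habana tooling: `split_devices` is a hypothetical helper, the service names are placeholders, and the round-robin split is just one possible policy. It only builds the comma-separated strings that would go into each container's HABANA_VISIBLE_DEVICES.

```python
def split_devices(device_ids, services):
    """Deal device IDs out to services round-robin.

    Returns a dict mapping each service name to a comma-separated
    string suitable for HABANA_VISIBLE_DEVICES.
    Both arguments are illustrative inputs chosen by the caller.
    """
    if len(device_ids) < len(services):
        raise ValueError("need at least one device per service")
    assignment = {name: [] for name in services}
    for i, dev in enumerate(device_ids):
        assignment[services[i % len(services)]].append(dev)
    return {
        name: ",".join(str(d) for d in devs)
        for name, devs in assignment.items()
    }


# Hypothetical service names; substitute the ones from your compose.yaml.
env = split_devices([0, 1, 2, 3], ["tei_embedding", "llm_service"])
for name, value in env.items():
    print(f"{name}: HABANA_VISIBLE_DEVICES={value}")
```

The resulting values can then be exported as environment variables or written into the compose file by hand.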

pallavijaini0525 · 2024-09-05T17:18:47Z

Yes, I made that change and was able to run it. I filed this issue so that a placeholder or a note can be added to the README, so that users do not miss updating the devices.

feng-intel · 2024-10-12T07:52:54Z

Note:
Gaudi doc -> Device Management ->

Sharing 1 device between multiple processes | No | No

That means llm_service and the TEI embedding service have to run on different Gaudi cards.

@lvliang-intel
Here -> ChatQnA/docker_compose/intel/hpu/gaudi/compose.yaml
Why were ${tei_embedding_devices} and ${llm_service_devices} replaced with all?
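
If explicit IDs are used, the "no sharing of one device between processes" constraint quoted above can be checked before starting the containers. This is a rough sketch under assumptions: `parse_visible_devices` and `find_conflicts` are hypothetical helpers, and the service names are placeholders. A value of all is flagged conservatively because it cannot be compared against explicit IDs; whether the runtime then hands out distinct cards is a separate question that this check does not answer.

```python
def parse_visible_devices(value):
    """Parse a HABANA_VISIBLE_DEVICES string.

    Returns None for "all", otherwise a set of integer device IDs.
    """
    value = value.strip()
    if value == "all":
        return None
    return {int(part) for part in value.split(",") if part.strip()}


def find_conflicts(assignments):
    """Report pairs of services whose device lists overlap.

    assignments maps a service name to its HABANA_VISIBLE_DEVICES string.
    Returns a list of (service_a, service_b, shared) tuples, where shared
    is a sorted list of IDs, or the string "all" if either side uses all.
    """
    conflicts = []
    names = sorted(assignments)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            devs_a = parse_visible_devices(assignments[a])
            devs_b = parse_visible_devices(assignments[b])
            if devs_a is None or devs_b is None:
                conflicts.append((a, b, "all"))
            elif devs_a & devs_b:
                conflicts.append((a, b, sorted(devs_a & devs_b)))
    return conflicts


# Hypothetical assignments: device 1 is requested by both services.
print(find_conflicts({"llm_service": "0,1", "tei_embedding": "1,2"}))
```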

lvliang-intel · 2024-10-18T07:27:18Z

"all" means the system will allocate the device automatically. Users don't need to set the device number.

feng-intel · 2024-10-21T01:23:50Z

Are you sure the "system" can allocate a different device to each container?

lvliang-intel · 2024-11-03T10:44:04Z

Yes, the system will automatically allocate a Gaudi card. Allowing users to specify the card number may not be a good idea, since typical users do not know much about the Gaudi system.

feng-intel · 2024-11-07T05:04:35Z

@pallavijaini0525 Can we close the issue?

pallavijaini0525 · 2024-11-07T05:05:59Z

Yes, please.

feng-intel self-assigned this Sep 5, 2024

yinghu5 added the aitce label Sep 5, 2024

feng-intel mentioned this issue Sep 6, 2024

Yaml: add comments to specify gaudi device ids. #753

Merged

3 tasks

rofinn mentioned this issue Sep 7, 2024

ChatQnA docker gaudi quickstart #765

Closed

1 task

feng-intel assigned lvliang-intel Oct 12, 2024

feng-intel closed this as completed Nov 7, 2024
