Merged

41 commits
- cf60682 DocSum - add files for deploy app with ROCm vLLM (Feb 13, 2025)
- 1fd1de1 DocSum - fix main (Feb 13, 2025)
- bd2d47e DocSum - add files for deploy app with ROCm vLLM (Feb 13, 2025)
- 2459ecb DocSum - fix main (Feb 13, 2025)
- 4d35065 Merge remote-tracking branch 'origin/main' (Feb 19, 2025)
- 6d5049d DocSum - add files for deploy app with ROCm vLLM (Feb 13, 2025)
- 9dfbdc5 DocSum - fix main (Feb 13, 2025)
- a8857ae DocSum - add files for deploy app with ROCm vLLM (Feb 13, 2025)
- 5a38b26 DocSum - fix main (Feb 13, 2025)
- 0e2ef94 Merge remote-tracking branch 'origin/main' (Feb 25, 2025)
- 30071db Merge branch 'main' of https://github.com/opea-project/GenAIExamples (Mar 11, 2025)
- 0757dec Merge branch 'opea-project:main' into main (artem-astafev, Mar 20, 2025)
- 9aaf378 Merge branch 'main' of https://github.com/opea-project/GenAIExamples (Mar 26, 2025)
- 9cf4b6e Merge branch 'main' of https://github.com/opea-project/GenAIExamples (Apr 3, 2025)
- ae70a0e DocSum - Adding files to deploy an application in the K8S environment… (Apr 3, 2025)
- cc55cbe DocSum - Adding files to deploy an application in the K8S environment… (Apr 3, 2025)
- 4c2f970 DocSum - Adding files to deploy an application in the K8S environment… (Apr 3, 2025)
- 2602f35 DocSum - Adding files to deploy an application in the K8S environment… (Apr 4, 2025)
- d01ecf4 Merge branch 'main' of https://github.com/opea-project/GenAIExamples … (Apr 5, 2025)
- 515a913 DocSum - Adding files to deploy an application in the K8S environment… (Apr 5, 2025)
- 6c184c3 DocSum - Adding files to deploy an application in the K8S environment… (Apr 5, 2025)
- b7a1e66 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 5, 2025)
- f24a9ed DocSum - fix files for deploy on ROCm vLLM in K8S (chyundunovDatamonsters, Apr 17, 2025)
- 4cf6108 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 17, 2025)
- 0740ba1 DocSum - fix files for deploy on ROCm vLLM in K8S (chyundunovDatamonsters, Apr 18, 2025)
- ca78ab4 Merge remote-tracking branch 'origin/feature/DocSum_k8s' into feature… (chyundunovDatamonsters, Apr 18, 2025)
- b62ab6d Merge branch 'main' of https://github.com/opea-project/GenAIExamples … (chyundunovDatamonsters, Apr 18, 2025)
- dea9823 DocSum - fix files for deploy on ROCm vLLM in K8S (chyundunovDatamonsters, Apr 18, 2025)
- f557674 DocSum - fix files for deploy on ROCm vLLM in K8S (chyundunovDatamonsters, Apr 22, 2025)
- 307097e [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 22, 2025)
- 0694a4b DocSum - fix files for deploy on ROCm vLLM in K8S (chyundunovDatamonsters, Apr 22, 2025)
- 40df8bb Merge remote-tracking branch 'origin/feature/DocSum_k8s' into feature… (chyundunovDatamonsters, Apr 22, 2025)
- c434566 DocSum - fix files for deploy on ROCm vLLM in K8S (chyundunovDatamonsters, Apr 22, 2025)
- ce21b17 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 22, 2025)
- 54cc97f Merge branch 'main' into feature/DocSum_k8s (chensuyue, Apr 22, 2025)
- c99fecd DocSum - fix files for deploy on ROCm vLLM in K8S (chyundunovDatamonsters, Apr 22, 2025)
- f0be86f Merge remote-tracking branch 'origin/feature/DocSum_k8s' into feature… (chyundunovDatamonsters, Apr 22, 2025)
- 1954f01 DocSum - fix files for deploy on ROCm vLLM in K8S (chyundunovDatamonsters, Apr 22, 2025)
- 60c6a73 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 22, 2025)
- 1022b77 DocSum - fix files for deploy on ROCm vLLM in K8S (chyundunovDatamonsters, Apr 22, 2025)
- ce80b45 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 22, 2025)
147 changes: 147 additions & 0 deletions DocSum/kubernetes/helm/README.md
@@ -16,3 +16,150 @@ helm install docsum oci://ghcr.io/opea-project/charts/docsum --set global.HUGGI
export HFTOKEN="insert-your-huggingface-token-here"
helm install docsum oci://ghcr.io/opea-project/charts/docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} -f gaudi-values.yaml
```

## Deploy on AMD ROCm using Helm charts from the binary Helm repository

### Creating working dirs

```bash
mkdir ~/docsum-k8s-install && cd ~/docsum-k8s-install
```

### Cloning repos

```bash
git clone https://github.com/opea-project/GenAIExamples.git
```

### Go to the installation directory

```bash
cd GenAIExamples/DocSum/kubernetes/helm
```

### Setting system variables

```bash
export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="Intel/neural-chat-7b-v3-3"
```

### Setting variables in the values files

#### If using ROCm vLLM

```bash
nano ~/docsum-k8s-install/GenAIExamples/DocSum/kubernetes/helm/rocm-values.yaml
```

- `HIP_VISIBLE_DEVICES` - the ID(s) of the GPU(s) to use: a single ID or several comma-separated IDs, e.g. "0" or "0,1,2,3"
- `TENSOR_PARALLEL_SIZE` - must match the number of GPUs used
- `resources.limits."amd.com/gpu"` - replace "1" with the number of GPUs used
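
As an illustration, the three settings above can be kept in agreement like this (a hypothetical override for a node with four GPUs, following the key layout of rocm-values.yaml; the GPU count is an assumption):

```yaml
# Hypothetical 4-GPU example: the visible GPU IDs, the tensor
# parallel size, and the resource limit must all agree.
vllm:
  env:
    HIP_VISIBLE_DEVICES: "0,1,2,3"
    TENSOR_PARALLEL_SIZE: "4"
  resources:
    limits:
      amd.com/gpu: "4"
```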

#### If using ROCm TGI

```bash
nano ~/docsum-k8s-install/GenAIExamples/DocSum/kubernetes/helm/rocm-tgi-values.yaml
```

- `HIP_VISIBLE_DEVICES` - the ID(s) of the GPU(s) to use: a single ID or several comma-separated IDs, e.g. "0" or "0,1,2,3"
- `extraCmdArgs: [ "--num-shard","1" ]` - replace "1" with the number of GPUs used
- `resources.limits."amd.com/gpu"` - replace "1" with the number of GPUs used
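
For example, a hypothetical two-GPU override would keep all three settings consistent (the key layout follows rocm-tgi-values.yaml; the GPU count is an assumption):

```yaml
# Hypothetical 2-GPU example: the visible GPU IDs, --num-shard,
# and the resource limit must all agree.
tgi:
  HIP_VISIBLE_DEVICES: "0,1"
  extraCmdArgs: [ "--num-shard","2" ]
  resources:
    limits:
      amd.com/gpu: "2"
```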

### Installing the Helm Chart

#### If using ROCm vLLM

```bash
helm upgrade --install docsum oci://ghcr.io/opea-project/charts/docsum \
--set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
--values rocm-values.yaml
```

#### If using ROCm TGI

```bash
helm upgrade --install docsum oci://ghcr.io/opea-project/charts/docsum \
--set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
--values rocm-tgi-values.yaml
```

## Deploy on AMD ROCm using Helm charts from Git repositories

### Creating working dirs

```bash
mkdir ~/docsum-k8s-install && cd ~/docsum-k8s-install
```

### Cloning repos

```bash
git clone https://github.com/opea-project/GenAIExamples.git
git clone https://github.com/opea-project/GenAIInfra.git
```

### Go to the installation directory

```bash
cd GenAIExamples/DocSum/kubernetes/helm
```

### Setting system variables

```bash
export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="Intel/neural-chat-7b-v3-3"
```

### Setting variables in the values files

#### If using ROCm vLLM

```bash
nano ~/docsum-k8s-install/GenAIExamples/DocSum/kubernetes/helm/rocm-values.yaml
```

- `HIP_VISIBLE_DEVICES` - the ID(s) of the GPU(s) to use: a single ID or several comma-separated IDs, e.g. "0" or "0,1,2,3"
- `TENSOR_PARALLEL_SIZE` - must match the number of GPUs used
- `resources.limits."amd.com/gpu"` - replace "1" with the number of GPUs used

#### If using ROCm TGI

```bash
nano ~/docsum-k8s-install/GenAIExamples/DocSum/kubernetes/helm/rocm-tgi-values.yaml
```

- `HIP_VISIBLE_DEVICES` - the ID(s) of the GPU(s) to use: a single ID or several comma-separated IDs, e.g. "0" or "0,1,2,3"
- `extraCmdArgs: [ "--num-shard","1" ]` - replace "1" with the number of GPUs used
- `resources.limits."amd.com/gpu"` - replace "1" with the number of GPUs used

### Installing the Helm Chart

#### If using ROCm vLLM

```bash
cd ~/docsum-k8s-install/GenAIInfra/helm-charts
./update_dependency.sh
helm dependency update docsum
helm upgrade --install docsum docsum \
--set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
--values ../../GenAIExamples/DocSum/kubernetes/helm/rocm-values.yaml
```

#### If using ROCm TGI

```bash
cd ~/docsum-k8s-install/GenAIInfra/helm-charts
./update_dependency.sh
helm dependency update docsum
helm upgrade --install docsum docsum \
--set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
--values ../../GenAIExamples/DocSum/kubernetes/helm/rocm-tgi-values.yaml
```
45 changes: 45 additions & 0 deletions DocSum/kubernetes/helm/rocm-tgi-values.yaml
@@ -0,0 +1,45 @@
# Copyright (C) 2025 Advanced Micro Devices, Inc.

tgi:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: ghcr.io/huggingface/text-generation-inference
    tag: "2.4.1-rocm"
  MAX_INPUT_LENGTH: "1024"
  MAX_TOTAL_TOKENS: "2048"
  USE_FLASH_ATTENTION: "false"
  FLASH_ATTENTION_RECOMPUTE: "false"
  HIP_VISIBLE_DEVICES: "0"
  MAX_BATCH_SIZE: "4"
  extraCmdArgs: [ "--num-shard","1" ]
  resources:
    limits:
      amd.com/gpu: "1"
    requests:
      cpu: 1
      memory: 16Gi
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0
    capabilities:
      add:
        - SYS_PTRACE
  readinessProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120
  startupProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120

llm-uservice:
  DOCSUM_BACKEND: "TGI"
  retryTimeoutSeconds: 720

vllm:
  enabled: false
40 changes: 40 additions & 0 deletions DocSum/kubernetes/helm/rocm-values.yaml
@@ -0,0 +1,40 @@
# Copyright (C) 2025 Advanced Micro Devices, Inc.

tgi:
  enabled: false

llm-uservice:
  DOCSUM_BACKEND: "vLLM"
  retryTimeoutSeconds: 720

vllm:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: opea/vllm-rocm
    tag: latest
  env:
    HIP_VISIBLE_DEVICES: "0"
    TENSOR_PARALLEL_SIZE: "1"
    HF_HUB_DISABLE_PROGRESS_BARS: "1"
    HF_HUB_ENABLE_HF_TRANSFER: "0"
    VLLM_USE_TRITON_FLASH_ATTN: "0"
    VLLM_WORKER_MULTIPROC_METHOD: "spawn"
    PYTORCH_JIT: "0"
    HF_HOME: "/data"
  extraCmd:
    command: [ "python3", "/workspace/api_server.py" ]
  extraCmdArgs: [ "--swap-space", "16",
                  "--disable-log-requests",
                  "--dtype", "float16",
                  "--num-scheduler-steps", "1",
                  "--distributed-executor-backend", "mp" ]
  resources:
    limits:
      amd.com/gpu: "1"
  startupProbe:
    failureThreshold: 180
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0