CodeGen/CodeTrans - Adding files to deploy an application in the K8S environment using Helm #1792

Status: Closed

Commits (44):
- cf60682: DocSum - add files for deploy app with ROCm vLLM (Feb 13, 2025)
- 1fd1de1: DocSum - fix main (Feb 13, 2025)
- bd2d47e: DocSum - add files for deploy app with ROCm vLLM (Feb 13, 2025)
- 2459ecb: DocSum - fix main (Feb 13, 2025)
- 4d35065: Merge remote-tracking branch 'origin/main' (Feb 19, 2025)
- 6d5049d: DocSum - add files for deploy app with ROCm vLLM (Feb 13, 2025)
- 9dfbdc5: DocSum - fix main (Feb 13, 2025)
- a8857ae: DocSum - add files for deploy app with ROCm vLLM (Feb 13, 2025)
- 5a38b26: DocSum - fix main (Feb 13, 2025)
- 0e2ef94: Merge remote-tracking branch 'origin/main' (Feb 25, 2025)
- 30071db: Merge branch 'main' of https://github.com/opea-project/GenAIExamples (Mar 11, 2025)
- 0757dec: Merge branch 'opea-project:main' into main (artem-astafev, Mar 20, 2025)
- 9aaf378: Merge branch 'main' of https://github.com/opea-project/GenAIExamples (Mar 26, 2025)
- 9cf4b6e: Merge branch 'main' of https://github.com/opea-project/GenAIExamples (Apr 3, 2025)
- 8e89787: Merge branch 'main' of https://github.com/opea-project/GenAIExamples (Apr 5, 2025)
- a117c69: Merge branch 'main' of https://github.com/opea-project/GenAIExamples (Apr 11, 2025)
- 82e675c: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (Apr 11, 2025)
- d2717ae: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 11, 2025)
- cf9b048: Merge branch 'main' of https://github.com/opea-project/GenAIExamples … (chyundunovDatamonsters, Apr 18, 2025)
- 584f4fd: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters, Apr 18, 2025)
- 742acd6: Merge remote-tracking branch 'origin/feature/CodeGen_CodeTrans_k8s' i… (chyundunovDatamonsters, Apr 18, 2025)
- 9cd726e: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters, Apr 19, 2025)
- 8b46bf4: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 19, 2025)
- e93bd62: Merge branch 'main' into feature/CodeGen_CodeTrans_k8s (chyundunovDatamonsters, Apr 22, 2025)
- 07e838e: Merge branch 'main' of https://github.com/opea-project/GenAIExamples … (chyundunovDatamonsters, Apr 24, 2025)
- adbb079: Merge remote-tracking branch 'origin/feature/CodeGen_CodeTrans_k8s' i… (chyundunovDatamonsters, Apr 24, 2025)
- 2b02f6a: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters, Apr 24, 2025)
- 061a646: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters, Apr 24, 2025)
- 5116ecb: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 24, 2025)
- 849d8a1: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters, Apr 24, 2025)
- b61b824: Merge remote-tracking branch 'origin/feature/CodeGen_CodeTrans_k8s' i… (chyundunovDatamonsters, Apr 24, 2025)
- 34382be: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters, Apr 24, 2025)
- 87a0169: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 24, 2025)
- 4bb240f: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters, Apr 24, 2025)
- c38d6e3: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters, Apr 24, 2025)
- f34ac3b: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters, Apr 25, 2025)
- 03b12e3: Merge branch 'main' into feature/CodeGen_CodeTrans_k8s (chyundunovDatamonsters, Apr 25, 2025)
- e56fac1: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters, May 27, 2025)
- 3c46038: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], May 27, 2025)
- 14f2c1d: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters, May 27, 2025)
- d93f017: Merge remote-tracking branch 'origin/feature/CodeGen_CodeTrans_k8s' i… (chyundunovDatamonsters, May 27, 2025)
- 1e654ee: Merge branch 'main' of https://github.com/opea-project/GenAIExamples … (chyundunovDatamonsters, May 27, 2025)
- 4d8db18: Merge branch 'main' into feature/CodeGen_CodeTrans_k8s (chensuyue, May 30, 2025)
- c628cf5: Merge branch 'main' into feature/CodeGen_CodeTrans_k8s (ZePan110, May 30, 2025)
147 changes: 147 additions & 0 deletions CodeGen/kubernetes/helm/README.md
@@ -131,3 +131,150 @@ Optionally, delete the namespace if it's no longer needed and empty:
```bash
# kubectl delete ns codegen
```

## Deploy on AMD ROCm using Helm charts from the binary Helm repository

### Creating working dirs

```bash
mkdir ~/codegen-k8s-install && cd ~/codegen-k8s-install
```

### Cloning repos

```bash
git clone https://github.com/opea-project/GenAIExamples.git
```

### Go to the installation directory

```bash
cd GenAIExamples/CodeGen/kubernetes/helm
```

### Setting system variables

```bash
export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="Qwen/Qwen2.5-Coder-7B-Instruct"
```

### Setting variables in the values files

#### If using ROCm vLLM

```bash
nano ~/codegen-k8s-install/GenAIExamples/CodeGen/kubernetes/helm/rocm-values.yaml
```

- `HIP_VISIBLE_DEVICES`: the ID(s) of the GPU(s) to use; specify either a single ID ("0") or several, comma-separated ("0,1,2,3").
- `TENSOR_PARALLEL_SIZE`: must match the number of GPUs used.
- `resources.limits."amd.com/gpu"`: replace "1" with the number of GPUs used, as shown in the sketch below.
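
For example, a minimal sketch of the relevant overrides for a hypothetical node with four GPUs (the field layout follows the rocm-values.yaml added in this PR; keep the rest of the file unchanged):

```yaml
# Sketch: rocm-values.yaml overrides for a hypothetical 4-GPU node
vllm:
  env:
    HIP_VISIBLE_DEVICES: "0,1,2,3" # four GPU IDs, comma-separated
    TENSOR_PARALLEL_SIZE: "4"      # must match the number of GPUs
  resources:
    limits:
      amd.com/gpu: "4"             # request four AMD GPUs from the device plugin
```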

#### If using ROCm TGI

```bash
nano ~/codegen-k8s-install/GenAIExamples/CodeGen/kubernetes/helm/rocm-tgi-values.yaml
```

- `HIP_VISIBLE_DEVICES`: the ID(s) of the GPU(s) to use; specify either a single ID ("0") or several, comma-separated ("0,1,2,3").
- `extraCmdArgs: [ "--num-shard","1" ]`: replace "1" with the number of GPUs used.
- `resources.limits."amd.com/gpu"`: replace "1" with the number of GPUs used, as shown in the sketch below.
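
A corresponding sketch for TGI on a hypothetical 4-GPU node (field layout as in the rocm-tgi-values.yaml added in this PR):

```yaml
# Sketch: rocm-tgi-values.yaml overrides for a hypothetical 4-GPU node
tgi:
  HIP_VISIBLE_DEVICES: "0,1,2,3"       # four GPU IDs, comma-separated
  extraCmdArgs: [ "--num-shard", "4" ] # shard count must match the GPU count
  resources:
    limits:
      amd.com/gpu: "4"                 # request four AMD GPUs
```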

### Installing the Helm Chart

#### If using ROCm vLLM
```bash
helm upgrade --install codegen oci://ghcr.io/opea-project/charts/codegen \
--set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
--values rocm-values.yaml
```

#### If using ROCm TGI
```bash
helm upgrade --install codegen oci://ghcr.io/opea-project/charts/codegen \
--set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
--values rocm-tgi-values.yaml
```
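
Whichever backend you chose, here is a quick smoke test once the release is installed (a minimal sketch: the service name `codegen` and port 7778 are assumptions based on the chart defaults, so verify them with `kubectl get svc` first):

```bash
# Check the release and wait for the pods to become Ready
helm status codegen
kubectl get pods

# Port-forward the CodeGen gateway and send a test request
# (service name and port are assumptions; confirm with `kubectl get svc`)
kubectl port-forward svc/codegen 7778:7778 &
curl http://localhost:7778/v1/codegen \
  -H "Content-Type: application/json" \
  -d '{"messages": "Write a Python function that checks whether a number is prime."}'
```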

## Deploy on AMD ROCm using Helm charts from Git repositories

### Creating working dirs

```bash
mkdir ~/codegen-k8s-install && cd ~/codegen-k8s-install
```

### Cloning repos

```bash
git clone https://github.com/opea-project/GenAIExamples.git
git clone https://github.com/opea-project/GenAIInfra.git
```

### Go to the installation directory

```bash
cd GenAIExamples/CodeGen/kubernetes/helm
```

### Setting system variables

```bash
export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="Qwen/Qwen2.5-Coder-7B-Instruct"
```

### Setting variables in the values files

#### If using ROCm vLLM

```bash
nano ~/codegen-k8s-install/GenAIExamples/CodeGen/kubernetes/helm/rocm-values.yaml
```

- `HIP_VISIBLE_DEVICES`: the ID(s) of the GPU(s) to use; specify either a single ID ("0") or several, comma-separated ("0,1,2,3").
- `TENSOR_PARALLEL_SIZE`: must match the number of GPUs used.
- `resources.limits."amd.com/gpu"`: replace "1" with the number of GPUs used (the multi-GPU sketch in the previous section applies here as well).

#### If using ROCm TGI

```bash
nano ~/codegen-k8s-install/GenAIExamples/CodeGen/kubernetes/helm/rocm-tgi-values.yaml
```

- `HIP_VISIBLE_DEVICES`: the ID(s) of the GPU(s) to use; specify either a single ID ("0") or several, comma-separated ("0,1,2,3").
- `extraCmdArgs: [ "--num-shard","1" ]`: replace "1" with the number of GPUs used.
- `resources.limits."amd.com/gpu"`: replace "1" with the number of GPUs used (the multi-GPU sketch in the previous section applies here as well).

### Installing the Helm Chart

#### If using ROCm vLLM
```bash
cd ~/codegen-k8s-install/GenAIInfra/helm-charts
./update_dependency.sh
helm dependency update codegen
helm upgrade --install codegen codegen \
--set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
--values ../../GenAIExamples/CodeGen/kubernetes/helm/rocm-values.yaml
```

#### If using ROCm TGI
```bash
cd ~/codegen-k8s-install/GenAIInfra/helm-charts
./update_dependency.sh
helm dependency update codegen
helm upgrade --install codegen codegen \
--set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
--values ../../GenAIExamples/CodeGen/kubernetes/helm/rocm-tgi-values.yaml
```
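
When the deployment is no longer needed, it can be removed the same way for either install method:

```bash
helm uninstall codegen
# Optionally delete the namespace if it is empty and no longer needed
# kubectl delete ns codegen
```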
45 changes: 45 additions & 0 deletions CodeGen/kubernetes/helm/rocm-tgi-values.yaml
@@ -0,0 +1,45 @@
# Copyright (c) 2025 Advanced Micro Devices, Inc.


tgi:
enabled: true
accelDevice: "rocm"
image:
repository: ghcr.io/huggingface/text-generation-inference
tag: "2.4.1-rocm"
LLM_MODEL_ID: "Qwen/Qwen2.5-Coder-7B-Instruct"
MAX_INPUT_LENGTH: "1024"
MAX_TOTAL_TOKENS: "2048"
USE_FLASH_ATTENTION: "false"
FLASH_ATTENTION_RECOMPUTE: "false"
HIP_VISIBLE_DEVICES: "0"
MAX_BATCH_SIZE: "4"
extraCmdArgs: [ "--num-shard","1" ]
resources:
limits:
amd.com/gpu: "1"
requests:
cpu: 1
memory: 16Gi
securityContext:
readOnlyRootFilesystem: false
runAsNonRoot: false
> Review thread on the securityContext settings:
>
> Collaborator: Do you have a PR to GenAIInfra?
>
> Collaborator: So here it's OK to keep running as root? Why is ChatQnA special? https://github.com/opea-project/GenAIInfra/pull/949/files/180f16fb65570968a44663d0490c42ed539862b0#diff-f93551169c7cda08f51cb91abe0a36eb96356b53ace54c5fd940d24d5d4264acR29
>
> Author: Changes to launch as an unprivileged user will be made after this PR is completed: opea-project/GenAIComps#1638

runAsUser: 0
capabilities:
add:
- SYS_PTRACE
readinessProbe:
initialDelaySeconds: 60
periodSeconds: 5
timeoutSeconds: 1
failureThreshold: 120
startupProbe:
initialDelaySeconds: 60
periodSeconds: 5
timeoutSeconds: 1
failureThreshold: 120
vllm:
enabled: false
llm-uservice:
TEXTGEN_BACKEND: TGI
LLM_MODEL_ID: "Qwen/Qwen2.5-Coder-7B-Instruct"
41 changes: 41 additions & 0 deletions CodeGen/kubernetes/helm/rocm-values.yaml
@@ -0,0 +1,41 @@
# Copyright (c) 2025 Advanced Micro Devices, Inc.


tgi:
enabled: false

vllm:
enabled: true
accelDevice: "rocm"
image:
repository: opea/vllm-rocm
tag: latest
env:
HIP_VISIBLE_DEVICES: "0"
TENSOR_PARALLEL_SIZE: "1"
HF_HUB_DISABLE_PROGRESS_BARS: "1"
HF_HUB_ENABLE_HF_TRANSFER: "0"
VLLM_USE_TRITON_FLASH_ATTN: "0"
VLLM_WORKER_MULTIPROC_METHOD: "spawn"
PYTORCH_JIT: "0"
HF_HOME: "/data"
extraCmd:
command: [ "python3", "/workspace/api_server.py" ]
extraCmdArgs: [ "--swap-space", "16",
"--disable-log-requests",
"--dtype", "float16",
"--num-scheduler-steps", "1",
"--distributed-executor-backend", "mp" ]
resources:
limits:
amd.com/gpu: "1"
startupProbe:
failureThreshold: 180
securityContext:
readOnlyRootFilesystem: false
runAsNonRoot: false
runAsUser: 0

llm-uservice:
TEXTGEN_BACKEND: vLLM
retryTimeoutSeconds: 720
147 changes: 147 additions & 0 deletions CodeTrans/kubernetes/helm/README.md
@@ -16,3 +16,150 @@ helm install codetrans oci://ghcr.io/opea-project/charts/codetrans --set global
export HFTOKEN="insert-your-huggingface-token-here"
helm install codetrans oci://ghcr.io/opea-project/charts/codetrans --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} -f gaudi-values.yaml
```

## Deploy on AMD ROCm using Helm charts from the binary Helm repository

### Creating working dirs

```bash
mkdir ~/codetrans-k8s-install && cd ~/codetrans-k8s-install
```

### Cloning repos

```bash
git clone https://github.com/opea-project/GenAIExamples.git
```

### Go to the installation directory

```bash
cd GenAIExamples/CodeTrans/kubernetes/helm
```

### Setting system variables

```bash
export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="mistralai/Mistral-7B-Instruct-v0.3"
```

### Setting variables in the values files

#### If using ROCm vLLM

```bash
nano ~/codetrans-k8s-install/GenAIExamples/CodeTrans/kubernetes/helm/rocm-values.yaml
```

- `HIP_VISIBLE_DEVICES`: the ID(s) of the GPU(s) to use; specify either a single ID ("0") or several, comma-separated ("0,1,2,3").
- `TENSOR_PARALLEL_SIZE`: must match the number of GPUs used.
- `resources.limits."amd.com/gpu"`: replace "1" with the number of GPUs used.

#### If using ROCm TGI

```bash
nano ~/codetrans-k8s-install/GenAIExamples/CodeTrans/kubernetes/helm/rocm-tgi-values.yaml
```

- `HIP_VISIBLE_DEVICES`: the ID(s) of the GPU(s) to use; specify either a single ID ("0") or several, comma-separated ("0,1,2,3").
- `extraCmdArgs: [ "--num-shard","1" ]`: replace "1" with the number of GPUs used.
- `resources.limits."amd.com/gpu"`: replace "1" with the number of GPUs used.

### Installing the Helm Chart

#### If using ROCm vLLM
```bash
helm upgrade --install codetrans oci://ghcr.io/opea-project/charts/codetrans \
--set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
--values rocm-values.yaml
```

#### If using ROCm TGI
```bash
helm upgrade --install codetrans oci://ghcr.io/opea-project/charts/codetrans \
--set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
--values rocm-tgi-values.yaml
```
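
Once the pods are Ready, here is a quick smoke test (a sketch: the service name `codetrans`, port 7777, and request payload are assumptions based on the chart and example defaults, so verify them with `kubectl get svc` and the CodeTrans docs first):

```bash
kubectl get pods

# Port-forward the CodeTrans gateway and send a small translation request
# (service name, port, and payload shape are assumptions)
kubectl port-forward svc/codetrans 7777:7777 &
curl http://localhost:7777/v1/codetrans \
  -H "Content-Type: application/json" \
  -d '{"language_from": "Golang", "language_to": "Python", "source_code": "package main\n\nimport \"fmt\"\n\nfunc main() {\n    fmt.Println(\"Hello, World!\")\n}"}'
```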

## Deploy on AMD ROCm using Helm charts from Git repositories

### Creating working dirs

```bash
mkdir ~/codetrans-k8s-install && cd ~/codetrans-k8s-install
```

> Collaborator: Once you have created the directory and cd'd into it, all the other paths get shorter; there is no need to reference ~/codetrans-k8s-install everywhere.

### Cloning repos

```bash
git clone https://github.com/opea-project/GenAIExamples.git
git clone https://github.com/opea-project/GenAIInfra.git
```

### Go to the installation directory

```bash
cd GenAIExamples/CodeTrans/kubernetes/helm
```

### Setting system variables

```bash
export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="mistralai/Mistral-7B-Instruct-v0.3"
```

### Setting variables in the values files

#### If using ROCm vLLM

```bash
nano ~/codetrans-k8s-install/GenAIExamples/CodeTrans/kubernetes/helm/rocm-values.yaml
```

- `HIP_VISIBLE_DEVICES`: the ID(s) of the GPU(s) to use; specify either a single ID ("0") or several, comma-separated ("0,1,2,3").
- `TENSOR_PARALLEL_SIZE`: must match the number of GPUs used.
- `resources.limits."amd.com/gpu"`: replace "1" with the number of GPUs used.

#### If using ROCm TGI

```bash
nano ~/codetrans-k8s-install/GenAIExamples/CodeTrans/kubernetes/helm/rocm-tgi-values.yaml
```

- `HIP_VISIBLE_DEVICES`: the ID(s) of the GPU(s) to use; specify either a single ID ("0") or several, comma-separated ("0,1,2,3").
- `extraCmdArgs: [ "--num-shard","1" ]`: replace "1" with the number of GPUs used.
- `resources.limits."amd.com/gpu"`: replace "1" with the number of GPUs used.

### Installing the Helm Chart

#### If using ROCm vLLM
```bash
cd ~/codetrans-k8s-install/GenAIInfra/helm-charts
./update_dependency.sh
helm dependency update codetrans
helm upgrade --install codetrans codetrans \
--set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
--values ../../GenAIExamples/CodeTrans/kubernetes/helm/rocm-values.yaml
```

#### If using ROCm TGI
```bash
cd ~/codetrans-k8s-install/GenAIInfra/helm-charts
./update_dependency.sh
helm dependency update codetrans
helm upgrade --install codetrans codetrans \
--set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
--values ../../GenAIExamples/CodeTrans/kubernetes/helm/rocm-tgi-values.yaml
```
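
When finished, remove the release (works for either install method):

```bash
helm uninstall codetrans
# Optionally delete the namespace if it is empty and no longer needed
# kubectl delete ns codetrans
```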
44 changes: 44 additions & 0 deletions CodeTrans/kubernetes/helm/rocm-tgi-values.yaml
@@ -0,0 +1,44 @@
# Copyright (c) 2025 Advanced Micro Devices, Inc.

tgi:
enabled: true
accelDevice: "rocm"
image:
repository: ghcr.io/huggingface/text-generation-inference
tag: "2.4.1-rocm"
LLM_MODEL_ID: "Qwen/Qwen2.5-Coder-7B-Instruct"
MAX_INPUT_LENGTH: "1024"
MAX_TOTAL_TOKENS: "2048"
USE_FLASH_ATTENTION: "false"
FLASH_ATTENTION_RECOMPUTE: "false"
HIP_VISIBLE_DEVICES: "0"
MAX_BATCH_SIZE: "4"
extraCmdArgs: [ "--num-shard","1" ]
resources:
limits:
amd.com/gpu: "1"
requests:
cpu: 1
memory: 16Gi
securityContext:
readOnlyRootFilesystem: false
runAsNonRoot: false
runAsUser: 0
capabilities:
add:
- SYS_PTRACE
readinessProbe:
initialDelaySeconds: 60
periodSeconds: 5
timeoutSeconds: 1
failureThreshold: 120
startupProbe:
initialDelaySeconds: 60
periodSeconds: 5
timeoutSeconds: 1
failureThreshold: 120
vllm:
enabled: false
llm-uservice:
TEXTGEN_BACKEND: TGI
LLM_MODEL_ID: "Qwen/Qwen2.5-Coder-7B-Instruct"
42 changes: 42 additions & 0 deletions CodeTrans/kubernetes/helm/rocm-values.yaml
@@ -0,0 +1,42 @@
# Copyright (c) 2025 Advanced Micro Devices, Inc.

tgi:
enabled: false

vllm:
enabled: true
accelDevice: "rocm"
image:
repository: opea/vllm-rocm
tag: latest
LLM_MODEL_ID: "Qwen/Qwen2.5-Coder-7B-Instruct"
env:
HIP_VISIBLE_DEVICES: "0"
TENSOR_PARALLEL_SIZE: "1"
HF_HUB_DISABLE_PROGRESS_BARS: "1"
HF_HUB_ENABLE_HF_TRANSFER: "0"
VLLM_USE_TRITON_FLASH_ATTN: "0"
VLLM_WORKER_MULTIPROC_METHOD: "spawn"
PYTORCH_JIT: "0"
HF_HOME: "/data"
extraCmd:
command: [ "python3", "/workspace/api_server.py" ]
extraCmdArgs: [ "--swap-space", "16",
"--disable-log-requests",
"--dtype", "float16",
"--num-scheduler-steps", "1",
"--distributed-executor-backend", "mp" ]
resources:
limits:
amd.com/gpu: "1"
startupProbe:
failureThreshold: 180
securityContext:
readOnlyRootFilesystem: false
runAsNonRoot: false
runAsUser: 0

llm-uservice:
TEXTGEN_BACKEND: vLLM
retryTimeoutSeconds: 720
LLM_MODEL_ID: "Qwen/Qwen2.5-Coder-7B-Instruct"