CodeGen/CodeTrans - Adding files to deploy an application in the K8S environment using Helm #1792
Status: Closed. chyundunovDatamonsters wants to merge 44 commits into opea-project:main from chyundunovDatamonsters:feature/CodeGen_CodeTrans_k8s (+466 −0).
## Commits (44)

- cf60682: DocSum - add files for deploy app with ROCm vLLM
- 1fd1de1: DocSum - fix main
- bd2d47e: DocSum - add files for deploy app with ROCm vLLM
- 2459ecb: DocSum - fix main
- 4d35065: Merge remote-tracking branch 'origin/main'
- 6d5049d: DocSum - add files for deploy app with ROCm vLLM
- 9dfbdc5: DocSum - fix main
- a8857ae: DocSum - add files for deploy app with ROCm vLLM
- 5a38b26: DocSum - fix main
- 0e2ef94: Merge remote-tracking branch 'origin/main'
- 30071db: Merge branch 'main' of https://github.com/opea-project/GenAIExamples
- 0757dec: Merge branch 'opea-project:main' into main (artem-astafev)
- 9aaf378: Merge branch 'main' of https://github.com/opea-project/GenAIExamples
- 9cf4b6e: Merge branch 'main' of https://github.com/opea-project/GenAIExamples
- 8e89787: Merge branch 'main' of https://github.com/opea-project/GenAIExamples
- a117c69: Merge branch 'main' of https://github.com/opea-project/GenAIExamples
- 82e675c: CodeGen/CodeTrans - Adding files to deploy an application in the K8S …
- d2717ae: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- cf9b048: Merge branch 'main' of https://github.com/opea-project/GenAIExamples … (chyundunovDatamonsters)
- 584f4fd: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters)
- 742acd6: Merge remote-tracking branch 'origin/feature/CodeGen_CodeTrans_k8s' i… (chyundunovDatamonsters)
- 9cd726e: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters)
- 8b46bf4: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- e93bd62: Merge branch 'main' into feature/CodeGen_CodeTrans_k8s (chyundunovDatamonsters)
- 07e838e: Merge branch 'main' of https://github.com/opea-project/GenAIExamples … (chyundunovDatamonsters)
- adbb079: Merge remote-tracking branch 'origin/feature/CodeGen_CodeTrans_k8s' i… (chyundunovDatamonsters)
- 2b02f6a: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters)
- 061a646: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters)
- 5116ecb: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 849d8a1: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters)
- b61b824: Merge remote-tracking branch 'origin/feature/CodeGen_CodeTrans_k8s' i… (chyundunovDatamonsters)
- 34382be: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters)
- 87a0169: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 4bb240f: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters)
- c38d6e3: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters)
- f34ac3b: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters)
- 03b12e3: Merge branch 'main' into feature/CodeGen_CodeTrans_k8s (chyundunovDatamonsters)
- e56fac1: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters)
- 3c46038: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 14f2c1d: CodeGen/CodeTrans - Adding files to deploy an application in the K8S … (chyundunovDatamonsters)
- d93f017: Merge remote-tracking branch 'origin/feature/CodeGen_CodeTrans_k8s' i… (chyundunovDatamonsters)
- 1e654ee: Merge branch 'main' of https://github.com/opea-project/GenAIExamples … (chyundunovDatamonsters)
- 4d8db18: Merge branch 'main' into feature/CodeGen_CodeTrans_k8s (chensuyue)
- c628cf5: Merge branch 'main' into feature/CodeGen_CodeTrans_k8s (ZePan110)
New file (+45 lines):

```yaml
# Copyright (c) 2025 Advanced Micro Devices, Inc.

tgi:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: ghcr.io/huggingface/text-generation-inference
    tag: "2.4.1-rocm"
  LLM_MODEL_ID: "Qwen/Qwen2.5-Coder-7B-Instruct"
  MAX_INPUT_LENGTH: "1024"
  MAX_TOTAL_TOKENS: "2048"
  USE_FLASH_ATTENTION: "false"
  FLASH_ATTENTION_RECOMPUTE: "false"
  HIP_VISIBLE_DEVICES: "0"
  MAX_BATCH_SIZE: "4"
  extraCmdArgs: [ "--num-shard","1" ]
  resources:
    limits:
      amd.com/gpu: "1"
    requests:
      cpu: 1
      memory: 16Gi
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0
    capabilities:
      add:
        - SYS_PTRACE
  readinessProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120
  startupProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120

vllm:
  enabled: false

llm-uservice:
  TEXTGEN_BACKEND: TGI
  LLM_MODEL_ID: "Qwen/Qwen2.5-Coder-7B-Instruct"
```
New file (+41 lines):

```yaml
# Copyright (c) 2025 Advanced Micro Devices, Inc.

tgi:
  enabled: false

vllm:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: opea/vllm-rocm
    tag: latest
  env:
    HIP_VISIBLE_DEVICES: "0"
    TENSOR_PARALLEL_SIZE: "1"
    HF_HUB_DISABLE_PROGRESS_BARS: "1"
    HF_HUB_ENABLE_HF_TRANSFER: "0"
    VLLM_USE_TRITON_FLASH_ATTN: "0"
    VLLM_WORKER_MULTIPROC_METHOD: "spawn"
    PYTORCH_JIT: "0"
    HF_HOME: "/data"
  extraCmd:
    command: [ "python3", "/workspace/api_server.py" ]
  extraCmdArgs: [ "--swap-space", "16",
                  "--disable-log-requests",
                  "--dtype", "float16",
                  "--num-scheduler-steps", "1",
                  "--distributed-executor-backend", "mp" ]
  resources:
    limits:
      amd.com/gpu: "1"
  startupProbe:
    failureThreshold: 180
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0

llm-uservice:
  TEXTGEN_BACKEND: vLLM
  retryTimeoutSeconds: 720
```
Modified file (+150 lines, hunk `@@ -16,3 +16,150 @@`). Existing context around the insertion point:

```bash
export HFTOKEN="insert-your-huggingface-token-here"
helm install codetrans oci://ghcr.io/opea-project/charts/codetrans --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} -f gaudi-values.yaml
```

## Deploy on AMD ROCm using Helm charts from the binary Helm repository

```bash
mkdir ~/codetrans-k8s-install && cd ~/codetrans-k8s-install
```
### Cloning repos

```bash
git clone https://github.com/opea-project/GenAIExamples.git
```

### Go to the installation directory

```bash
cd GenAIExamples/CodeTrans/kubernetes/helm
```
### Setting system variables

```bash
export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="mistralai/Mistral-7B-Instruct-v0.3"
```
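Before editing any values files, it can help to confirm the variables are actually set. A minimal sketch (the placeholder values are the ones from the exports above; replace them with your own):

```bash
# Placeholder values from the exports above; replace with your own
export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="mistralai/Mistral-7B-Instruct-v0.3"

# Fail early if any required variable is empty
for v in "HFTOKEN=$HFTOKEN" "MODELDIR=$MODELDIR" "MODELNAME=$MODELNAME"; do
  [ -n "${v#*=}" ] || { echo "ERROR: ${v%%=*} is not set" >&2; exit 1; }
done
echo "ok"
```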
### Setting variables in values files

#### If ROCm vLLM is used

```bash
nano ~/codetrans-k8s-install/GenAIExamples/CodeTrans/kubernetes/helm/rocm-values.yaml
```

- `HIP_VISIBLE_DEVICES`: the ID(s) of the GPU(s) you want to use. Specify either one ID or several comma-separated IDs, e.g. "0" or "0,1,2,3".
- `TENSOR_PARALLEL_SIZE`: must match the number of GPUs used.
- `resources.limits."amd.com/gpu"`: replace "1" with the number of GPUs used.
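Since these two settings must stay in step, one way to see the relationship is to count the comma-separated entries in `HIP_VISIBLE_DEVICES`; the device list below is only illustrative:

```bash
# Illustrative four-GPU configuration
HIP_VISIBLE_DEVICES="0,1,2,3"

# Count comma-separated device IDs to get the matching TENSOR_PARALLEL_SIZE
TENSOR_PARALLEL_SIZE=$(echo "$HIP_VISIBLE_DEVICES" | awk -F',' '{print NF}')
echo "$TENSOR_PARALLEL_SIZE"   # prints 4
```

The same count is what belongs in `resources.limits."amd.com/gpu"`.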
#### If ROCm TGI is used

```bash
nano ~/codetrans-k8s-install/GenAIExamples/CodeTrans/kubernetes/helm/rocm-tgi-values.yaml
```

- `HIP_VISIBLE_DEVICES`: the ID(s) of the GPU(s) you want to use. Specify either one ID or several comma-separated IDs, e.g. "0" or "0,1,2,3".
- `extraCmdArgs: [ "--num-shard","1" ]`: replace "1" with the number of GPUs used.
- `resources.limits."amd.com/gpu"`: replace "1" with the number of GPUs used.
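For instance, a hypothetical four-GPU TGI configuration would change the three settings above together (a sketch; the key layout follows the rocm-tgi values file shown in this PR):

```yaml
tgi:
  HIP_VISIBLE_DEVICES: "0,1,2,3"
  extraCmdArgs: [ "--num-shard","4" ]
  resources:
    limits:
      amd.com/gpu: "4"
```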
### Installing the Helm chart

#### If ROCm vLLM is used

```bash
helm upgrade --install codetrans oci://ghcr.io/opea-project/charts/codetrans \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --values rocm-values.yaml
```

#### If ROCm TGI is used

```bash
helm upgrade --install codetrans oci://ghcr.io/opea-project/charts/codetrans \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --values rocm-tgi-values.yaml
```
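To keep the token out of shell history, the same `global.HUGGINGFACEHUB_API_TOKEN` key used in the commands above can instead live in a local override file (a sketch; the filename is hypothetical):

```yaml
# my-secrets.yaml (hypothetical local file; do not commit it)
global:
  HUGGINGFACEHUB_API_TOKEN: "your_huggingface_token"
```

It would then be passed as an additional `--values my-secrets.yaml` flag on the `helm upgrade` command instead of `--set`.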
## Deploy on AMD ROCm using Helm charts from Git repositories

### Creating working dirs

```bash
mkdir ~/codetrans-k8s-install && cd ~/codetrans-k8s-install
```

> Review comment: Once you have created the directory, cd to it. All the other paths then get shorter and do not need to reference ~/codetrans-k8s-install everywhere.

### Cloning repos

```bash
git clone https://github.com/opea-project/GenAIExamples.git
git clone https://github.com/opea-project/GenAIInfra.git
```
### Go to the installation directory

```bash
cd GenAIExamples/CodeGen/kubernetes/helm
```

### Setting system variables

```bash
export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="mistralai/Mistral-7B-Instruct-v0.3"
```
### Setting variables in values files

#### If ROCm vLLM is used

```bash
nano ~/codetrans-k8s-install/GenAIExamples/CodeTrans/kubernetes/helm/rocm-values.yaml
```

- `HIP_VISIBLE_DEVICES`: the ID(s) of the GPU(s) you want to use. Specify either one ID or several comma-separated IDs, e.g. "0" or "0,1,2,3".
- `TENSOR_PARALLEL_SIZE`: must match the number of GPUs used.
- `resources.limits."amd.com/gpu"`: replace "1" with the number of GPUs used.
#### If ROCm TGI is used

```bash
nano ~/codetrans-k8s-install/GenAIExamples/CodeTrans/kubernetes/helm/rocm-tgi-values.yaml
```

- `HIP_VISIBLE_DEVICES`: the ID(s) of the GPU(s) you want to use. Specify either one ID or several comma-separated IDs, e.g. "0" or "0,1,2,3".
- `extraCmdArgs: [ "--num-shard","1" ]`: replace "1" with the number of GPUs used.
- `resources.limits."amd.com/gpu"`: replace "1" with the number of GPUs used.
### Installing the Helm chart

#### If ROCm vLLM is used

```bash
cd ~/codetrans-k8s-install/GenAIInfra/helm-charts
./update_dependency.sh
helm dependency update codetrans
helm upgrade --install codetrans codetrans \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --values ../../GenAIExamples/CodeTrans/kubernetes/helm/rocm-values.yaml
```

#### If ROCm TGI is used

```bash
cd ~/codetrans-k8s-install/GenAIInfra/helm-charts
./update_dependency.sh
helm dependency update codetrans
helm upgrade --install codetrans codetrans \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --values ../../GenAIExamples/CodeTrans/kubernetes/helm/rocm-tgi-values.yaml
```
New file (+44 lines):

```yaml
# Copyright (c) 2025 Advanced Micro Devices, Inc.

tgi:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: ghcr.io/huggingface/text-generation-inference
    tag: "2.4.1-rocm"
  LLM_MODEL_ID: "Qwen/Qwen2.5-Coder-7B-Instruct"
  MAX_INPUT_LENGTH: "1024"
  MAX_TOTAL_TOKENS: "2048"
  USE_FLASH_ATTENTION: "false"
  FLASH_ATTENTION_RECOMPUTE: "false"
  HIP_VISIBLE_DEVICES: "0"
  MAX_BATCH_SIZE: "4"
  extraCmdArgs: [ "--num-shard","1" ]
  resources:
    limits:
      amd.com/gpu: "1"
    requests:
      cpu: 1
      memory: 16Gi
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0
    capabilities:
      add:
        - SYS_PTRACE
  readinessProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120
  startupProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120

vllm:
  enabled: false

llm-uservice:
  TEXTGEN_BACKEND: TGI
  LLM_MODEL_ID: "Qwen/Qwen2.5-Coder-7B-Instruct"
```
New file (+42 lines):

```yaml
# Copyright (c) 2025 Advanced Micro Devices, Inc.

tgi:
  enabled: false

vllm:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: opea/vllm-rocm
    tag: latest
  LLM_MODEL_ID: "Qwen/Qwen2.5-Coder-7B-Instruct"
  env:
    HIP_VISIBLE_DEVICES: "0"
    TENSOR_PARALLEL_SIZE: "1"
    HF_HUB_DISABLE_PROGRESS_BARS: "1"
    HF_HUB_ENABLE_HF_TRANSFER: "0"
    VLLM_USE_TRITON_FLASH_ATTN: "0"
    VLLM_WORKER_MULTIPROC_METHOD: "spawn"
    PYTORCH_JIT: "0"
    HF_HOME: "/data"
  extraCmd:
    command: [ "python3", "/workspace/api_server.py" ]
  extraCmdArgs: [ "--swap-space", "16",
                  "--disable-log-requests",
                  "--dtype", "float16",
                  "--num-scheduler-steps", "1",
                  "--distributed-executor-backend", "mp" ]
  resources:
    limits:
      amd.com/gpu: "1"
  startupProbe:
    failureThreshold: 180
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0

llm-uservice:
  TEXTGEN_BACKEND: vLLM
  retryTimeoutSeconds: 720
  LLM_MODEL_ID: "Qwen/Qwen2.5-Coder-7B-Instruct"
```
Review comments:

> So is it OK here to keep running as root? Why is chatqna special? https://github.com/opea-project/GenAIInfra/pull/949/files/180f16fb65570968a44663d0490c42ed539862b0#diff-f93551169c7cda08f51cb91abe0a36eb96356b53ace54c5fd940d24d5d4264acR29

> Do you have a PR to GenAIInfra?

> The CodeGen/CodeTrans PRs for GenAIInfra have been merged: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/codetrans/rocm-values.yaml

> Changes to launch as an unprivileged user will be made after this PR is completed: opea-project/GenAIComps#1638