From 947936ed7b9a294ba318d7b326c7522faa79fa4b Mon Sep 17 00:00:00 2001
From: "chen, suyue"
Date: Fri, 6 Sep 2024 09:55:55 +0800
Subject: [PATCH] Update v0.9 RAG release data (#747)

* run both xeon and gaudi when both hardware detect

Signed-off-by: chensuyue

* add v0.9 RAG release data

Signed-off-by: chensuyue

* update system summary

Signed-off-by: chensuyue

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: chensuyue
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
---
 .github/workflows/_get-test-matrix.yml |  2 +-
 ChatQnA/benchmark/README.md            | 20 ++++++-----
 opea_release_data.md                   | 49 ++++++++++++++++++++++++++
 3 files changed, 62 insertions(+), 9 deletions(-)
 create mode 100644 opea_release_data.md

diff --git a/.github/workflows/_get-test-matrix.yml b/.github/workflows/_get-test-matrix.yml
index 999767f13..677fdeac2 100644
--- a/.github/workflows/_get-test-matrix.yml
+++ b/.github/workflows/_get-test-matrix.yml
@@ -67,7 +67,7 @@ jobs:
 run_hardware=""
 if [ $(printf '%s\n' "${changed_files[@]}" | grep ${example} | grep -c gaudi) != 0 ]; then run_hardware="gaudi"; fi
 if [ $(printf '%s\n' "${changed_files[@]}" | grep ${example} | grep -c xeon) != 0 ]; then run_hardware="xeon ${run_hardware}"; fi
-if [ "$run_hardware" == "" ]; then run_hardware="gaudi"; fi
+if [ "$run_hardware" == "" ]; then run_hardware="xeon gaudi"; fi
 for hw in ${run_hardware}; do
   if [ "$hw" == "gaudi" ] && [ "${{ inputs.gaudi_server_label }}" != "" ]; then
     run_matrix="${run_matrix}{\"example\":\"${example}\",\"hardware\":\"${{ inputs.gaudi_server_label }}\"},"
diff --git a/ChatQnA/benchmark/README.md b/ChatQnA/benchmark/README.md
index 4963e53de..52c7764b8 100644
--- a/ChatQnA/benchmark/README.md
+++ b/ChatQnA/benchmark/README.md
@@ -133,11 +133,11 @@ kubectl label nodes k8s-worker1 node-type=chatqna-opea
 
 #### 2. Install ChatQnA
 
-Go to [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/single_gaudi) and apply to K8s.
+Go to [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/tuned/with_rerank/single_gaudi) and apply to K8s.
 
 ```bash
 # on k8s-master node
-cd GenAIExamples/ChatQnA/benchmark/single_gaudi
+cd GenAIExamples/ChatQnA/benchmark/tuned/with_rerank/single_gaudi
 kubectl apply -f .
 ```
 
@@ -199,7 +199,7 @@ All the test results will come to this folder `/home/sdp/benchmark_output/node_1
 
 ```bash
 # on k8s-master node
-cd GenAIExamples/ChatQnA/benchmark/single_gaudi
+cd GenAIExamples/ChatQnA/benchmark/tuned/with_rerank/single_gaudi
 kubectl delete -f .
 kubectl label nodes k8s-worker1 node-type-
 ```
@@ -216,11 +216,11 @@ kubectl label nodes k8s-worker1 k8s-worker2 node-type=chatqna-opea
 
 #### 2. Install ChatQnA
 
-Go to [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/two_gaudi) and apply to K8s.
+Go to [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/tuned/with_rerank/two_gaudi) and apply to K8s.
 
 ```bash
 # on k8s-master node
-cd GenAIExamples/ChatQnA/benchmark/two_gaudi
+cd GenAIExamples/ChatQnA/benchmark/tuned/with_rerank/two_gaudi
 kubectl apply -f .
 ```
 
@@ -265,11 +265,11 @@ kubectl label nodes k8s-master k8s-worker1 k8s-worker2 k8s-worker3 node-type=cha
 
 #### 2. Install ChatQnA
 
-Go to [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/four_gaudi) and apply to K8s.
+Go to [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/tuned/with_rerank/four_gaudi) and apply to K8s.
 
 ```bash
 # on k8s-master node
-cd GenAIExamples/ChatQnA/benchmark/four_gaudi
+cd GenAIExamples/ChatQnA/benchmark/tuned/with_rerank/four_gaudi
 kubectl apply -f .
 ```
 
@@ -298,7 +298,11 @@ All the test results will come to this folder `/home/sdp/benchmark_output/node_4
 
 ```bash
 # on k8s-master node
-cd GenAIExamples/ChatQnA/benchmark/single_gaudi
+cd GenAIExamples/ChatQnA/benchmark/tuned/with_rerank/single_gaudi
 kubectl delete -f .
 kubectl label nodes k8s-master k8s-worker1 k8s-worker2 k8s-worker3 node-type-
 ```
+
+#### 6. Results
+
+Check OOB performance data [here](/opea_release_data.md#chatqna), tuned performance data will be released soon.
diff --git a/opea_release_data.md b/opea_release_data.md
new file mode 100644
index 000000000..f44b80f10
--- /dev/null
+++ b/opea_release_data.md
@@ -0,0 +1,49 @@
+# OPEA Release Data
+
+This page shows the benchmark data of GenAIExamples. More data for different examples will be submitted in the future release.
+
+## ChatQnA
+
+| **Docker Images for Test** |
+| ----------------------------------------------------- |
+| opea/embedding-tei:v0.9 |
+| ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 |
+| opea/llm-tgi:v0.9 |
+| ghcr.io/huggingface/tgi-gaudi:2.0.1 |
+| opea/dataprep-redis:v0.9 |
+| redis/redis-stack:7.2.0-v9 |
+| opea/reranking-tei:v0.9 |
+| opea/tei-gaudi:v0.9 |
+| opea/retriever-redis:v0.9 |
+| opea/chatqna:v0.9 |
+
+System Summary:
+1-node, 2x Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz, 40 cores, 270W TDP, HT On, Turbo On, NUMA 2, Integrated Accelerators Available [used]: DLB 0 [0], DSA 0 [0], IAA 0 [0], QAT 0 [0], Total Memory 1024GB (32x32GB DDR4 3200 MT/s [3200 MT/s]), BIOS ETM02, microcode 0xd0003b9, 8x Habana Labs Ltd., 4x MT28800 Family [ConnectX-5 Ex], 4x 7T INTEL SSDPF2KX076TZ, 2x 894.3G SAMSUNG MZ1L2960HCJR-00A07, Ubuntu 22.04.3 LTS, 5.15.0-92-generic. Software: WORKLOAD+VERSION, COMPILER, LIBRARIES, OTHER_SW. Test by Intel as of 08/20/24.
+
+### Performance Data
+
+| 1Node E2E Performance (Sec) | Gaudi nodes | Concurrency | Input | Output | Average Latency | P90 Total latency |
+| :-------------------------: | :---------: | :---------: | :---: | :----: | :-------------: | :---------------: |
+| OOB w/o Reranking | 1 | 128 | 128 | 128 | 5.597 | 7.59 |
+| OOB w/ Reranking | 1 | 128 | 128 | 128 | 6.003 | 8.123 |
+
+| 2Nodes E2E Performance (Sec) | Gaudi nodes | Concurrency | Input | Output | Average Latency | P90 Total latency |
+| :--------------------------: | :---------: | :---------: | :---: | :----: | :-------------: | :---------------: |
+| OOB w/o Reranking | 2 | 256 | 128 | 128 | 7.05 | 9.122 |
+| OOB w/ Reranking | 2 | 256 | 128 | 128 | 7.26 | 9.239 |
+
+| 4Nodes E2E Performance (Sec) | Gaudi nodes | Concurrency | Input | Output | Average Latency | P90 Total latency |
+| :--------------------------: | :---------: | :---------: | :---: | :----: | :-------------: | :---------------: |
+| OOB w/o Reranking | 4 | 512 | 128 | 128 | 16.293 | 21.169 |
+| OOB w/ Reranking | 4 | 512 | 128 | 128 | 17.22 | 21.942 |
+
+Go to Benchmark [README](./ChatQnA/benchmark/README.md) for reproduce steps, tuned performance data will be released soon.
+
+### Accuracy Data
+
+| Test Case | Hits@10 | Hits@4 | MAP@10 | MRR@10 |
+| :---------------------: | :-----: | :----: | :----: | :----: |
+| Retrieval w/o Reranking | 66.16% | 49.80% | 17.62% | 39.75% |
+| Retrieval w/ Reranking | 72.28% | 63.24% | 24.97% | 56.79% |
+
+Go to Accuracy [README](https://github.com/opea-project/GenAIEval/tree/main/evals/evaluation/rag_eval#multihop-english-dataset) for reproduce steps.