
Large Language Model Edge Benchmark Suite: Implementation on KubeEdge-Ianvs #94

Open
nailtu30 opened this issue May 7, 2024 · 3 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@nailtu30
Contributor

nailtu30 commented May 7, 2024

What would you like to be added/modified:
A benchmark suite for large language models deployed at the edge using KubeEdge-Ianvs:

  1. Interface Design and Usage Guidelines Document;
  2. Implementation of an NLP Large Language Model (LLM) Benchmark Suite Based on Ianvs
    2.1 Extensive support for mainstream open-source benchmark datasets and their formats, such as MMLU and CMMLU.
    2.2 Visualization of the LLM invocation process, including console output, logging of task execution, monitoring, etc.
  3. Generation of Benchmark Testing Reports Based on Ianvs
    3.1 Test at least three types of LLMs.
    3.2 Present computed performance metrics such as ACC, Recall, F1, latency, bandwidth, etc., with metric dimensions referencing the national standard "Artificial Intelligence - Pretrained Models Part 2: Evaluation Metrics and Methods" (a minimal metric sketch is given after this list).
  4. (Advanced) Efficient Evaluation: concurrent execution of tasks, automatic request and result collection.
  5. (Advanced) Integration of the Benchmark Testing Suite into the LLM Training Process.
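As a rough illustration of item 3.2, the sketch below shows one way the classification metrics and per-request latency could be computed for a single benchmark run. This is a minimal sketch only: the function names are hypothetical, scikit-learn is assumed for the classification metrics, and none of this is an existing Ianvs API.

```python
# Hypothetical sketch: computing ACC / Recall / F1 / latency for one benchmark run.
# Function and variable names are illustrative assumptions, not Ianvs interfaces.
import time
from sklearn.metrics import accuracy_score, recall_score, f1_score

def evaluate_choices(model_answers, gold_answers, latencies_s):
    """Score multiple-choice answers (e.g. MMLU-style A/B/C/D) plus average latency."""
    return {
        "acc": accuracy_score(gold_answers, model_answers),
        "recall": recall_score(gold_answers, model_answers, average="macro"),
        "f1": f1_score(gold_answers, model_answers, average="macro"),
        "avg_latency_s": sum(latencies_s) / len(latencies_s),
    }

def timed_infer(llm_infer, prompt):
    """Wrap a single model call so per-request latency can be collected."""
    start = time.perf_counter()
    answer = llm_infer(prompt)
    return answer, time.perf_counter() - start
```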

Why is this needed:
Due to the size of models and data, Large Language Models (LLMs) are usually trained in the cloud. At the same time, concerns about commercial confidentiality and user privacy during LLM usage have made deploying LLMs on edge devices a growing research hotspot. Quantization techniques are making edge-side LLM inference feasible; however, the limited resources of edge devices affect inference latency and accuracy compared with cloud deployment. Ianvs aims to run edge-side deployment benchmark tests for cloud-trained LLMs, utilizing its container resource management capabilities and edge-cloud synergy abilities.

Recommended Skills:
TensorFlow/PyTorch, LLMs, Docker

Useful links:
KubeEdge-Ianvs
KubeEdge-Ianvs Benchmark Test Cases
Building Edge-Cloud Synergy Simulation Environment with KubeEdge-Ianvs
Artificial Intelligence - Pretrained Models Part 2: Evaluation Metrics and Methods
Example LLMs Benchmark List
Docker Resource Management

@MooreZheng
Collaborator

MooreZheng commented May 9, 2024

If anyone has questions regarding this issue, please feel free to leave a message here. We would also appreciate it if new members could introduce themselves to the community.

@IcyFeather233
Contributor

To complete this issue, does it mean that I need to have the corresponding GPU resources to run large models for project debugging?
Additionally, I am aware of an outstanding project called OpenCompass that evaluates LLMs, but it uses InternLM's mmengine project. In this issue, is it preferred to write one's own framework rather than importing libraries from other projects?

@nailtu30
Contributor Author

To complete this issue, does it mean that I need to have the corresponding GPU resources to run large models for project debugging? Additionally, I am aware of an outstanding project called OpenCompass that evaluates LLMs, but it uses InternLM's mmengine project. In this issue, is it preferred to write one's own framework rather than importing libraries from other projects?

Yes, given CUDA's acceleration of neural network training and inference, I think you need basic NVIDIA GPU resources. However, we are considering simulating LLM inference on edge nodes (e.g. smartphones), so I don't think you need something like A100 GPU resources. The unified SoC memory of a typical smartphone today is 8GB or 16GB, so any GPU resource in this range can serve as the simulation environment. It is normal for LLMs to run out of memory (OOM) in this environment, and that is exactly what we want to explore: what size of LLM is best suited for edge devices.
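As a rough illustration of this sizing argument, here is a back-of-the-envelope estimate of how much memory LLM weights alone need at different quantization levels. The numbers and helper function are assumptions for illustration, not measurements from Ianvs or any specific device.

```python
# Rough memory estimate for LLM weights under different quantization levels.
# Ignores activations and the KV cache, so real usage is higher.
def weight_memory_gb(num_params_billion, bits_per_param):
    return num_params_billion * 1e9 * bits_per_param / 8 / 1024**3

for params_b in (1, 3, 7, 13):
    for bits in (16, 8, 4):
        gb = weight_memory_gb(params_b, bits)
        verdict = "fits 8GB" if gb <= 8 else ("fits 16GB" if gb <= 16 else "likely OOM")
        print(f"{params_b}B params @ {bits}-bit: {gb:5.1f} GB -> {verdict}")
```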

Thanks for introducing the project; it's a great reference. In my opinion, the metrics that project is geared towards are accuracy metrics such as Accuracy, BLEU, etc. However, we would like to contribute latency, resource usage, and other metrics that edge devices care more about to edge LLM inference. Considering the time, I think it is safer to add the metrics we care about and combine them with Ianvs, taking the existing framework as a reference (a rough sketch is given below). Of course, if time permits, we welcome any feasible solution.
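A minimal sketch of what such edge-oriented metrics could look like if written as plain Python functions and wired into Ianvs the same way its existing accuracy metrics are. The exact registration mechanism should follow the current Ianvs examples; the names and signatures below are assumptions for illustration only.

```python
# Hypothetical latency / resource-usage metrics for edge LLM inference.
# How these get registered with Ianvs should follow its existing metric examples;
# the signatures and names here are illustrative only.

def p95_latency_s(latencies_s):
    """95th-percentile request latency in seconds."""
    ordered = sorted(latencies_s)
    return ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]

def throughput_tokens_per_s(total_output_tokens, wall_clock_s):
    """Sustained generation throughput over a whole benchmark run."""
    return total_output_tokens / wall_clock_s

def peak_memory_gb(sampled_rss_bytes):
    """Peak resident memory observed while the model was serving requests."""
    return max(sampled_rss_bytes) / 1024**3
```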
