Large Language Model Edge Benchmark Suite: Implementation on KubeEdge-Ianvs #94
Comments
If anyone has questions regarding this issue, please feel free to leave a message here. We would also appreciate it if new members could introduce themselves to the community.
To complete this issue, do I need the corresponding GPU resources to run large models for project debugging?
Yes, given CUDA's acceleration of neural network training and inference, I think you need basic NVIDIA GPU resources. However, we are thinking about simulating LLM inference on edge nodes (e.g. smartphones), so I don't think you need the support of, say, A100-class GPU resources. Nowadays, the SoC unified memory of a typical smartphone is 8 GB or 16 GB, so any GPU resource in this range can serve as the simulation environment. It is normal for LLMs to run out of memory (OOM) in this environment, and that is exactly what we want to explore: what size of LLM is best suited for edge devices.

Thanks for introducing the project; it's a great reference. In my opinion, the metrics that project is geared towards are accuracy-style metrics such as Accuracy, BLEU, etc. However, we would like to contribute latency, resource usage, and other metrics that edge devices care more about to edge LLM inference. Considering the time, I think it is safer to add the metrics we care about and integrate them with Ianvs by referencing the existing framework. Of course, if time permits, we welcome any feasible solution.
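To make the simulation idea above concrete, here is a minimal, hypothetical sketch of capping a larger GPU's per-process memory to a smartphone-class budget with PyTorch, so that OOM behaviour roughly mirrors an 8 GB edge device. The model name and memory budget are illustrative assumptions, not part of Ianvs.

```python
# Hypothetical sketch: approximate a smartphone-class memory budget on a
# larger NVIDIA GPU so that OOM behaviour mirrors the simulated edge device.
# EDGE_MEMORY_GB and MODEL_NAME are illustrative assumptions, not Ianvs code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

EDGE_MEMORY_GB = 8        # typical smartphone SoC unified memory (8 GB or 16 GB)
MODEL_NAME = "gpt2"       # placeholder model for illustration

device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cuda":
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024 ** 3
    # Restrict this process to roughly EDGE_MEMORY_GB of the card's memory.
    torch.cuda.set_per_process_memory_fraction(min(1.0, EDGE_MEMORY_GB / total_gb), 0)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
try:
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).to(device).eval()
except RuntimeError as err:  # CUDA OOM surfaces as a RuntimeError
    print(f"Model does not fit in the simulated {EDGE_MEMORY_GB} GB budget: {err}")
```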
What would you like to be added/modified:
A benchmark suite for large language models deployed at the edge using KubeEdge-Ianvs:
2.1 Extensive support for mainstream industry benchmark dataset formats such as MMLU, CMMLU, and other open-source datasets.
2.2 Visualization of the LLM invocation process, including console output, logging of task execution and monitoring, etc.
3.1 Test at least three types of LLMs.
3.2 Present computation results of performance metrics such as ACC, Recall, F1, latency, bandwidth, etc., with metric dimensions referencing the national standard "Artificial Intelligence - Pretrained Models Part 2: Evaluation Metrics and Methods".
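For 3.2, a minimal sketch of the kind of metric functions such a suite could compute is shown below. The function names and signatures are illustrative assumptions, not the actual Ianvs metric interface; they only demonstrate exact-match accuracy and mean per-query latency over paired predictions and references.

```python
# Illustrative sketch of benchmark metric helpers; not the Ianvs metric API.
import time
from typing import Callable, List, Tuple


def accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of exact matches between model outputs and gold answers."""
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references) if references else 0.0


def run_with_latency(infer: Callable[[str], str],
                     prompts: List[str]) -> Tuple[List[str], float]:
    """Run inference over all prompts; return (outputs, mean seconds per query)."""
    outputs = []
    start = time.perf_counter()
    for prompt in prompts:
        outputs.append(infer(prompt))
    elapsed = time.perf_counter() - start
    return outputs, elapsed / len(prompts) if prompts else 0.0
```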
Why is this needed:
Due to the size of models and data, Large Language Models (LLMs) are typically trained in the cloud. At the same time, concerns about commercial confidentiality and user privacy during LLM usage have made deploying LLMs on edge devices a research hotspot. Quantization techniques are enabling edge-side LLM inference; however, the limited resources of edge devices affect the inference latency and accuracy of these cloud-trained LLMs. Ianvs aims to conduct edge-side deployment benchmark tests for cloud-trained LLMs, utilizing its container resource management capabilities and edge-cloud synergy abilities.
Recommended Skills:
TensorFlow/PyTorch, LLMs, Docker
Useful links:
KubeEdge-Ianvs
KubeEdge-Ianvs Benchmark Test Cases
Building Edge-Cloud Synergy Simulation Environment with KubeEdge-Ianvs
Artificial Intelligence - Pretrained Models Part 2: Evaluation Metrics and Methods
Example LLMs Benchmark List
Docker Resource Management