
Large Language Model Edge Benchmark Suite: Implementation on KubeEdge-Ianvs #94

Open
nailtu30 opened this issue May 7, 2024 · 3 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@nailtu30
Contributor

nailtu30 commented May 7, 2024

What would you like to be added/modified:
A benchmark suite for large language models deployed at the edge using KubeEdge-Ianvs:

  1. Interface Design and Usage Guidelines Document;
  2. Implementation of an NLP Large Language Model (LLM) Benchmark Suite Based on Ianvs
    2.1 Extensive support for mainstream open-source benchmark datasets and their formats, such as MMLU and CMMLU.
    2.2 Visualization of the LLM invocation process, including console output, logging of task execution, monitoring, etc.
  3. Generation of Benchmark Testing Reports Based on Ianvs
    3.1 Test at least three types of LLMs.
    3.2 Present computed performance metrics such as ACC, Recall, F1, latency, bandwidth, etc., with metric dimensions referencing the national standard "Artificial Intelligence - Pretrained Models Part 2: Evaluation Metrics and Methods" (a minimal metric sketch is given after this list).
  4. (Advanced) Efficient Evaluation: concurrent execution of tasks, automatic request and result collection.
  5. (Advanced) Integration of the Benchmark Testing Suite into the LLM Training Process.
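As a rough illustration of item 3.2, the sketch below shows one way the classification metrics and per-request latency could be computed for a single benchmark run. This is a minimal sketch only: the function names are hypothetical, scikit-learn is assumed for the classification metrics, and none of this is an existing Ianvs API.

```python
# Hypothetical sketch: computing ACC / Recall / F1 / latency for one benchmark run.
# Function and variable names are illustrative assumptions, not Ianvs interfaces.
import time
from sklearn.metrics import accuracy_score, recall_score, f1_score

def evaluate_choices(model_answers, gold_answers, latencies_s):
    """Score multiple-choice answers (e.g. MMLU-style A/B/C/D) plus average latency."""
    return {
        "acc": accuracy_score(gold_answers, model_answers),
        "recall": recall_score(gold_answers, model_answers, average="macro"),
        "f1": f1_score(gold_answers, model_answers, average="macro"),
        "avg_latency_s": sum(latencies_s) / len(latencies_s),
    }

def timed_infer(llm_infer, prompt):
    """Wrap a single model call so per-request latency can be collected."""
    start = time.perf_counter()
    answer = llm_infer(prompt)
    return answer, time.perf_counter() - start
```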

Why is this needed:
Due to the size of models and data, Large Language Models (LLMs) are usually trained in the cloud. At the same time, concerns about commercial confidentiality and user privacy during LLM usage have made deploying LLMs on edge devices a growing research hotspot. Quantization techniques are making edge-side LLM inference feasible; however, the limited resources of edge devices affect inference latency and accuracy compared with cloud deployment. Ianvs aims to run edge-side deployment benchmark tests for cloud-trained LLMs, utilizing its container resource management capabilities and edge-cloud synergy abilities.

Recommended Skills:
TensorFlow/PyTorch, LLMs, Docker

Useful links:
KubeEdge-Ianvs
KubeEdge-Ianvs Benchmark Test Cases
Building Edge-Cloud Synergy Simulation Environment with KubeEdge-Ianvs
Artificial Intelligence - Pretrained Models Part 2: Evaluation Metrics and Methods
Example LLMs Benchmark List
Docker Resource Management

@MooreZheng
Collaborator

MooreZheng commented May 9, 2024

If anyone has questions regarding this issue, please feel free to leave a message here. We would also appreciate it if new members could introduce themselves to the community.

@IcyFeather233
Contributor

To complete this issue, does it mean that I need to have the corresponding GPU resources to run large models for project debugging?
Additionally, I am aware of an outstanding project called OpenCompass that evaluates LLMs, but it uses InternLM's mmengine project. In this issue, is it preferred to write one's own framework rather than importing libraries from other projects?

@nailtu30
Contributor Author

To complete this issue, does it mean that I need to have the corresponding GPU resources to run large models for project debugging? Additionally, I am aware of an outstanding project called OpenCompass that evaluates LLMs, but it uses InternLM's mmengine project. In this issue, is it preferred to write one's own framework rather than importing libraries from other projects?

Yes, given CUDA's acceleration of neural network training and inference, I think you need basic NVIDIA GPU resources. However, we are considering simulating LLM inference on edge nodes (e.g. smartphones), so I don't think you need something like A100 GPU resources. The unified SoC memory of a typical smartphone today is 8GB or 16GB, so any GPU resource in this range can serve as the simulation environment. It is normal for LLMs to run out of memory (OOM) in this environment, and that is exactly what we want to explore: what size of LLM is best suited for edge devices.
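As a rough illustration of this sizing argument, here is a back-of-the-envelope estimate of how much memory LLM weights alone need at different quantization levels. The numbers and helper function are assumptions for illustration, not measurements from Ianvs or any specific device.

```python
# Rough memory estimate for LLM weights under different quantization levels.
# Ignores activations and the KV cache, so real usage is higher.
def weight_memory_gb(num_params_billion, bits_per_param):
    return num_params_billion * 1e9 * bits_per_param / 8 / 1024**3

for params_b in (1, 3, 7, 13):
    for bits in (16, 8, 4):
        gb = weight_memory_gb(params_b, bits)
        verdict = "fits 8GB" if gb <= 8 else ("fits 16GB" if gb <= 16 else "likely OOM")
        print(f"{params_b}B params @ {bits}-bit: {gb:5.1f} GB -> {verdict}")
```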

Thanks for introducing the project; it's a great reference. In my opinion, the metrics that project is geared towards are accuracy metrics such as Accuracy, BLEU, etc. However, we would like to contribute latency, resource usage, and other metrics that edge devices care more about to edge LLM inference. Considering the time, I think it is safer to add the metrics we care about and combine them with Ianvs, taking the existing framework as a reference (a rough sketch is given below). Of course, if time permits, we welcome any feasible solution.
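A minimal sketch of what such edge-oriented metrics could look like if written as plain Python functions and wired into Ianvs the same way its existing accuracy metrics are. The exact registration mechanism should follow the current Ianvs examples; the names and signatures below are assumptions for illustration only.

```python
# Hypothetical latency / resource-usage metrics for edge LLM inference.
# How these get registered with Ianvs should follow its existing metric examples;
# the signatures and names here are illustrative only.

def p95_latency_s(latencies_s):
    """95th-percentile request latency in seconds."""
    ordered = sorted(latencies_s)
    return ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]

def throughput_tokens_per_s(total_output_tokens, wall_clock_s):
    """Sustained generation throughput over a whole benchmark run."""
    return total_output_tokens / wall_clock_s

def peak_memory_gb(sampled_rss_bytes):
    """Peak resident memory observed while the model was serving requests."""
    return max(sampled_rss_bytes) / 1024**3
```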
