
[Question] Very slow runtime #56

Open
sherlcok314159 opened this issue Jan 6, 2025 · 0 comments
Hi, and thank you for open-sourcing your work. I tried to reproduce meta-llama/Meta-Llama-3-8B-Instruct on infinite-bench with an A100 80 GB and found it very slow. Is this a limitation of my hardware (e.g. CPU / memory), or is the algorithm itself simply not that fast? I could not find a concrete time/space analysis in the paper, only the statement: "In terms of efficiency, InfLLM achieves a 34% decrease in time consumption while using only 34% of the GPU memory compared to the full-attention models".

read kv_retrieval.jsonl
Pred kv_retrieval
  2%|██▍                                       | 8/500 [07:18<7:22:35, 53.97s/it]

Below is the relevant configuration (taken from the repository, unmodified):

model:
  type: inf-llm
  path: meta-llama/Meta-Llama-3-8B-Instruct
  block_size: 128
  fattn: false
  n_init: 128
  n_local: 4096
  topk: 16
  repr_topk: 4
  max_cached_block: 32
  exc_block_size: 512
  base: 500000
  distance_scale: 1.0

max_len: 2147483647
chunk_size: 8192
conv_type: llama-3-inst
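For context on why the config alone may not explain the slowdown: under an InfLLM-style scheme, each query token attends to the initial tokens, a local window, and the top-k retrieved memory blocks, so the per-token attention span is roughly n_init + n_local + topk * block_size. The sketch below is a back-of-the-envelope estimate using the parameter values from the config above; the cost model and the example context length are simplifying assumptions, not a profile of the actual implementation (which also pays for block lookup and CPU-GPU cache transfers governed by max_cached_block).

```python
# Rough estimate of KV entries attended per query token under the
# config above. Names mirror the YAML keys; the linear cost model
# is an assumption for illustration.

def attended_kv(n_init: int, n_local: int, topk: int, block_size: int) -> int:
    # initial tokens + sliding local window + top-k retrieved blocks
    return n_init + n_local + topk * block_size

full_ctx = 100_000  # hypothetical prompt length (assumed for illustration)
sparse = attended_kv(n_init=128, n_local=4096, topk=16, block_size=128)

print(sparse)             # KV entries per query token
print(sparse / full_ctx)  # fraction of a full-attention span
```

If this fraction is small yet the wall-clock time is still ~54 s/it as in the log above, the bottleneck is more likely block retrieval and memory-block cache misses (CPU offload traffic) than the attention FLOPs themselves.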