include convert latency in bench_append_paged_kv_cache #590

Merged 1 commit on Nov 6, 2024


abcdabcd987
Member

model: l1b      seqlens: [1, 1, 1, 1, 1, 1, 1, 1]                 convert: 45us 1layer:  7us 16layers: 151us throughput:    4.936GB/s
model: l1b      seqlens: [4993, 1, 1, 1, 1, 1, 1, 1]              convert: 42us 1layer: 14us 16layers: 271us throughput: 1434.769GB/s
model: l1b      seqlens: [5000]                                   convert: 44us 1layer: 14us 16layers: 272us throughput: 1438.581GB/s
model: l1b      seqlens: [625, 625, 625, 625, 625, 625, 625, 625] convert: 46us 1layer: 14us 16layers: 274us throughput: 1440.357GB/s
---
model: l3b      seqlens: [1, 1, 1, 1, 1, 1, 1, 1]                 convert: 42us 1layer:  7us 28layers: 226us throughput:    9.946GB/s
model: l3b      seqlens: [4993, 1, 1, 1, 1, 1, 1, 1]              convert: 43us 1layer: 22us 28layers: 647us throughput: 1896.687GB/s
model: l3b      seqlens: [5000]                                   convert: 42us 1layer: 22us 28layers: 646us throughput: 1898.796GB/s
model: l3b      seqlens: [625, 625, 625, 625, 625, 625, 625, 625] convert: 41us 1layer: 22us 28layers: 648us throughput: 1890.115GB/s
---
model: l8b      seqlens: [1, 1, 1, 1, 1, 1, 1, 1]                 convert: 41us 1layer:  7us 32layers: 252us throughput:    9.940GB/s
model: l8b      seqlens: [4993, 1, 1, 1, 1, 1, 1, 1]              convert: 42us 1layer: 21us 32layers: 730us throughput: 1905.826GB/s
model: l8b      seqlens: [5000]                                   convert: 41us 1layer: 22us 32layers: 729us throughput: 1903.697GB/s
model: l8b      seqlens: [625, 625, 625, 625, 625, 625, 625, 625] convert: 47us 1layer: 22us 32layers: 737us throughput: 1899.630GB/s
---
model: l70b-tp8 seqlens: [1, 1, 1, 1, 1, 1, 1, 1]                 convert: 42us 1layer:  6us 80layers: 552us throughput:    1.283GB/s
model: l70b-tp8 seqlens: [4993, 1, 1, 1, 1, 1, 1, 1]              convert: 41us 1layer:  9us 80layers: 800us throughput:  539.484GB/s
model: l70b-tp8 seqlens: [5000]                                   convert: 41us 1layer:  9us 80layers: 788us throughput:  548.648GB/s
model: l70b-tp8 seqlens: [625, 625, 625, 625, 625, 625, 625, 625] convert: 41us 1layer: 10us 80layers: 803us throughput:  537.731GB/s
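
To put the new `convert` column in context, the output lines above can be parsed and the convert latency compared with the all-layers append time. This is only a rough sketch: the regex, `parse_line`, and `convert_share` are hypothetical helpers written for this illustration and are not part of the benchmark script. It also assumes the convert step runs once per forward pass and is reused by every layer, which the output itself does not state.

```python
import re

# Hypothetical parser for the benchmark output format shown above.
LINE_RE = re.compile(
    r"model:\s*(?P<model>\S+)\s+"
    r"seqlens:\s*\[(?P<seqlens>[^\]]*)\]\s+"
    r"convert:\s*(?P<convert>\d+)us\s+"
    r"1layer:\s*(?P<layer>\d+)us\s+"
    r"(?P<nlayers>\d+)layers:\s*(?P<total>\d+)us\s+"
    r"throughput:\s*(?P<gbps>[\d.]+)GB/s"
)


def parse_line(line):
    """Parse one result line into a dict, or return None for separators."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "model": m.group("model"),
        "seqlens": [int(x) for x in m.group("seqlens").split(",") if x.strip()],
        "convert_us": int(m.group("convert")),
        "one_layer_us": int(m.group("layer")),
        "num_layers": int(m.group("nlayers")),
        "all_layers_us": int(m.group("total")),
        "throughput_gbps": float(m.group("gbps")),
    }


def convert_share(row):
    """Fraction of (convert + all-layers append) time spent in convert.

    Assumption: convert is paid once per forward pass, not once per layer.
    """
    return row["convert_us"] / (row["convert_us"] + row["all_layers_us"])


if __name__ == "__main__":
    sample = [
        "model: l1b      seqlens: [1, 1, 1, 1, 1, 1, 1, 1]                 "
        "convert: 45us 1layer:  7us 16layers: 151us throughput:    4.936GB/s",
        "---",
        "model: l70b-tp8 seqlens: [5000]                                   "
        "convert: 41us 1layer:  9us 80layers: 788us throughput:  548.648GB/s",
    ]
    for line in sample:
        row = parse_line(line)
        if row is not None:
            print(f"{row['model']:9s} convert share: {convert_share(row):.1%}")
```

Under that assumption, convert stays roughly constant at 41 to 47us across all rows, so its relative weight is largest for small models in decode-like batches (45us against 151us for `l1b` with all-ones `seqlens`) and smallest when many layers share it (41us against 788us for `l70b-tp8` with `seqlens: [5000]`).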

@abcdabcd987 abcdabcd987 requested a review from yzh119 November 6, 2024 22:05
@yzh119 yzh119 merged commit d7300c4 into flashinfer-ai:main Nov 6, 2024