Commit 1c6bada
Chunk prefill cache writes, remove div_i32 from insert_or_update_cache (#289)
Re-implements the following PRs for current habana_main:
- #102 (Removing div_i32 operations from each layer)
- #115 (removing scatter for reshape&cache in case of prompt)
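The two ideas can be illustrated with a minimal sketch. This is not the code from this commit: the names `split_slots`, `write_cache`, `write_prompt_chunked`, and `BLOCK_SIZE` are hypothetical, and plain Python lists stand in for device tensors. It assumes that slot ids are split into block index and in-block offset once per step and reused by every layer, and that prompt tokens can be copied into the cache one whole block at a time instead of being scattered token by token.

```python
BLOCK_SIZE = 4  # hypothetical block size, kept small for illustration


def split_slots(slot_mapping, block_size=BLOCK_SIZE):
    """Split flat slot ids into block indices and in-block offsets, once per step."""
    block_indices = [slot // block_size for slot in slot_mapping]
    block_offsets = [slot % block_size for slot in slot_mapping]
    return block_indices, block_offsets


def write_cache(cache, values, block_indices, block_offsets):
    """Write one layer's values with precomputed indices (no division here)."""
    for value, blk, off in zip(values, block_indices, block_offsets):
        cache[blk][off] = value


def write_prompt_chunked(cache, values, block_ids, block_size=BLOCK_SIZE):
    """Prompt case: copy block-sized chunks instead of scattering per token."""
    for i, blk in enumerate(block_ids):
        chunk = values[i * block_size:(i + 1) * block_size]
        cache[blk][:len(chunk)] = chunk


def run_layers(num_layers, num_blocks, slot_mapping, values):
    # Division and modulo happen once, outside the per-layer loop.
    block_indices, block_offsets = split_slots(slot_mapping)
    caches = [[[None] * BLOCK_SIZE for _ in range(num_blocks)]
              for _ in range(num_layers)]
    for cache in caches:
        write_cache(cache, values, block_indices, block_offsets)
    return caches


caches = run_layers(num_layers=2, num_blocks=3,
                    slot_mapping=[4, 5, 9], values=["a", "b", "c"])

prompt_cache = [[None] * BLOCK_SIZE for _ in range(3)]
write_prompt_chunked(prompt_cache, list("abcdef"), block_ids=[2, 0])
```

In this sketch the per-layer function only indexes, so the cost of the integer division no longer scales with the number of layers, and the prompt path touches each cache block once.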
Accuracy (GSM8K on Llama3.1-8B-Instruct):
| Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
|---------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k_cot_llama| 3|flexible-extract| 8|exact_match|↑ |0.8415|± |0.0101|
| | |strict-match | 8|exact_match|↑ |0.8400|± |0.0101|
I've benchmarked this change on Llama3.1-8B-Instruct on G2. Across all prefill buckets, throughput improves by +2.50% on average (+558.14 tok/s, ~21594 tok/s -> ~22152 tok/s), with up to +4.40% (+956.79 tok/s, ~25031 tok/s -> ~25988 tok/s) in compute-bound scenarios.
1 parent 4c8a6c6, commit 1c6bada
File tree: 4 files changed, +33 -11 lines changed
- vllm
- attention
- backends
- ops
- worker