Add AVX2 implementation of quantize_row_q4_1 #515
Conversation
Full run output
Please disregard this result, I was using a broken model. I am re-running the perplexity computation now.
Running on latest master, it starts out like this for me:
Your branch on my machine:
Note: I tried to match your settings.
@Green-Sky does your system-info have the same flags as mine? I wonder if there is a different path somewhere that may cause the difference. I get the same results even after rebasing to current master. On master, my result is also different than yours:
Just in case my model is broken somehow, this is the SHA256 hash:
Can you verify if yours is the same?
Oh wow, it's different.
I regenerated to double-check, and got the same hash again. I also checked the src, which matches the
@Green-Sky It looks like the problem was my model; after re-converting and re-quantizing the model I get the same sum and perplexity as yours. I will re-run the perplexity computation in case there is a significant difference. Thanks for checking!
If I understood the results correctly, @Green-Sky shows a major increase in speed with a slight decrease in accuracy? A side point related to this: edit: -snip-, as it doesn't really belong here; I made it a discussion topic:
Updated my previous post with system_info and the make command.
Yes, however the perplexity is very unstable in the beginning, so a full run would be necessary.
Force-pushed from 7dca16b to ae08d8e
Perplexity: 6.3056 (7B q4_1) Full run output
Sorry about the conflicts. Please resolve and merge.
Force-pushed from ae08d8e to e296529
Rebased to master.
Force-pushed from 3125ea0 to 41669f6
The bot almost got it right; the purpose of using the reference implementation in
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Ironically, after the changes to master I am seeing slightly lower perplexity with the AVX path in the first chunks. master: avx2: 🤷
I guess we must be doing something right 🦙
Largely based on the AVX2 implementation of quantize_row_q4_0.
🤖 Generated by Copilot at ae08d8e
Summary
🚀🐛♻️
Improved matrix quantization with AVX2 and bug fixes. Added a new function `quantize_row_q4_1` that uses AVX2 instructions to speed up the quantization of a matrix row using 4-bit factors. Renamed and fixed the original function `quantize_row_q4_1_reference`. Updated `ggml_quantize_q4_1` to use the appropriate function depending on the CPU capabilities.

Walkthrough

- Renamed `quantize_row_q4_1` to `quantize_row_q4_1_reference` to avoid confusion with the new AVX2-optimized function (link)
- Added a new `quantize_row_q4_1` that uses AVX2 instructions to speed up the quantization algorithm for 4-bit factors (link)
- Replaced `quantize_row_q4_1` with `quantize_row_q4_1_reference` in `ggml_quantize_q4_1` to fix a bug and avoid unnecessary computation (link)
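For context on what these functions operate on: in the q4_1 format, each block of 32 floats stores a scale and a minimum, and every value is reduced to a 4-bit offset from that minimum. Below is a scalar sketch of what the reference path does; the struct name, field names, and rounding here are simplified assumptions for illustration, not the exact ggml code.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

#define QK 32  // assumed block size for the 4-bit block layout

// Hypothetical struct mirroring a q4_1 block (names are illustrative):
typedef struct {
    float   d;            // scale (delta between quantization levels)
    float   m;            // minimum value of the block
    uint8_t qs[QK / 2];   // 32 quants, two 4-bit values packed per byte
} block_q4_1;

// Scalar sketch of the reference path: per block, find min and max,
// derive the scale from the 16 available levels, then store each float
// as a rounded 4-bit offset from the minimum.
static void quantize_row_q4_1_ref_sketch(const float *x, block_q4_1 *y, int k) {
    const int nb = k / QK;
    for (int i = 0; i < nb; i++) {
        float min =  FLT_MAX;
        float max = -FLT_MAX;
        for (int l = 0; l < QK; l++) {
            const float v = x[i*QK + l];
            if (v < min) min = v;
            if (v > max) max = v;
        }
        const float d  = (max - min) / 15.0f;         // 15 = 2^4 - 1 steps
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        y[i].d = d;
        y[i].m = min;
        for (int l = 0; l < QK; l += 2) {
            const uint8_t v0 = (uint8_t)((x[i*QK + l + 0] - min) * id + 0.5f);
            const uint8_t v1 = (uint8_t)((x[i*QK + l + 1] - min) * id + 0.5f);
            y[i].qs[l/2] = v0 | (v1 << 4);            // low nibble first
        }
    }
}
```

The AVX2 path in the PR performs the same min/max search and nibble packing with vector intrinsics instead of per-element loops, which is where the speedup comes from.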