Add AVX2 implementation of quantize_row_q4_1 #515

Merged: 3 commits merged into ggml-org:master from avx2-quantize-q4_1 on Mar 28, 2023

Conversation

@slaren (Member) commented on Mar 26, 2023

Largely based on the AVX2 implementation of quantize_row_q4_0.

Run on (16 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 256 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 0.17, 1.04, 1.50
-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
BM_quantize_row_q4_1_ref      12845 ns        12845 ns        54677
BM_quantize_row_q4_1_avx       1360 ns         1360 ns       519134
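
For context, this is roughly what the reference path in the benchmark computes for each 32-element block: find the block minimum and maximum, derive a 4-bit scale, and pack two quants per byte. The sketch below is only illustrative; the names, block layout, and rounding are assumptions for the example rather than the exact code in ggml.c.

```c
// Minimal scalar sketch of q4_1 block quantization (illustrative, not ggml.c).
#include <math.h>
#include <stdint.h>

#define QK 32                      // assumed block size

typedef struct {
    float   d;                     // delta (scale)
    float   m;                     // block minimum
    uint8_t qs[QK/2];              // two 4-bit quants per byte
} block_q4_1;

static void quantize_row_q4_1_ref_sketch(const float * x, block_q4_1 * y, int k) {
    const int nb = k / QK;
    for (int i = 0; i < nb; i++) {
        float min =  INFINITY;
        float max = -INFINITY;
        for (int l = 0; l < QK; l++) {              // per-block min/max
            const float v = x[i*QK + l];
            if (v < min) min = v;
            if (v > max) max = v;
        }
        const float d  = (max - min) / 15.0f;       // 16 levels for 4 bits
        const float id = d != 0.0f ? 1.0f/d : 0.0f;
        y[i].d = d;
        y[i].m = min;
        for (int l = 0; l < QK; l += 2) {           // pack two quants per byte
            const uint8_t q0 = (uint8_t)roundf((x[i*QK + l + 0] - min) * id);
            const uint8_t q1 = (uint8_t)roundf((x[i*QK + l + 1] - min) * id);
            y[i].qs[l/2] = (q0 & 0x0F) | (q1 << 4);
        }
    }
}
```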

🤖 Generated by Copilot at ae08d8e

Summary

🚀🐛♻️

Improved matrix quantization with AVX2 and bug fixes. Added a new quantize_row_q4_1 that uses AVX2 instructions to speed up the 4-bit quantization of a matrix row. Renamed and fixed the original function, now quantize_row_q4_1_reference. Updated ggml_quantize_q4_1 to use the appropriate function depending on the CPU capabilities.

We're sailing on the matrix sea, with quantize_row_q4_1
We've fixed a bug and gained some speed, with quantize_row_q4_1
So heave away, me hearties, heave away with glee
We'll raise the sail and catch the wind, with quantize_row_q4_1

Walkthrough

  • Rename quantize_row_q4_1 to quantize_row_q4_1_reference to avoid confusion with the new AVX2-optimized function
  • Add quantize_row_q4_1 that uses AVX2 instructions to speed up the 4-bit quantization of a row (see the sketch after this list)
  • Replace quantize_row_q4_1 with quantize_row_q4_1_reference in ggml_quantize_q4_1 to fix a bug and avoid unnecessary computation
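
To give a flavor of where the speedup in the AVX2 path comes from, here is a minimal sketch of the per-block min/max reduction that such a kernel relies on. The helper name is made up for the example, and the actual code in this PR also vectorizes the scaling and nibble packing.

```c
// Hedged sketch of an AVX2 min/max reduction over one 32-float block
// (illustrative only, not the kernel added by this PR).
#include <immintrin.h>

static void block_min_max_avx2(const float * x, float * out_min, float * out_max) {
    __m256 vmin = _mm256_loadu_ps(x);
    __m256 vmax = vmin;
    for (int l = 8; l < 32; l += 8) {               // 4 x 8 floats per block
        const __m256 v = _mm256_loadu_ps(x + l);
        vmin = _mm256_min_ps(vmin, v);
        vmax = _mm256_max_ps(vmax, v);
    }

    // Horizontal reduction: 256 -> 128 -> 64 -> 32 bits.
    __m128 lo = _mm256_castps256_ps128(vmin);
    __m128 hi = _mm256_extractf128_ps(vmin, 1);
    __m128 m  = _mm_min_ps(lo, hi);
    m = _mm_min_ps(m, _mm_movehl_ps(m, m));
    m = _mm_min_ss(m, _mm_shuffle_ps(m, m, 0x55));
    *out_min = _mm_cvtss_f32(m);

    lo = _mm256_castps256_ps128(vmax);
    hi = _mm256_extractf128_ps(vmax, 1);
    __m128 M  = _mm_max_ps(lo, hi);
    M = _mm_max_ps(M, _mm_movehl_ps(M, M));
    M = _mm_max_ss(M, _mm_shuffle_ps(M, M, 0x55));
    *out_max = _mm_cvtss_f32(M);
}
```

Once the scalar min/max loop is replaced by wide loads like this, and the divide/round/pack step is done on whole vectors as well, the roughly 9x speedup shown in the benchmark above is plausible.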

@slaren (Member, Author) commented on Mar 26, 2023

Perplexity after this change: 6.3029 (7B q4_1)

Full run output

./perplexity -m ./models/7B/ggml-model-q4_1.bin -f wikitext-2-raw/wiki.test.raw -t 12
main: seed = 1679789188
llama_model_load: loading model from './models/7B/ggml-model-q4_1.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 3
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: type = 1
llama_model_load: ggml ctx size = 5076.59 MB
llama_model_load: mem required = 6868.59 MB (+ 1026.00 MB per state)
llama_model_load: loading model part 1/1 from './models/7B/ggml-model-q4_1.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4820.52 MB / num tensors = 291
llama_init_from_file: kv self size = 256.00 MB

system_info: n_threads = 12 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks
59.76 seconds per pass - ETA 10.87 hours
[1]4.6106,[2]5.0788,[3]5.9975,[4]6.5975,[5]6.6904,[6]6.6597,[7]6.8588,[8]6.9568,[9]7.2826,[10]7.5468,[11]7.7747,[12]7.8253,[13]7.7676,[14]7.8484,[15]8.1075,[16]7.6907,[17]7.5501,[18]7.4949,[19]7.1007,[20]7.0787,[21]6.9834,[22]6.8058,[23]6.7749,[24]6.6862,[25]6.6797,[26]6.5124,[27]6.3226,[28]6.2171,[29]6.1203,[30]5.9528,[31]5.9206,[32]5.9367,[33]5.8739,[34]5.9077,[35]5.9296,[36]5.9729,[37]5.9745,[38]5.9824,[39]6.0189,[40]6.0655,[41]6.0861,[42]6.1289,[43]6.0895,[44]6.1479,[45]6.1487,[46]6.1206,[47]6.1384,[48]6.1157,[49]6.1142,[50]6.0721,[51]6.0641,[52]6.0524,[53]6.1027,[54]6.0832,[55]6.0599,[56]6.0930,[57]6.1117,[58]6.1306,[59]6.1457,[60]6.1885,[61]6.1751,[62]6.2314,[63]6.2605,[64]6.2724,[65]6.3186,[66]6.3298,[67]6.3479,[68]6.3603,[69]6.3840,[70]6.4122,[71]6.4332,[72]6.4649,[73]6.5236,[74]6.5269,[75]6.5418,[76]6.5553,[77]6.5675,[78]6.5542,[79]6.5846,[80]6.5776,[81]6.6003,[82]6.6076,[83]6.5525,[84]6.5364,[85]6.5249,[86]6.5017,[87]6.4426,[88]6.4180,[89]6.3960,[90]6.3812,[91]6.4069,[92]6.4003,[93]6.3985,[94]6.3942,[95]6.4252,[96]6.4245,[97]6.4218,[98]6.4135,[99]6.3976,[100]6.3935,[101]6.4180,[102]6.4125,[103]6.4327,[104]6.4431,[105]6.4418,[106]6.4598,[107]6.4601,[108]6.4743,[109]6.4664,[110]6.4615,[111]6.4824,[112]6.5048,[113]6.5083,[114]6.5039,[115]6.5092,[116]6.4988,[117]6.5055,[118]6.5340,[119]6.5562,[120]6.5922,[121]6.6074,[122]6.6315,[123]6.6712,[124]6.6899,[125]6.6808,[126]6.7218,[127]6.7585,[128]6.7905,[129]6.7743,[130]6.7853,[131]6.7815,[132]6.7721,[133]6.7581,[134]6.7678,[135]6.7641,[136]6.7522,[137]6.7453,[138]6.7293,[139]6.7193,[140]6.7153,[141]6.6870,[142]6.6843,[143]6.6561,[144]6.6345,[145]6.6274,[146]6.6161,[147]6.6203,[148]6.6217,[149]6.6186,[150]6.6148,[151]6.6188,[152]6.6069,[153]6.5908,[154]6.5822,[155]6.5901,[156]6.5849,[157]6.6010,[158]6.6043,[159]6.6100,[160]6.6120,[161]6.6245,[162]6.5950,[163]6.5826,[164]6.5577,[165]6.5254,[166]6.4974,[167]6.4577,[168]6.4266,[169]6.4141,[170]6.4023,[171]6.3752,[172]6.3569,[173]6.3402,[174]6.3098,[175]6.2891,[176]6.2767,[177]6.2560,[178]6.2328,[179]6.2153,[180]6.2052,[181]6.1829,[182]6.1645,[183]6.1500,[184]6.1489,[185]6.1412,[186]6.1421,[187]6.1483,[188]6.1446,[189]6.1635,[190]6.1652,[191]6.1868,[192]6.2028,[193]6.2202,[194]6.2317,[195]6.2534,[196]6.2696,[197]6.2906,[198]6.3066,[199]6.3100,[200]6.3153,[201]6.3098,[202]6.3299,[203]6.3383,[204]6.3390,[205]6.3506,[206]6.3581,[207]6.3548,[208]6.3644,[209]6.3688,[210]6.3730,[211]6.3837,[212]6.3928,[213]6.4027,[214]6.4066,[215]6.4093,[216]6.4233,[217]6.4425,[218]6.4568,[219]6.4577,[220]6.4536,[221]6.4471,[222]6.4452,[223]6.4342,[224]6.4271,[225]6.4234,[226]6.4442,[227]6.4537,[228]6.4598,[229]6.4659,[230]6.4627,[231]6.4787,[232]6.4669,[233]6.4494,[234]6.4334,[235]6.4167,[236]6.4103,[237]6.4003,[238]6.4026,[239]6.3865,[240]6.3748,[241]6.3770,[242]6.3800,[243]6.3775,[244]6.3658,[245]6.3624,[246]6.3510,[247]6.3379,[248]6.3295,[249]6.3259,[250]6.3301,[251]6.3234,[252]6.3190,[253]6.3097,[254]6.3042,[255]6.2931,[256]6.2743,[257]6.2611,[258]6.2515,[259]6.2487,[260]6.2395,[261]6.2342,[262]6.2286,[263]6.2225,[264]6.2029,[265]6.2027,[266]6.2019,[267]6.1946,[268]6.2032,[269]6.2024,[270]6.2017,[271]6.2097,[272]6.2133,[273]6.2134,[274]6.2153,[275]6.2246,[276]6.2307,[277]6.2459,[278]6.2562,[279]6.2648,[280]6.2675,[281]6.2778,[282]6.2834,[283]6.2984,[284]6.3061,[285]6.3141,[286]6.3268,[287]6.3259,[288]6.3326,[289]6.3232,[290]6.3067,[291]6.2908,[292]6.2755,[293]6.2622,[294]6.2644,[295]6.2637,[296]6.2690,[297]6.2683,[298]6.2719,[299]6.2692,[300]6.2579,[301]6.2574,[302]6.2499,[303]6.2405,[304]6.2314,[305]6.2283,[30
6]6.2155,[307]6.2175,[308]6.2203,[309]6.2037,[310]6.1974,[311]6.1913,[312]6.1939,[313]6.1880,[314]6.1866,[315]6.1705,[316]6.1662,[317]6.1495,[318]6.1284,[319]6.1410,[320]6.1533,[321]6.1573,[322]6.1529,[323]6.1458,[324]6.1428,[325]6.1538,[326]6.1538,[327]6.1556,[328]6.1588,[329]6.1647,[330]6.1678,[331]6.1801,[332]6.1770,[333]6.1847,[334]6.1791,[335]6.1724,[336]6.1758,[337]6.1731,[338]6.1724,[339]6.1668,[340]6.1625,[341]6.1702,[342]6.1732,[343]6.1779,[344]6.1778,[345]6.1777,[346]6.1743,[347]6.1782,[348]6.1819,[349]6.1841,[350]6.1811,[351]6.1819,[352]6.1818,[353]6.1753,[354]6.1764,[355]6.1816,[356]6.1851,[357]6.1819,[358]6.1914,[359]6.1938,[360]6.1904,[361]6.1899,[362]6.1967,[363]6.2078,[364]6.2142,[365]6.2191,[366]6.2206,[367]6.2292,[368]6.2263,[369]6.2275,[370]6.2296,[371]6.2241,[372]6.2291,[373]6.2340,[374]6.2320,[375]6.2318,[376]6.2388,[377]6.2338,[378]6.2364,[379]6.2424,[380]6.2345,[381]6.2309,[382]6.2264,[383]6.2257,[384]6.2252,[385]6.2242,[386]6.2241,[387]6.2246,[388]6.2207,[389]6.2153,[390]6.2088,[391]6.2012,[392]6.1970,[393]6.1955,[394]6.1984,[395]6.1970,[396]6.1894,[397]6.1965,[398]6.2007,[399]6.2081,[400]6.2074,[401]6.2086,[402]6.2097,[403]6.2119,[404]6.2185,[405]6.2100,[406]6.2072,[407]6.2068,[408]6.2089,[409]6.2210,[410]6.2320,[411]6.2440,[412]6.2604,[413]6.2715,[414]6.2798,[415]6.2853,[416]6.2934,[417]6.3062,[418]6.3097,[419]6.3173,[420]6.3268,[421]6.3386,[422]6.3426,[423]6.3496,[424]6.3605,[425]6.3698,[426]6.3766,[427]6.3812,[428]6.3896,[429]6.3951,[430]6.4031,[431]6.4174,[432]6.4216,[433]6.4206,[434]6.4158,[435]6.4169,[436]6.4193,[437]6.4295,[438]6.4376,[439]6.4342,[440]6.4329,[441]6.4279,[442]6.4259,[443]6.4269,[444]6.4272,[445]6.4251,[446]6.4276,[447]6.4308,[448]6.4350,[449]6.4326,[450]6.4330,[451]6.4288,[452]6.4174,[453]6.4091,[454]6.4034,[455]6.4041,[456]6.4093,[457]6.4114,[458]6.4091,[459]6.4099,[460]6.4188,[461]6.4158,[462]6.4146,[463]6.4190,[464]6.4176,[465]6.4150,[466]6.4073,[467]6.4084,[468]6.4087,[469]6.4110,[470]6.4122,[471]6.4075,[472]6.4128,[473]6.4073,[474]6.4090,[475]6.4030,[476]6.4050,[477]6.3981,[478]6.3975,[479]6.4043,[480]6.4091,[481]6.4106,[482]6.4061,[483]6.4018,[484]6.4039,[485]6.4023,[486]6.3960,[487]6.3958,[488]6.3938,[489]6.3888,[490]6.3865,[491]6.3837,[492]6.3779,[493]6.3749,[494]6.3730,[495]6.3730,[496]6.3694,[497]6.3640,[498]6.3623,[499]6.3575,[500]6.3478,[501]6.3413,[502]6.3414,[503]6.3408,[504]6.3317,[505]6.3338,[506]6.3346,[507]6.3289,[508]6.3250,[509]6.3244,[510]6.3283,[511]6.3331,[512]6.3367,[513]6.3385,[514]6.3456,[515]6.3401,[516]6.3393,[517]6.3400,[518]6.3394,[519]6.3428,[520]6.3452,[521]6.3468,[522]6.3495,[523]6.3503,[524]6.3561,[525]6.3596,[526]6.3609,[527]6.3625,[528]6.3572,[529]6.3580,[530]6.3525,[531]6.3508,[532]6.3558,[533]6.3583,[534]6.3568,[535]6.3593,[536]6.3540,[537]6.3515,[538]6.3567,[539]6.3577,[540]6.3613,[541]6.3616,[542]6.3618,[543]6.3638,[544]6.3648,[545]6.3628,[546]6.3637,[547]6.3594,[548]6.3542,[549]6.3538,[550]6.3508,[551]6.3470,[552]6.3444,[553]6.3407,[554]6.3382,[555]6.3349,[556]6.3347,[557]6.3374,[558]6.3336,[559]6.3333,[560]6.3330,[561]6.3337,[562]6.3311,[563]6.3309,[564]6.3358,[565]6.3379,[566]6.3379,[567]6.3359,[568]6.3363,[569]6.3346,[570]6.3374,[571]6.3379,[572]6.3383,[573]6.3377,[574]6.3339,[575]6.3335,[576]6.3333,[577]6.3318,[578]6.3293,[579]6.3297,[580]6.3232,[581]6.3195,[582]6.3186,[583]6.3193,[584]6.3193,[585]6.3115,[586]6.3044,[587]6.3051,[588]6.3098,[589]6.3154,[590]6.3185,[591]6.3205,[592]6.3193,[593]6.3155,[594]6.3164,[595]6.3139,[596]6.3174,[597]6.3151,[598]6.3125,[599]6.3149,[600]6.3147,[601]6.3135,[602]6
.3156,[603]6.3181,[604]6.3189,[605]6.3228,[606]6.3249,[607]6.3236,[608]6.3199,[609]6.3203,[610]6.3241,[611]6.3224,[612]6.3250,[613]6.3212,[614]6.3164,[615]6.3086,[616]6.3114,[617]6.3050,[618]6.2999,[619]6.2940,[620]6.2796,[621]6.2725,[622]6.2708,[623]6.2723,[624]6.2730,[625]6.2730,[626]6.2722,[627]6.2746,[628]6.2748,[629]6.2742,[630]6.2773,[631]6.2831,[632]6.2890,[633]6.2872,[634]6.2909,[635]6.2914,[636]6.2878,[637]6.2844,[638]6.2874,[639]6.2841,[640]6.2853,[641]6.2855,[642]6.2922,[643]6.2940,[644]6.2952,[645]6.2937,[646]6.2982,[647]6.2944,[648]6.2957,[649]6.2957,[650]6.2997,[651]6.3050,[652]6.3060,[653]6.3104,[654]6.3037,[655]6.3029,

Please disregard this result, I was using a broken model. I am re-running the perplexity computation now.

@Green-Sky (Collaborator) commented on Mar 26, 2023

running on latest master, it starts out like this for me:

65.24 seconds per pass - ETA 11.87 hours
[1]4.4948,[2]4.9721,[3]5.8697,[4]6.4772,[5]6.6286,

your branch on my machine:

46.66 seconds per pass - ETA 8.49 hours
[1]4.5903,[2]5.0429,[3]5.9618,[4]6.5779,[5]6.6896,

system_info: n_threads = 12 / 24 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |

$ make
I llama.cpp build info:
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -pthread -mavx -mavx2 -mfma -mf16c -msse3
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread
I LDFLAGS:
I CC:       cc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
I CXX:      g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

Note: I tried to match your settings: ./perplexity -m ./models/7B/ggml-model-q4_1.bin -f wikitext-2-raw/wiki.test.raw -t 12

@slaren (Member, Author) commented on Mar 26, 2023

@Green-Sky does your system_info output show the same flags as mine? I wonder if there is a different code path somewhere that may cause the difference. I get the same results even after rebasing to current master.

On master, my result is also different from yours:

62.33 seconds per pass - ETA 11.34 hours
[1]4.5381,[2]5.0059,[3]5.9007,

Just in case my model is broken somehow, this is the SHA256 hash:

0733914c21bc6beb432d0845f9c0abc6d12325447e64e20134c5fca72e039b79  models/7B/ggml-model-q4_1.bin

Can you verify if yours is the same?

@Green-Sky (Collaborator)

oh wow, it's different

21a45d7b56e495d3d1ec2615b779241b1285a6f8d17ba6e5d5c3db00c7d2ca2f  models/7B/ggml-model-q4_1.bin

I regenerated to double check, and same hash again.

I also checked the source model:

700df0d3013b703a806d2ae7f1bfb8e59814e3d06ae78be0c66368a50059f33d  models/7B/consolidated.00.pth

which matches the SHA256SUMS file

@slaren (Member, Author) commented on Mar 26, 2023

@Green-Sky It looks like the problem was my model, after re-converting and re-quantizing the model I get the same sum and perplexity as yours. I will re-run the perplexity computation in case there is a significant difference. Thanks for checking!

@anzz1 (Contributor) commented on Mar 26, 2023

If I understood the results correctly, @Green-Sky is seeing a major increase in speed with a slight decrease in accuracy?
In addition to comparing cpuid flags, shouldn't you also compare your gcc versions, since the resulting binary code can vary depending on that? I'd think that the variations caused by compiler optimizations have a much greater effect on determinism than the processor's branch prediction and whatnot.

A side point related to this:

edit: -snip- as it doesn't really belong here, I made it a discussion topic:

@Green-Sky (Collaborator)

Updated my previous post with the system_info output and the make command.

> shows major increase in speed with a slight decrease in accuracy?

Yes, however the perplexity is very unstable in the beginning, so a full run would be necessary.

slaren force-pushed the avx2-quantize-q4_1 branch from 7dca16b to ae08d8e (March 26, 2023 18:53)
anzz1 added the labels enhancement, performance, hardware, and generation quality (Mar 27, 2023)
@slaren (Member, Author) commented on Mar 27, 2023

Perplexity: 6.3056 (7B q4_1)

Full run output
make && ./perplexity -m ./models/7B/ggml-model-q4_1.bin -f wikitext-2-raw/wiki.test.raw -t 12
I llama.cpp build info:
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -pthread -mavx -mavx2 -mfma -mf16c -msse3
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread
I LDFLAGS:
I CC:       cc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
I CXX:      g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

make: Nothing to be done for 'default'.
main: seed = 1679851786
llama_model_load: loading model from './models/7B/ggml-model-q4_1.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 3
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml ctx size = 5076.59 MB
llama_model_load: mem required  = 6868.59 MB (+ 1026.00 MB per state)
llama_model_load: loading model part 1/1 from './models/7B/ggml-model-q4_1.bin'
llama_model_load: .................................... done
llama_model_load: model size =  4820.52 MB / num tensors = 291
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 12 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks
60.09 seconds per pass - ETA 10.93 hours
[1]4.5903,[2]5.0429,[3]5.9618,[4]6.5779,[5]6.6896,[6]6.6522,[7]6.8727,[8]6.9693,[9]7.2955,[10]7.5687,[11]7.7932,[12]7.8358,[13]7.7806,[14]7.8659,[15]8.1173,[16]7.7044,[17]7.5588,[18]7.5070,[19]7.1124,[20]7.0921,[21]6.9944,[22]6.8186,[23]6.7866,[24]6.6973,[25]6.6939,[26]6.5230,[27]6.3321,[28]6.2255,[29]6.1306,[30]5.9621,[31]5.9297,[32]5.9448,[33]5.8841,[34]5.9205,[35]5.9420,[36]5.9854,[37]5.9870,[38]5.9925,[39]6.0279,[40]6.0754,[41]6.0956,[42]6.1372,[43]6.0987,[44]6.1591,[45]6.1605,[46]6.1327,[47]6.1492,[48]6.1271,[49]6.1264,[50]6.0828,[51]6.0770,[52]6.0641,[53]6.1143,[54]6.0935,[55]6.0711,[56]6.1026,[57]6.1204,[58]6.1395,[59]6.1568,[60]6.2005,[61]6.1877,[62]6.2443,[63]6.2750,[64]6.2855,[65]6.3318,[66]6.3425,[67]6.3595,[68]6.3727,[69]6.3960,[70]6.4241,[71]6.4462,[72]6.4771,[73]6.5356,[74]6.5402,[75]6.5540,[76]6.5678,[77]6.5809,[78]6.5676,[79]6.5972,[80]6.5901,[81]6.6116,[82]6.6191,[83]6.5641,[84]6.5477,[85]6.5358,[86]6.5131,[87]6.4540,[88]6.4292,[89]6.4065,[90]6.3916,[91]6.4163,[92]6.4097,[93]6.4076,[94]6.4035,[95]6.4344,[96]6.4334,[97]6.4307,[98]6.4229,[99]6.4064,[100]6.4028,[101]6.4276,[102]6.4226,[103]6.4423,[104]6.4523,[105]6.4513,[106]6.4695,[107]6.4686,[108]6.4823,[109]6.4755,[110]6.4705,[111]6.4917,[112]6.5138,[113]6.5168,[114]6.5123,[115]6.5174,[116]6.5070,[117]6.5132,[118]6.5409,[119]6.5639,[120]6.6008,[121]6.6161,[122]6.6400,[123]6.6799,[124]6.6983,[125]6.6883,[126]6.7288,[127]6.7652,[128]6.7980,[129]6.7815,[130]6.7933,[131]6.7891,[132]6.7800,[133]6.7666,[134]6.7767,[135]6.7732,[136]6.7610,[137]6.7534,[138]6.7372,[139]6.7262,[140]6.7223,[141]6.6942,[142]6.6919,[143]6.6639,[144]6.6421,[145]6.6340,[146]6.6223,[147]6.6277,[148]6.6288,[149]6.6248,[150]6.6211,[151]6.6245,[152]6.6127,[153]6.5960,[154]6.5869,[155]6.5947,[156]6.5902,[157]6.6067,[158]6.6105,[159]6.6163,[160]6.6186,[161]6.6303,[162]6.6001,[163]6.5878,[164]6.5632,[165]6.5306,[166]6.5022,[167]6.4620,[168]6.4312,[169]6.4187,[170]6.4065,[171]6.3793,[172]6.3606,[173]6.3438,[174]6.3135,[175]6.2925,[176]6.2802,[177]6.2597,[178]6.2356,[179]6.2178,[180]6.2078,[181]6.1855,[182]6.1676,[183]6.1533,[184]6.1522,[185]6.1442,[186]6.1450,[187]6.1517,[188]6.1481,[189]6.1669,[190]6.1687,[191]6.1902,[192]6.2058,[193]6.2228,[194]6.2345,[195]6.2561,[196]6.2728,[197]6.2940,[198]6.3099,[199]6.3130,[200]6.3178,[201]6.3125,[202]6.3326,[203]6.3409,[204]6.3412,[205]6.3520,[206]6.3594,[207]6.3562,[208]6.3657,[209]6.3699,[210]6.3741,[211]6.3848,[212]6.3934,[213]6.4033,[214]6.4075,[215]6.4098,[216]6.4236,[217]6.4427,[218]6.4566,[219]6.4574,[220]6.4532,[221]6.4471,[222]6.4452,[223]6.4341,[224]6.4269,[225]6.4236,[226]6.4446,[227]6.4543,[228]6.4603,[229]6.4667,[230]6.4637,[231]6.4799,[232]6.4681,[233]6.4506,[234]6.4346,[235]6.4176,[236]6.4111,[237]6.4012,[238]6.4037,[239]6.3879,[240]6.3762,[241]6.3788,[242]6.3819,[243]6.3791,[244]6.3677,[245]6.3647,[246]6.3536,[247]6.3406,[248]6.3321,[249]6.3284,[250]6.3323,[251]6.3256,[252]6.3212,[253]6.3115,[254]6.3061,[255]6.2947,[256]6.2755,[257]6.2624,[258]6.2531,[259]6.2504,[260]6.2413,[261]6.2365,[262]6.2312,[263]6.2250,[264]6.2054,[265]6.2053,[266]6.2045,[267]6.1977,[268]6.2064,[269]6.2061,[270]6.2058,[271]6.2139,[272]6.2175,[273]6.2174,[274]6.2193,[275]6.2285,[276]6.2346,[277]6.2503,[278]6.2606,[279]6.2697,[280]6.2724,[281]6.2824,[282]6.2886,[283]6.3040,[284]6.3115,[285]6.3195,[286]6.3321,[287]6.3312,[288]6.3376,[289]6.3282,[290]6.3114,[291]6.2955,[292]6.2800,[293]6.2666,[294]6.2689,[295]6.2679,[296]6.2733,[297]6.2728,[298]6.2765,[299]6.2737,[300]6.2626,[301]6.2621,[302]6.2543,[303]6.2449,[304]6.2360,[305]6.2325,[30
6]6.2200,[307]6.2222,[308]6.2250,[309]6.2083,[310]6.2020,[311]6.1956,[312]6.1979,[313]6.1922,[314]6.1909,[315]6.1747,[316]6.1704,[317]6.1539,[318]6.1327,[319]6.1453,[320]6.1573,[321]6.1615,[322]6.1572,[323]6.1502,[324]6.1472,[325]6.1580,[326]6.1580,[327]6.1597,[328]6.1627,[329]6.1687,[330]6.1718,[331]6.1839,[332]6.1807,[333]6.1886,[334]6.1828,[335]6.1760,[336]6.1794,[337]6.1767,[338]6.1761,[339]6.1704,[340]6.1660,[341]6.1738,[342]6.1766,[343]6.1812,[344]6.1813,[345]6.1812,[346]6.1780,[347]6.1816,[348]6.1851,[349]6.1874,[350]6.1845,[351]6.1852,[352]6.1850,[353]6.1787,[354]6.1797,[355]6.1847,[356]6.1882,[357]6.1852,[358]6.1947,[359]6.1969,[360]6.1935,[361]6.1926,[362]6.1995,[363]6.2106,[364]6.2168,[365]6.2215,[366]6.2232,[367]6.2319,[368]6.2287,[369]6.2299,[370]6.2320,[371]6.2264,[372]6.2313,[373]6.2360,[374]6.2339,[375]6.2337,[376]6.2406,[377]6.2357,[378]6.2381,[379]6.2442,[380]6.2361,[381]6.2325,[382]6.2279,[383]6.2270,[384]6.2265,[385]6.2256,[386]6.2257,[387]6.2261,[388]6.2220,[389]6.2165,[390]6.2098,[391]6.2021,[392]6.1978,[393]6.1964,[394]6.1990,[395]6.1975,[396]6.1899,[397]6.1972,[398]6.2014,[399]6.2090,[400]6.2084,[401]6.2095,[402]6.2106,[403]6.2125,[404]6.2192,[405]6.2105,[406]6.2078,[407]6.2075,[408]6.2095,[409]6.2215,[410]6.2328,[411]6.2447,[412]6.2608,[413]6.2719,[414]6.2801,[415]6.2857,[416]6.2938,[417]6.3065,[418]6.3102,[419]6.3178,[420]6.3273,[421]6.3392,[422]6.3433,[423]6.3502,[424]6.3610,[425]6.3702,[426]6.3770,[427]6.3816,[428]6.3902,[429]6.3958,[430]6.4038,[431]6.4182,[432]6.4224,[433]6.4215,[434]6.4168,[435]6.4179,[436]6.4202,[437]6.4302,[438]6.4379,[439]6.4347,[440]6.4333,[441]6.4283,[442]6.4264,[443]6.4274,[444]6.4275,[445]6.4256,[446]6.4284,[447]6.4313,[448]6.4354,[449]6.4329,[450]6.4334,[451]6.4293,[452]6.4177,[453]6.4093,[454]6.4038,[455]6.4045,[456]6.4097,[457]6.4117,[458]6.4094,[459]6.4102,[460]6.4191,[461]6.4162,[462]6.4148,[463]6.4192,[464]6.4179,[465]6.4154,[466]6.4077,[467]6.4087,[468]6.4088,[469]6.4112,[470]6.4122,[471]6.4076,[472]6.4127,[473]6.4073,[474]6.4089,[475]6.4028,[476]6.4050,[477]6.3983,[478]6.3978,[479]6.4043,[480]6.4089,[481]6.4104,[482]6.4060,[483]6.4019,[484]6.4038,[485]6.4021,[486]6.3958,[487]6.3956,[488]6.3936,[489]6.3886,[490]6.3865,[491]6.3837,[492]6.3778,[493]6.3749,[494]6.3729,[495]6.3729,[496]6.3692,[497]6.3638,[498]6.3621,[499]6.3572,[500]6.3475,[501]6.3412,[502]6.3412,[503]6.3406,[504]6.3315,[505]6.3335,[506]6.3345,[507]6.3290,[508]6.3250,[509]6.3246,[510]6.3283,[511]6.3331,[512]6.3366,[513]6.3385,[514]6.3454,[515]6.3399,[516]6.3392,[517]6.3399,[518]6.3393,[519]6.3428,[520]6.3452,[521]6.3470,[522]6.3498,[523]6.3507,[524]6.3565,[525]6.3600,[526]6.3613,[527]6.3629,[528]6.3577,[529]6.3587,[530]6.3531,[531]6.3513,[532]6.3563,[533]6.3585,[534]6.3569,[535]6.3592,[536]6.3539,[537]6.3515,[538]6.3569,[539]6.3578,[540]6.3614,[541]6.3618,[542]6.3621,[543]6.3638,[544]6.3647,[545]6.3628,[546]6.3636,[547]6.3594,[548]6.3543,[549]6.3539,[550]6.3512,[551]6.3473,[552]6.3448,[553]6.3410,[554]6.3387,[555]6.3355,[556]6.3353,[557]6.3380,[558]6.3343,[559]6.3341,[560]6.3339,[561]6.3345,[562]6.3320,[563]6.3317,[564]6.3366,[565]6.3388,[566]6.3388,[567]6.3369,[568]6.3372,[569]6.3357,[570]6.3383,[571]6.3387,[572]6.3392,[573]6.3384,[574]6.3348,[575]6.3345,[576]6.3344,[577]6.3327,[578]6.3302,[579]6.3306,[580]6.3240,[581]6.3203,[582]6.3194,[583]6.3202,[584]6.3203,[585]6.3124,[586]6.3054,[587]6.3062,[588]6.3110,[589]6.3166,[590]6.3199,[591]6.3219,[592]6.3208,[593]6.3171,[594]6.3180,[595]6.3155,[596]6.3190,[597]6.3167,[598]6.3142,[599]6.3164,[600]6.3163,[601]6.3150,[602]6
.3170,[603]6.3196,[604]6.3205,[605]6.3244,[606]6.3266,[607]6.3253,[608]6.3216,[609]6.3219,[610]6.3256,[611]6.3240,[612]6.3267,[613]6.3229,[614]6.3181,[615]6.3103,[616]6.3132,[617]6.3069,[618]6.3019,[619]6.2962,[620]6.2818,[621]6.2747,[622]6.2730,[623]6.2747,[624]6.2752,[625]6.2751,[626]6.2743,[627]6.2769,[628]6.2770,[629]6.2766,[630]6.2796,[631]6.2854,[632]6.2914,[633]6.2898,[634]6.2935,[635]6.2940,[636]6.2904,[637]6.2870,[638]6.2900,[639]6.2867,[640]6.2879,[641]6.2880,[642]6.2946,[643]6.2966,[644]6.2980,[645]6.2963,[646]6.3009,[647]6.2971,[648]6.2984,[649]6.2985,[650]6.3024,[651]6.3077,[652]6.3087,[653]6.3129,[654]6.3063,[655]6.3056,

slaren requested a review from ggerganov (March 27, 2023 10:15)
@ggerganov (Member) left a comment

Sorry about the conflicts

Please resolve and merge

slaren force-pushed the avx2-quantize-q4_1 branch from ae08d8e to e296529 (March 28, 2023 17:41)
@slaren (Member, Author) commented on Mar 28, 2023

Rebased to master.

slaren force-pushed the avx2-quantize-q4_1 branch from 3125ea0 to 41669f6 (March 28, 2023 17:49)
@slaren (Member, Author) commented on Mar 28, 2023

The bot almost got it right: the purpose of using the reference implementation in ggml_quantize_q4_1 is to ensure accuracy when quantizing the model.
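
In other words, the model-conversion entry point keeps going through the scalar reference quantizer, while the AVX2 row quantizer is used where rows are quantized at runtime. A simplified sketch of that shape, reusing the illustrative QK, block_q4_1, and quantize_row_q4_1_ref_sketch definitions from the sketch earlier in this thread (the real function and signature live in ggml.c and may differ):

```c
#include <stddef.h>

// Sketch only: quantize a buffer of n floats, laid out as rows of k elements.
size_t ggml_quantize_q4_1_sketch(const float * src, void * dst, int n, int k) {
    block_q4_1 * y = (block_q4_1 *) dst;

    // Always use the scalar reference quantizer here so the produced model
    // is as accurate as possible; the AVX2 quantize_row_q4_1 is still used
    // wherever rows are quantized at runtime.
    for (int j = 0; j < n; j += k) {
        quantize_row_q4_1_ref_sketch(src + j, y + j/QK, k);
    }

    return (size_t)(n/QK) * sizeof(block_q4_1);    // bytes written to dst
}
```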

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
ggerganov merged commit 2a98bc1 into ggml-org:master on Mar 28, 2023
@slaren (Member, Author) commented on Mar 28, 2023

Ironically, after the changes to master I am seeing slightly lower perplexity with the AVX path in the first chunks.

master:
[1]4.5870,[2]5.0477,[3]5.9136,[4]6.5310,[5]6.6497,

avx2:
[1]4.5671,[2]5.0153,[3]5.8921,[4]6.4689,[5]6.5678,

🤷‍♂️

@ggerganov (Member)

I guess we must be doing something right 🦙

slaren deleted the avx2-quantize-q4_1 branch (March 28, 2023 18:44)
Deadsg pushed a commit to Deadsg/llama.cpp that referenced this pull request Dec 19, 2023