Add SIMD implementation of ggml_compute_forward_rms_norm_f32 #450

Closed
wants to merge 1 commit into from


slaren
Member

@slaren slaren commented Mar 24, 2023

Uses the GGML SIMD macros, so it should hopefully work on other architectures as well, but it has only been tested with AVX2.

Don't expect any meaningful performance improvement; the function is not very hot.

Perplexity after this change (7B, q4_0): 6.5980
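For context, perplexity is the exponential of the mean negative log-likelihood the model assigns to each evaluated token x_i given its preceding context (lower is better):

$$
\mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N} \log p(x_i \mid x_{<i})\right)
$$

The bracketed values in the run output appear to be the running estimate after each chunk; the final entry, [655]6.5980, is the figure quoted above. Exactly which tokens inside each 512-token chunk are scored is an implementation detail not covered here.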

Full run output

./main -m ./models/7B/ggml-model-q4_0.bin --perplexity -t 12 -f wikitext-2-raw/wiki.test.raw
main: seed = 1679622068
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './models/7B/ggml-model-q4_0.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291

system_info: n_threads = 12 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks
44.42 seconds per pass - ETA 8.08 hours
[1]4.7114,[2]5.1906,[3]6.0631,[4]6.7550,[5]6.8446,[6]6.8386,[7]7.0409,[8]7.1527,[9]7.5598,[10]7.7969,[11]8.0295,[12]8.0414,[13]7.9639,[14]8.0505,[15]8.3085,[16]7.8944,[17]7.7598,[18]7.7382,[19]7.3351,[20]7.3199,[21]7.2169,[22]7.0259,[23]6.9849,[24]6.8973,[25]6.9077,[26]6.7192,[27]6.5322,[28]6.4300,[29]6.3396,[30]6.1645,[31]6.1348,[32]6.1517,[33]6.0782,[34]6.1209,[35]6.1473,[36]6.1898,[37]6.1899,[38]6.2031,[39]6.2446,[40]6.3025,[41]6.3138,[42]6.3502,[43]6.3025,[44]6.3605,[45]6.3612,[46]6.3311,[47]6.3573,[48]6.3292,[49]6.3372,[50]6.2981,[51]6.2917,[52]6.2815,[53]6.3305,[54]6.3116,[55]6.2910,[56]6.3316,[57]6.3587,[58]6.3818,[59]6.3926,[60]6.4389,[61]6.4274,[62]6.4922,[63]6.5292,[64]6.5446,[65]6.5959,[66]6.6112,[67]6.6276,[68]6.6449,[69]6.6738,[70]6.7064,[71]6.7333,[72]6.7657,[73]6.8332,[74]6.8364,[75]6.8536,[76]6.8687,[77]6.8844,[78]6.8698,[79]6.8997,[80]6.8911,[81]6.9051,[82]6.9124,[83]6.8562,[84]6.8423,[85]6.8377,[86]6.8144,[87]6.7518,[88]6.7245,[89]6.7046,[90]6.6873,[91]6.7184,[92]6.7128,[93]6.7157,[94]6.7149,[95]6.7477,[96]6.7466,[97]6.7416,[98]6.7350,[99]6.7177,[100]6.7154,[101]6.7402,[102]6.7337,[103]6.7586,[104]6.7642,[105]6.7633,[106]6.7790,[107]6.7792,[108]6.7870,[109]6.7803,[110]6.7725,[111]6.7956,[112]6.8164,[113]6.8175,[114]6.8138,[115]6.8237,[116]6.8156,[117]6.8208,[118]6.8504,[119]6.8723,[120]6.9123,[121]6.9303,[122]6.9570,[123]6.9971,[124]7.0147,[125]7.0044,[126]7.0463,[127]7.0859,[128]7.1153,[129]7.0964,[130]7.1063,[131]7.1002,[132]7.0924,[133]7.0800,[134]7.0907,[135]7.0879,[136]7.0736,[137]7.0657,[138]7.0504,[139]7.0392,[140]7.0361,[141]7.0095,[142]7.0062,[143]6.9775,[144]6.9563,[145]6.9482,[146]6.9339,[147]6.9402,[148]6.9417,[149]6.9379,[150]6.9329,[151]6.9353,[152]6.9261,[153]6.9078,[154]6.8983,[155]6.9044,[156]6.9000,[157]6.9182,[158]6.9210,[159]6.9267,[160]6.9309,[161]6.9435,[162]6.9103,[163]6.8968,[164]6.8695,[165]6.8349,[166]6.8041,[167]6.7628,[168]6.7285,[169]6.7135,[170]6.7002,[171]6.6698,[172]6.6515,[173]6.6329,[174]6.5997,[175]6.5754,[176]6.5
626,[177]6.5421,[178]6.5181,[179]6.5002,[180]6.4903,[181]6.4657,[182]6.4461,[183]6.4302,[184]6.4303,[185]6.4221,[186]6.4243,[187]6.4300,[188]6.4266,[189]6.4453,[190]6.4473,[191]6.4681,[192]6.4844,[193]6.5029,[194]6.5150,[195]6.5373,[196]6.5548,[197]6.5795,[198]6.5967,[199]6.5986,[200]6.6019,[201]6.5977,[202]6.6201,[203]6.6265,[204]6.6283,[205]6.6391,[206]6.6467,[207]6.6421,[208]6.6507,[209]6.6569,[210]6.6620,[211]6.6745,[212]6.6828,[213]6.6932,[214]6.6982,[215]6.7015,[216]6.7164,[217]6.7344,[218]6.7478,[219]6.7485,[220]6.7446,[221]6.7388,[222]6.7352,[223]6.7233,[224]6.7170,[225]6.7116,[226]6.7334,[227]6.7449,[228]6.7513,[229]6.7583,[230]6.7549,[231]6.7718,[232]6.7580,[233]6.7394,[234]6.7227,[235]6.7078,[236]6.6991,[237]6.6887,[238]6.6921,[239]6.6745,[240]6.6629,[241]6.6677,[242]6.6716,[243]6.6695,[244]6.6565,[245]6.6527,[246]6.6399,[247]6.6273,[248]6.6190,[249]6.6171,[250]6.6223,[251]6.6147,[252]6.6105,[253]6.6001,[254]6.5960,[255]6.5826,[256]6.5625,[257]6.5502,[258]6.5418,[259]6.5396,[260]6.5309,[261]6.5267,[262]6.5210,[263]6.5152,[264]6.4968,[265]6.4964,[266]6.4952,[267]6.4883,[268]6.5001,[269]6.4982,[270]6.4991,[271]6.5069,[272]6.5119,[273]6.5104,[274]6.5121,[275]6.5214,[276]6.5264,[277]6.5448,[278]6.5559,[279]6.5643,[280]6.5684,[281]6.5794,[282]6.5856,[283]6.6008,[284]6.6084,[285]6.6175,[286]6.6323,[287]6.6317,[288]6.6380,[289]6.6283,[290]6.6129,[291]6.5968,[292]6.5802,[293]6.5651,[294]6.5679,[295]6.5664,[296]6.5702,[297]6.5689,[298]6.5720,[299]6.5691,[300]6.5572,[301]6.5569,[302]6.5489,[303]6.5398,[304]6.5298,[305]6.5279,[306]6.5141,[307]6.5170,[308]6.5205,[309]6.5038,[310]6.4971,[311]6.4907,[312]6.4925,[313]6.4870,[314]6.4867,[315]6.4689,[316]6.4651,[317]6.4476,[318]6.4244,[319]6.4370,[320]6.4510,[321]6.4545,[322]6.4497,[323]6.4427,[324]6.4403,[325]6.4499,[326]6.4502,[327]6.4524,[328]6.4572,[329]6.4636,[330]6.4661,[331]6.4793,[332]6.4760,[333]6.4837,[334]6.4771,[335]6.4699,[336]6.4734,[337]6.4695,[338]6.4683,[339]6.4627,[340]6.4580,[341]6.4661,[342]6.4688,[343
]6.4744,[344]6.4746,[345]6.4746,[346]6.4714,[347]6.4760,[348]6.4803,[349]6.4819,[350]6.4784,[351]6.4788,[352]6.4786,[353]6.4732,[354]6.4738,[355]6.4794,[356]6.4822,[357]6.4782,[358]6.4877,[359]6.4910,[360]6.4871,[361]6.4870,[362]6.4944,[363]6.5066,[364]6.5130,[365]6.5190,[366]6.5199,[367]6.5286,[368]6.5261,[369]6.5274,[370]6.5282,[371]6.5219,[372]6.5270,[373]6.5327,[374]6.5307,[375]6.5295,[376]6.5383,[377]6.5327,[378]6.5349,[379]6.5415,[380]6.5315,[381]6.5270,[382]6.5203,[383]6.5184,[384]6.5177,[385]6.5165,[386]6.5160,[387]6.5152,[388]6.5104,[389]6.5041,[390]6.4966,[391]6.4883,[392]6.4841,[393]6.4827,[394]6.4852,[395]6.4835,[396]6.4754,[397]6.4834,[398]6.4876,[399]6.4972,[400]6.4971,[401]6.4985,[402]6.4995,[403]6.5011,[404]6.5080,[405]6.4991,[406]6.4954,[407]6.4952,[408]6.4959,[409]6.5087,[410]6.5207,[411]6.5333,[412]6.5502,[413]6.5618,[414]6.5694,[415]6.5756,[416]6.5837,[417]6.5978,[418]6.6021,[419]6.6099,[420]6.6196,[421]6.6315,[422]6.6367,[423]6.6440,[424]6.6563,[425]6.6661,[426]6.6733,[427]6.6780,[428]6.6866,[429]6.6908,[430]6.7000,[431]6.7147,[432]6.7190,[433]6.7174,[434]6.7120,[435]6.7125,[436]6.7151,[437]6.7250,[438]6.7328,[439]6.7290,[440]6.7280,[441]6.7224,[442]6.7208,[443]6.7218,[444]6.7228,[445]6.7206,[446]6.7228,[447]6.7260,[448]6.7301,[449]6.7274,[450]6.7279,[451]6.7233,[452]6.7121,[453]6.7037,[454]6.6979,[455]6.6985,[456]6.7033,[457]6.7055,[458]6.7031,[459]6.7040,[460]6.7135,[461]6.7108,[462]6.7094,[463]6.7146,[464]6.7136,[465]6.7108,[466]6.7027,[467]6.7036,[468]6.7038,[469]6.7060,[470]6.7068,[471]6.7016,[472]6.7068,[473]6.7008,[474]6.7029,[475]6.6973,[476]6.6996,[477]6.6927,[478]6.6921,[479]6.6996,[480]6.7051,[481]6.7071,[482]6.7030,[483]6.6990,[484]6.7020,[485]6.7000,[486]6.6941,[487]6.6946,[488]6.6929,[489]6.6875,[490]6.6843,[491]6.6814,[492]6.6756,[493]6.6726,[494]6.6708,[495]6.6707,[496]6.6675,[497]6.6622,[498]6.6607,[499]6.6553,[500]6.6454,[501]6.6386,[502]6.6379,[503]6.6377,[504]6.6279,[505]6.6308,[506]6.6315,[507]6.6257,[508]6.6216,[509]6.6201,
[510]6.6240,[511]6.6291,[512]6.6326,[513]6.6340,[514]6.6411,[515]6.6351,[516]6.6339,[517]6.6345,[518]6.6341,[519]6.6370,[520]6.6400,[521]6.6418,[522]6.6447,[523]6.6456,[524]6.6526,[525]6.6567,[526]6.6575,[527]6.6597,[528]6.6540,[529]6.6548,[530]6.6494,[531]6.6476,[532]6.6525,[533]6.6549,[534]6.6524,[535]6.6553,[536]6.6498,[537]6.6471,[538]6.6526,[539]6.6534,[540]6.6576,[541]6.6587,[542]6.6596,[543]6.6612,[544]6.6628,[545]6.6606,[546]6.6614,[547]6.6565,[548]6.6504,[549]6.6504,[550]6.6469,[551]6.6433,[552]6.6412,[553]6.6366,[554]6.6341,[555]6.6310,[556]6.6309,[557]6.6336,[558]6.6299,[559]6.6293,[560]6.6289,[561]6.6291,[562]6.6272,[563]6.6273,[564]6.6315,[565]6.6336,[566]6.6331,[567]6.6310,[568]6.6312,[569]6.6295,[570]6.6322,[571]6.6327,[572]6.6333,[573]6.6334,[574]6.6299,[575]6.6300,[576]6.6299,[577]6.6288,[578]6.6262,[579]6.6271,[580]6.6203,[581]6.6161,[582]6.6151,[583]6.6154,[584]6.6156,[585]6.6077,[586]6.6004,[587]6.6010,[588]6.6060,[589]6.6120,[590]6.6149,[591]6.6167,[592]6.6153,[593]6.6115,[594]6.6123,[595]6.6099,[596]6.6143,[597]6.6116,[598]6.6080,[599]6.6104,[600]6.6103,[601]6.6091,[602]6.6112,[603]6.6140,[604]6.6152,[605]6.6187,[606]6.6208,[607]6.6196,[608]6.6157,[609]6.6167,[610]6.6204,[611]6.6183,[612]6.6212,[613]6.6174,[614]6.6119,[615]6.6039,[616]6.6071,[617]6.6007,[618]6.5950,[619]6.5888,[620]6.5736,[621]6.5660,[622]6.5641,[623]6.5657,[624]6.5661,[625]6.5661,[626]6.5648,[627]6.5672,[628]6.5677,[629]6.5671,[630]6.5709,[631]6.5772,[632]6.5829,[633]6.5810,[634]6.5845,[635]6.5851,[636]6.5819,[637]6.5787,[638]6.5816,[639]6.5786,[640]6.5796,[641]6.5797,[642]6.5868,[643]6.5890,[644]6.5903,[645]6.5879,[646]6.5922,[647]6.5892,[648]6.5903,[649]6.5902,[650]6.5939,[651]6.5999,[652]6.6006,[653]6.6051,[654]6.5984,[655]6.5980,

@slaren slaren marked this pull request as ready for review March 24, 2023 12:27
@ggerganov
Member

I think that if performance does not change, it is not worth making the code more cumbersome.
Will think about this some more and maybe merge at a later stage.

@anzz1 anzz1 added enhancement New feature or request performance Speed related topics hardware Hardware related labels Mar 27, 2023
@slaren
Member Author

slaren commented Apr 29, 2023

Closing for now, since it doesn't look like this will be useful any time soon; there are many other ops that are more important to optimize.

Labels
enhancement New feature or request hardware Hardware related performance Speed related topics
3 participants