-
perplexity run post c-api refactor merge to verify result: 6.5990
-
Be sure you know exactly what kind of GPTQ quantization you have, because the old .pt files don't have the "group size 128" if I remember correctly.
-
I've been running some perplexity tests on my Q4_1 acceleration fork (see also this post), and noticed that the scores for the first few batches were worse. On 7B, it made a difference of as much as 0.1 at some point (usually closer to 0.05). I then swapped around the order of the two … I suspect that we are dealing with numerical issues here.

Indeed, with some debug output, it becomes clear that for some tensors the accumulator reaches a magnitude in excess of 10^4 while several blocks of summands are around 10^-3. At float32 precision, the mantissa has about 7 significant decimal digits, so we're clearly hitting the regime where arbitrarily many summands can be functionally ignored.

On this hunch, I tried a simple stability-improving transformation: cut the loop computing the dot product in half, sum the first and second halves separately, and then finally add the two partial sums together. This produced what I think are approximately the best-looking 7B wikitext block scores yet: [1]4.5225,[2]4.9974,[3]5.8552,[4]6.4904,[5]6.6052 (to compare, Q4_1 in master is cited as [1]4.4880,[2]4.9980,[3]5.9143).

I think this is a problem we should take seriously, considering that these discrepancies are only an order of magnitude or so off from the perplexity benefit of Q4_1 vs. Q4_0. I don't know if the current split in two is optimal, or whether we could do better with, say, a split into four. In fact, how do "professional" and GPU matrix-multiplication implementations handle this? Implementing a dot product as a linear loop accumulation seems bound to run into this problem; naively, you probably want something closer to a binary tree for reducing your sum.
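For illustration, here is a minimal sketch of the idea (not the actual ggml kernel; the function names are made up for this example). Splitting the accumulation, or going all the way to a pairwise/binary-tree reduction, keeps each partial sum small so fewer low-order bits of the small summands get discarded:

```cpp
#include <cstddef>
#include <vector>

// Naive linear accumulation: once |sum| grows to ~1e4, summands of ~1e-3
// fall below float32's ~7 significant decimal digits and stop contributing.
float dot_naive(const std::vector<float> & a, const std::vector<float> & b) {
    float sum = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// The "split in half" variant described above: two partial sums stay smaller,
// so less precision is lost before the final addition.
float dot_split2(const std::vector<float> & a, const std::vector<float> & b) {
    const size_t half = a.size() / 2;
    float sum0 = 0.0f;
    float sum1 = 0.0f;
    for (size_t i = 0;    i < half;     ++i) sum0 += a[i] * b[i];
    for (size_t i = half; i < a.size(); ++i) sum1 += a[i] * b[i];
    return sum0 + sum1;
}

// Generalization: recursive pairwise (binary-tree) reduction, whose worst-case
// rounding error grows roughly with log(n) rather than n.
float dot_pairwise(const float * a, const float * b, size_t n) {
    if (n <= 8) {
        float sum = 0.0f;
        for (size_t i = 0; i < n; ++i) sum += a[i] * b[i];
        return sum;
    }
    const size_t half = n / 2;
    return dot_pairwise(a, b, half) + dot_pairwise(a + half, b + half, n - half);
}
```

In practice, SIMD implementations get part of this for free, since accumulating in several vector lanes and reducing them at the end is already a shallow tree reduction.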
-
Guys, I found someone who already did the tests: https://github.com/IST-DASLab/gptq
-
OK, got the numbers for 65B q4_1 - 3.6188. Full run info (from an M1 Ultra):

system_info: n_threads = 8 / 20 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
-
Perplexity score for 13B f16 - 5.2455 13B f16 raw data[1]3.6920,[2]4.1502,[3]4.9227,[4]5.3138,[5]5.4988,[6]5.4418,[7]5.5892,[8]5.7035,[9]5.9589,[10]6.1779,[11]6.3594,[12]6.4056,[13]6.3646,[14]6.4525,[15]6.6488,[16]6.3378,[17]6.2593,[18]6.2369,[19]5.9537,[20]5.9339,[21]5.8613,[22]5.6905,[23]5.6637,[24]5.5727,[25]5.5836,[26]5.4377,[27]5.2660,[28]5.1678,[29]5.0918,[30]4.9584,[31]4.9168,[32]4.9304,[33]4.8871,[34]4.9276,[35]4.9463,[36]4.9698,[37]4.9619,[38]4.9593,[39]4.9869,[40]5.0272,[41]5.0495,[42]5.0830,[43]5.0490,[44]5.0919,[45]5.0944,[46]5.0694,[47]5.0982,[48]5.0816,[49]5.0833,[50]5.0530,[51]5.0610,[52]5.0542,[53]5.0999,[54]5.0905,[55]5.0723,[56]5.0915,[57]5.1093,[58]5.1312,[59]5.1492,[60]5.1848,[61]5.1786,[62]5.2323,[63]5.2568,[64]5.2683,[65]5.3041,[66]5.3034,[67]5.3211,[68]5.3333,[69]5.3604,[70]5.3899,[71]5.4117,[72]5.4453,[73]5.4921,[74]5.4992,[75]5.5082,[76]5.5223,[77]5.5336,[78]5.5201,[79]5.5462,[80]5.5412,[81]5.5490,[82]5.5459,[83]5.5012,[84]5.4898,[85]5.4834,[86]5.4685,[87]5.4029,[88]5.3581,[89]5.3366,[90]5.3268,[91]5.3474,[92]5.3433,[93]5.3451,[94]5.3450,[95]5.3714,[96]5.3681,[97]5.3649,[98]5.3611,[99]5.3541,[100]5.3514,[101]5.3747,[102]5.3704,[103]5.3863,[104]5.3905,[105]5.3922,[106]5.4062,[107]5.4049,[108]5.4198,[109]5.4190,[110]5.4137,[111]5.4315,[112]5.4479,[113]5.4473,[114]5.4460,[115]5.4502,[116]5.4385,[117]5.4379,[118]5.4618,[119]5.4799,[120]5.5083,[121]5.5233,[122]5.5451,[123]5.5813,[124]5.5987,[125]5.5938,[126]5.6289,[127]5.6610,[128]5.6888,[129]5.6772,[130]5.6855,[131]5.6816,[132]5.6779,[133]5.6658,[134]5.6742,[135]5.6741,[136]5.6658,[137]5.6622,[138]5.6486,[139]5.6409,[140]5.6399,[141]5.6127,[142]5.6087,[143]5.5837,[144]5.5680,[145]5.5591,[146]5.5482,[147]5.5533,[148]5.5563,[149]5.5531,[150]5.5525,[151]5.5572,[152]5.5516,[153]5.5421,[154]5.5365,[155]5.5429,[156]5.5409,[157]5.5565,[158]5.5581,[159]5.5589,[160]5.5626,[161]5.5734,[162]5.5487,[163]5.5393,[164]5.5191,[165]5.4943,[166]5.4717,[167]5.4404,[168]5.4139,[169]5.4008,[170]5.3919,[171]5.3718,[172]5.3599,[173]5.3475,[174]5.3211,[175]5.3012,[176]5.2880,[177]5.2717,[178]5.2520,[179]5.2393,[180]5.2323,[181]5.2162,[182]5.2000,[183]5.1881,[184]5.1872,[185]5.1803,[186]5.1812,[187]5.1867,[188]5.1842,[189]5.2003,[190]5.2006,[191]5.2174,[192]5.2312,[193]5.2457,[194]5.2567,[195]5.2754,[196]5.2871,[197]5.3058,[198]5.3190,[199]5.3209,[200]5.3213,[201]5.3145,[202]5.3270,[203]5.3325,[204]5.3275,[205]5.3359,[206]5.3410,[207]5.3371,[208]5.3427,[209]5.3461,[210]5.3518,[211]5.3621,[212]5.3684,[213]5.3776,[214]5.3803,[215]5.3835,[216]5.3955,[217]5.4120,[218]5.4255,[219]5.4255,[220]5.4229,[221]5.4184,[222]5.4185,[223]5.4122,[224]5.4057,[225]5.4022,[226]5.4219,[227]5.4267,[228]5.4340,[229]5.4410,[230]5.4371,[231]5.4522,[232]5.4419,[233]5.4273,[234]5.4128,[235]5.3905,[236]5.3855,[237]5.3770,[238]5.3801,[239]5.3691,[240]5.3601,[241]5.3633,[242]5.3648,[243]5.3641,[244]5.3543,[245]5.3508,[246]5.3410,[247]5.3314,[248]5.3254,[249]5.3221,[250]5.3257,[251]5.3176,[252]5.3128,[253]5.3038,[254]5.2995,[255]5.2906,[256]5.2744,[257]5.2647,[258]5.2581,[259]5.2572,[260]5.2490,[261]5.2439,[262]5.2399,[263]5.2351,[264]5.2117,[265]5.2117,[266]5.2090,[267]5.2029,[268]5.2092,[269]5.2085,[270]5.2094,[271]5.2155,[272]5.2184,[273]5.2198,[274]5.2206,[275]5.2265,[276]5.2322,[277]5.2442,[278]5.2524,[279]5.2606,[280]5.2644,[281]5.2739,[282]5.2792,[283]5.2916,[284]5.3002,[285]5.3081,[286]5.3204,[287]5.3170,[288]5.3223,[289]5.3163,[290]5.3023,[291]5.2894,[292]5.2763,[293]5.2645,[294]5.2652,[295]5.2654,[296]5.2699,[297]5.2690,[298]5.2710,[299]5.2688,[300]5.2602,[301]5.26
05,[302]5.2543,[303]5.2461,[304]5.2389,[305]5.2364,[306]5.2260,[307]5.2290,[308]5.2298,[309]5.2168,[310]5.2141,[311]5.2098,[312]5.2113,[313]5.2059,[314]5.2043,[315]5.1917,[316]5.1873,[317]5.1749,[318]5.1589,[319]5.1692,[320]5.1801,[321]5.1847,[322]5.1817,[323]5.1760,[324]5.1741,[325]5.1833,[326]5.1850,[327]5.1856,[328]5.1890,[329]5.1937,[330]5.1960,[331]5.2063,[332]5.2029,[333]5.2105,[334]5.2061,[335]5.2013,[336]5.2036,[337]5.2027,[338]5.2023,[339]5.1981,[340]5.1954,[341]5.2020,[342]5.2052,[343]5.2093,[344]5.2097,[345]5.2112,[346]5.2097,[347]5.2133,[348]5.2170,[349]5.2191,[350]5.2173,[351]5.2186,[352]5.2187,[353]5.2137,[354]5.2144,[355]5.2192,[356]5.2222,[357]5.2193,[358]5.2272,[359]5.2293,[360]5.2260,[361]5.2258,[362]5.2326,[363]5.2434,[364]5.2485,[365]5.2523,[366]5.2542,[367]5.2628,[368]5.2608,[369]5.2622,[370]5.2642,[371]5.2604,[372]5.2651,[373]5.2692,[374]5.2674,[375]5.2671,[376]5.2728,[377]5.2695,[378]5.2721,[379]5.2759,[380]5.2692,[381]5.2662,[382]5.2625,[383]5.2607,[384]5.2607,[385]5.2595,[386]5.2583,[387]5.2581,[388]5.2551,[389]5.2516,[390]5.2464,[391]5.2409,[392]5.2374,[393]5.2370,[394]5.2402,[395]5.2395,[396]5.2344,[397]5.2409,[398]5.2452,[399]5.2521,[400]5.2514,[401]5.2521,[402]5.2531,[403]5.2555,[404]5.2610,[405]5.2458,[406]5.2415,[407]5.2404,[408]5.2414,[409]5.2524,[410]5.2614,[411]5.2707,[412]5.2846,[413]5.2947,[414]5.3007,[415]5.3066,[416]5.3137,[417]5.3231,[418]5.3254,[419]5.3301,[420]5.3378,[421]5.3475,[422]5.3509,[423]5.3564,[424]5.3652,[425]5.3726,[426]5.3786,[427]5.3826,[428]5.3897,[429]5.3933,[430]5.3994,[431]5.4119,[432]5.4150,[433]5.4143,[434]5.4111,[435]5.4124,[436]5.4153,[437]5.4234,[438]5.4307,[439]5.4280,[440]5.4275,[441]5.4232,[442]5.4221,[443]5.4231,[444]5.4248,[445]5.4241,[446]5.4261,[447]5.4284,[448]5.4315,[449]5.4300,[450]5.4311,[451]5.4283,[452]5.4127,[453]5.4031,[454]5.3975,[455]5.3978,[456]5.4018,[457]5.4029,[458]5.4012,[459]5.4008,[460]5.4080,[461]5.4037,[462]5.4000,[463]5.3977,[464]5.3974,[465]5.3952,[466]5.3877,[467]5.3863,[468]5.3841,[469]5.3851,[470]5.3839,[471]5.3789,[472]5.3792,[473]5.3746,[474]5.3732,[475]5.3664,[476]5.3637,[477]5.3551,[478]5.3522,[479]5.3521,[480]5.3541,[481]5.3541,[482]5.3495,[483]5.3454,[484]5.3461,[485]5.3392,[486]5.3327,[487]5.3315,[488]5.3292,[489]5.3238,[490]5.3206,[491]5.3172,[492]5.3105,[493]5.3076,[494]5.3058,[495]5.3035,[496]5.2997,[497]5.2934,[498]5.2907,[499]5.2871,[500]5.2793,[501]5.2722,[502]5.2710,[503]5.2700,[504]5.2624,[505]5.2621,[506]5.2627,[507]5.2573,[508]5.2538,[509]5.2544,[510]5.2565,[511]5.2607,[512]5.2647,[513]5.2672,[514]5.2725,[515]5.2687,[516]5.2678,[517]5.2679,[518]5.2679,[519]5.2701,[520]5.2714,[521]5.2725,[522]5.2739,[523]5.2745,[524]5.2799,[525]5.2827,[526]5.2831,[527]5.2847,[528]5.2795,[529]5.2803,[530]5.2767,[531]5.2764,[532]5.2811,[533]5.2837,[534]5.2817,[535]5.2838,[536]5.2796,[537]5.2778,[538]5.2828,[539]5.2835,[540]5.2850,[541]5.2846,[542]5.2860,[543]5.2881,[544]5.2894,[545]5.2884,[546]5.2888,[547]5.2856,[548]5.2815,[549]5.2815,[550]5.2795,[551]5.2770,[552]5.2751,[553]5.2722,[554]5.2700,[555]5.2681,[556]5.2673,[557]5.2689,[558]5.2656,[559]5.2659,[560]5.2645,[561]5.2647,[562]5.2623,[563]5.2621,[564]5.2663,[565]5.2674,[566]5.2681,[567]5.2662,[568]5.2672,[569]5.2658,[570]5.2684,[571]5.2696,[572]5.2705,[573]5.2710,[574]5.2680,[575]5.2663,[576]5.2656,[577]5.2640,[578]5.2621,[579]5.2620,[580]5.2569,[581]5.2542,[582]5.2542,[583]5.2551,[584]5.2558,[585]5.2499,[586]5.2447,[587]5.2450,[588]5.2493,[589]5.2541,[590]5.2572,[591]5.2589,[592]5.2579,[593]5.2540,[594]5.2552,[595]5.2536,[596]5.2576,[597]5.2557,
[598]5.2525,[599]5.2551,[600]5.2542,[601]5.2531,[602]5.2530,[603]5.2556,[604]5.2562,[605]5.2588,[606]5.2602,[607]5.2587,[608]5.2559,[609]5.2568,[610]5.2608,[611]5.2596,[612]5.2618,[613]5.2591,[614]5.2552,[615]5.2495,[616]5.2520,[617]5.2471,[618]5.2429,[619]5.2386,[620]5.2280,[621]5.2231,[622]5.2213,[623]5.2226,[624]5.2231,[625]5.2239,[626]5.2236,[627]5.2262,[628]5.2271,[629]5.2275,[630]5.2305,[631]5.2348,[632]5.2394,[633]5.2383,[634]5.2413,[635]5.2410,[636]5.2375,[637]5.2337,[638]5.2356,[639]5.2326,[640]5.2332,[641]5.2336,[642]5.2384,[643]5.2401,[644]5.2419,[645]5.2406,[646]5.2440,[647]5.2388,[648]5.2399,[649]5.2402,[650]5.2431,[651]5.2473,[652]5.2478,[653]5.2515,[654]5.2462,[655]5.2455, |
-
I just started a run on a 65B GPTQ model, but it looked noticeably worse (something like 3.4 vs 3.0 on the first couple of iterations) and it seemed like it wouldn't be worth running the full set. Is the GPTQ inference code solidified yet? If so, is there any way to generate a GPTQ output without an Nvidia GPU? (I'm not sure how up to date the quantization was on the model I have, and I'd like to make sure it is a current version.)
-
Out of curiosity, I'm doing a 3-day run with the 5521 total chunks in …
-
I also just ran a 30B q4_1 run last night. It finished at 4.2701.
-
I can run the 65B models again, but it doesn't sound like anything should change much since 2 days ago. I thought the changes were just to give a little more stability to the runs and that the absolute scores shouldn't change much. A single run on 65B takes about 13 or 14 hours, so I'm not too eager to redo them unless needed.
…On Fri, Mar 24, 2023 at 9:01 AM Erik Scholz ***@***.***> wrote:
I don't have enough disk space for the bigger models
i am sooo feeling that.
ping me if you want me to test 30B, i cant run the 65B though.
-
OK, finished a run of 30B f16 (non-quantized, not with the --memory_f16 option) that can be compared to the q4_1: 4.1539.
-
FYI with the latest BLAS fixes, I believe that perplexity computations should be much faster with big batch sizes (> 255) if you link against OpenBLAS:

```sh
# on x86
make clean && LLAMA_OPENBLAS=1 make -j
```

Let me know if this is true.

```
# no BLAS, 7B
make clean && LLAMA_NO_ACCELERATE=1 make -j && ./main --perplexity -m ./models/7B/ggml-model-q4_0.bin -f build/wiki.test.raw -t 8

system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks
21.11 seconds per pass - ETA 3.84 hours

# no BLAS, 13B
make clean && LLAMA_NO_ACCELERATE=1 make -j && ./main --perplexity -m ./models/13B/ggml-model-q4_0.bin -f build/wiki.test.raw -t 8

system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks
41.39 seconds per pass - ETA 7.53 hours

# with BLAS, 7B
make clean && make -j && ./main --perplexity -m ./models/7B/ggml-model-q4_0.bin -f build/wiki.test.raw -t 8

system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks
10.43 seconds per pass - ETA 1.90 hours

# with BLAS, 13B
make clean && make -j && ./main --perplexity -m ./models/13B/ggml-model-q4_0.bin -f build/wiki.test.raw -t 8

system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks
19.34 seconds per pass - ETA 3.52 hours
```

So about a 2x speed-up when using BLAS.
-
Does that mean it will also get faster on Apple Silicon now?
-
Test plan for my 16-core AMD Threadripper 1950X for the next few weeks while I'm away: my 3-day pre-BLAS run with …

What do people think is the most interesting to explore and/or technically feasible with 128GB of RAM? e.g. …

or maybe … ?
-
@ikawrakow Any chance you could post/link to the perplexity data you used to generate the graphs/tables for k-quants added in PR #1684? Right now there's really no information for models >13B, and having that available would be really helpful even if it's only for the new quantizations. (I can deal with it in any format, it doesn't need to be cleaned up or anything.) I'd do it myself, but unfortunately my hardware falls far short of completing something like that in a reasonable amount of time.
-
OK, here is a table: …

I haven't done …
-
@ikawrakow Thank you! Do you have the ones for 7B and 13B also? Sorry I wasn't clear: when I mentioned that only 7B and 13B were currently available, I was talking about the section in the main README, which only includes the non-k-quants quantizations. (I'd like to use this to generate other data / make comparisons, so it's better if the source isn't an estimate.)
-
@KerfuffleV2 Do you need something different compared to what is already provided in the description of #1684?
-
Oh yeah, I forgot to post this, but here is what I generated based on the information available. Note: I had to calculate the sizes for 33B and 65B based on the other models and I suspect it may not actually be correct, so take the stats involving size with a huge chunk of salt.

Here is the horrendous script that generated the below: https://gist.github.com/KerfuffleV2/d072237b4a9386e80cdc302f923843db (it started as a comprehension in the Python REPL, so it never got a chance to be a real program). The figures came from the main README and ikawrakow's response here + the k-quants PR. I didn't generate any of them myself, just manipulated them.

Legend: …

edit: Manually generated, but for reference. Based on full-quality models.

7B: …

13B: …

33B: …

65B: …
-
All, @ggerganov suggests that the standard test model be OpenLlama. @SlyEcho mentions a "truncated wiki.test.raw perplexity test".

Q1: If I was only doing one test, should it test OpenLlama 3B, 7B, or 13B?
Q2: HOW do I do a "truncated perplexity test"?
Q3: Is it likely that the above perplexity test will still be a relevant comparison in, say, 3, 6, or 9 months' time?
-
Open Llama 3B perplexity abbreviated test
-
@ianscrivener I checked out your Azure CI brainstorming discussion and noted that you're looking at perplexity performance and not quality? The way perplexity uses llama.cpp looks to be distinctly different from interactive use, as can be seen from comparing your perplexity run above to the output of this run (no GPU): …
-
In response to @ggerganov's call for perplexity and latency testing for llama.cpp, I've coded "llama.cpp perplexity scorecard"... a helper project to run and gather …
-
Why not consider moving to a better and more widely used scoring method like HellaSwag? I added support for measuring a HellaSwag-like score in PR #2312 and started a discussion in #2321.
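For anyone unfamiliar with the metric, here is a rough sketch of how a HellaSwag-like score can be computed (the general idea only, not necessarily the exact PR #2312 implementation; the struct and function names below are made up for illustration). The model scores each candidate ending of a context by its log-likelihood, and the score is the percentage of tasks where the correct ending gets the highest likelihood:

```cpp
#include <algorithm>
#include <vector>

// One HellaSwag-style task: a context plus several candidate endings,
// exactly one of which is the "gold" continuation.
struct HellaSwagTask {
    std::vector<double> ending_logprob; // total log-likelihood of each ending given the context
    int                 label;          // index of the correct ending
};

// Percentage of tasks where the model assigns the highest log-likelihood
// to the correct ending.
double hellaswag_score(const std::vector<HellaSwagTask> & tasks) {
    int correct = 0;
    for (const auto & t : tasks) {
        const auto best = std::max_element(t.ending_logprob.begin(), t.ending_logprob.end());
        if (best - t.ending_logprob.begin() == t.label) {
            correct++;
        }
    }
    return 100.0 * correct / tasks.size();
}
```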
-
Is there any place with updated perplexity scores for LLAMA2 and the current codebase with GGUF?
-
Agree. I'd also like to see perplexity and HellaSwag scores updated with each code release, at least for a handful of Llama 2 GGUF models. I'm really interested to see and understand the improvements over time, both in the code and in the models, i.e. quality and performance benchmarks over time.

I've done preliminary code for this (both in Python and Node.js) after @ggerganov put out the call and said that sufficient Azure cloud resources would soon be available. I put an MLOps benchmarking roadmap to @ggerganov in an email but did not get a reply. There does not seem to be very much interest in this from the C++ developers.

I'm off grid, accessing via 4G, and only have a MacBook Pro, so without access to the cloud GPU that Azure has given to the project I cannot proceed.

Personally, I'd like to see llama.cpp grow beyond just the (excellent) core C++ library, i.e. adding: …
-
Here are my two cents:

For others to decide.

We announce a couple of supported models without documentation (or a link?) on how to convert and run. This looks (and probably is) bad.

Personally I'm quite interested.

I'd assume …
-
Mistral 7B compared to other llamas, Q4_K_M: … (If perplexity isn't a fair benchmarking tool, we can use the HellaSwag score for k-quants.)
-
Hi.
-
We are currently collecting perplexity scores for all models + quantizations + program flags. Use this discussion to coordinate.

Mostly default ./perplexity settings with all of wiki.test.raw.
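For reference, the single number reported in each result below is the standard perplexity definition: the exponential of the average negative log-likelihood per token over the evaluated text, where lower is better. The running "[n]x.xxxx" values printed per chunk are essentially this average accumulated over the first n chunks. A rough sketch of the computation (just the textbook formula, not the exact ./perplexity code):

```cpp
#include <cmath>
#include <vector>

// token_probs: the probability the model assigned to each actual next token
// in the evaluation text. Perplexity is exp(mean negative log-likelihood);
// a hypothetical perfect model that always predicts the right token with
// probability 1 would score 1.0.
double perplexity(const std::vector<double> & token_probs) {
    double nll = 0.0;
    for (const double p : token_probs) {
        nll -= std::log(p);
    }
    return std::exp(nll / token_probs.size());
}
```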
Results in italics are now being added / updated with BLAS enabled and using quantization as per PR #896. These results are collected from various sources and builds, so they will contain inconsistencies and errors.

Note: since the tokenizers used by FB Llama and Open Llama are different, the following is not a valid inter-model comparison ~ @gjmulder:

Context sizes: (512 | 1024 | 2048) ⨯ (7B | 13B | 30B | 65B) ⨯ (llama | alpaca[-lora] | vicuna-GPTQ) models, first 406 lines of wiki.test.raw: Google GSheet with comments enabled.

I appreciate that alpaca models aren't generative in intent, and so perplexity is not a good measure. However, I was curious to see the trade-off in perplexity for the chat-like models - @gjmulder

History

Feel free to make a new thread in this discussion when you take a measurement, or want to "donate" some compute time.

(@gjmulder, @glinscott, et al. feel free to make edits to this post)