Raspberry Pi 4 4GB #58
Comments
It looks like it's possible to pack it into an AWS Lambda on ARM Graviton with S3 weights offloading.
Is it swapping?
@neuhaus The kswapd0 process is pretty active.
@dalnk Do you have swap enabled on the system?
What did you change to be able to run it on the Pi? I have a PC four times more powerful and it crashes every time I try. @miolini
@MarkSchmidty Thank you for sharing your results. I believe my system swapped a lot due to the limited amount of RAM (4GB RAM, 4GB model).
Ah, yes. A 3-bit implementation of 7B would fit fully in 4GB of RAM and lead to much greater speeds. This is the same issue as in #97. 3-bit support is a proposed enhancement in GPTQ Quantization (3-bit and 4-bit) #9. GPTQ 3-bit has been shown to have negligible output quality loss vs uncompressed 16-bit, and may even provide better output quality than the current naive 4-bit implementation in llama.cpp while requiring 25% less RAM.
@MarkSchmidty Fingers crossed!
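(Rough arithmetic behind that claim, ignoring per-block scale overhead: 7B is roughly 6.7e9 parameters, so 4-bit weights come to about 6.7e9 × 0.5 bytes ≈ 3.4 GB, while 3-bit comes to about 6.7e9 × 0.375 bytes ≈ 2.5 GB. That is where the ~25% saving and the extra headroom on a 4GB board come from.)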
I'm currently unable to build for aarch64 on an RPi 4 due to missing SIMD dot product intrinsics (vdotq_s32). Replacing them with
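For context on why the build breaks: vdotq_s32 requires the ARMv8.2 dotprod extension, and the Pi 4's Cortex-A72 is a plain ARMv8-A core without it, so the intrinsic isn't available. Below is a minimal sketch of the usual workaround, widening to 16-bit products and pairwise-accumulating into 32-bit lanes. It is an illustration only, not the actual llama.cpp patch; dot_i8_fallback is a made-up name and n is assumed to be a multiple of 16.

```c
#include <arm_neon.h>
#include <stdint.h>

// Dot product of two int8 vectors without vdotq_s32 (no dotprod extension).
static inline int32_t dot_i8_fallback(const int8_t * a, const int8_t * b, int n) {
    int32x4_t acc = vdupq_n_s32(0);
    for (int i = 0; i < n; i += 16) {
        const int8x16_t va = vld1q_s8(a + i);
        const int8x16_t vb = vld1q_s8(b + i);
        // widening multiply: 8-bit x 8-bit -> 16-bit products
        const int16x8_t lo = vmull_s8(vget_low_s8(va),  vget_low_s8(vb));
        const int16x8_t hi = vmull_s8(vget_high_s8(va), vget_high_s8(vb));
        // pairwise add-and-accumulate the 16-bit products into 32-bit lanes
        acc = vpadalq_s16(acc, lo);
        acc = vpadalq_s16(acc, hi);
    }
    return vaddvq_s32(acc); // horizontal sum of the four lanes
}
```

With dotprod available, the loop body collapses to acc = vdotq_s32(acc, va, vb), which is what the newer code path assumes.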
@Ronsor Could you please share your build log?
@Ronsor Something is wrong with your environment. My build log on the RPi starts with:
Which distro are you using? I'm just on vanilla Raspberry Pi OS. It seems the vdotq change is the issue.
Re-adding the old dot product code fixed my issue.
Now that I fixed that (I'll submit a PR soon), running on an 8GB Pi results in not-terrible performance:
~1 token/sec
Hey @Ronsor, I'm having the same issue. Could you say exactly what you did to fix it?
@davidrutland Basically undo commit 84d9015 and it should build fine.
Not able to build on my RPi 4 4GB running Ubuntu 22.10.
However, it built once I removed the changes added in #67 related to
I tried to run
Turns out I was using the fp16 model, which is why it core dumped. It was resolved after I ran the correct command. I would suggest we note in the README that this step is platform agnostic, and that users should consider running it on a desktop machine and copying the result over if they are running the model on a lower-spec device like a Raspberry Pi. WDYT?
@Mestrace I am also getting a segfault core dump, but when quantizing on my desktop with plenty of RAM available. What were the wrong and correct commands you ran in relation to the fp16 model?
@octoshrimpy I believe Mestrace is saying you should convert and quantize the model on a desktop computer with a lot of RAM first, then move the ~4GB 4-bit quantized model to your Pi.
@MarkSchmidty That is what I am attempting, haha. Is 16GB of free RAM not enough for quantizing 7B? This is what I'm running into; unsure where to go from here.
Run
@octoshrimpy What I did:
@Mestrace What command did you use for quantizing?
@gjmulder I have 350G of space available and plenty of RAM. quantize immediately crashes with a segfault, so there is no RAM/disk utilization to view. Are there logs I can check, or an INFO-level logging option I can enable?
I did everything on the RPi 4. Just enable swap (8GB+) on your system.
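(A general pointer for anyone following along, not necessarily what was done here: on Raspberry Pi OS, swap is usually enlarged by raising CONF_SWAPSIZE in /etc/dphys-swapfile and restarting the dphys-swapfile service; on other distros a swap file created with fallocate, mkswap, and swapon works as well. Heavy swapping does wear SD cards, as noted further down in the thread.)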
@miolini I'm not using a Pi for quantization. I'm doing it on Arch Linux with 16 GB of RAM, 10 GB of swap, and barely anything else running (about 2 GB used before I start).
... 10 years per token. Do not ask me.
Yes, if you don't have enough RAM it's going to be too slow to be useful. There are other models useful for interesting projects on devices with less RAM. whisper.cpp only needs 64MB of RAM for the smallest model and 1GB for the largest, for example.
Could you please share your build log and the output of the command
Never mind, it worked for me; I had to use the old-school
Seems like lines 1938 and 1939 of ggml.c should be changed from
to
Doesn't this fry SD cards? Maybe I'm not understanding this correctly.
It does, sadly. In my case it was for a 2-hour PoC, not lengthy usage, so it's fine.
This is awesome, and on my 8GB desktop it runs fine. But with 4GB I still can't get it running. I changed my swap file to 4GB and played with the parameters. My question is: does anybody know how the parameters llama_model_load: memory_size = 1024.00 MB, n_mem = 65536 can be changed? In all the screenshots I saw, n_mem is about 16000 and memory_size about 512. And why is the ctx size such an odd number and not 4500MB?
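For what it's worth, my understanding (an assumption about how llama.cpp of this vintage sizes things, not a confirmed reading of the source) is that those values are derived rather than set directly: n_mem = n_layer * n_ctx, and memory_size covers the K and V caches, i.e. 2 * n_mem * n_embd elements. A quick back-of-the-envelope check in C:

```c
#include <stdio.h>

// Hypothetical sizing of the printed "memory_size" / "n_mem" values, assuming
// a K cache and a V cache of n_embd values per layer per context slot.
int main(void) {
    const long n_layer = 32;    // LLaMA 7B
    const long n_embd  = 4096;  // LLaMA 7B
    const long n_ctx   = 512;   // context size, set on the command line
    const long bytes   = 4;     // f32 cache entries; 2 if the cache is f16

    const long n_mem       = n_layer * n_ctx;
    const long memory_size = 2 * n_mem * n_embd * bytes;   // K + V

    printf("n_mem = %ld, memory_size = %.2f MB\n",
           n_mem, memory_size / (1024.0 * 1024.0));
    // prints: n_mem = 16384, memory_size = 512.00 MB (matches the log below)
    return 0;
}
```

Under that assumption, memory_size = 1024.00 MB with n_mem = 65536 corresponds to n_ctx = 2048 with an f16 cache, so the knob to turn is the context-size parameter (-c / --ctx_size, if your build has it) rather than a separate setting. The 4529.34 MB ggml ctx size in the run logged below is roughly the 4017 MB of quantized weights plus this cache plus a little working space, which is why it isn't a round number.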
I failed to build on Pi 4
I also managed to run LLaMA 7B on a Raspberry Pi 4! I recorded a video of the process if anyone is interested (8:10 for the inference demo): Video
I managed to get it running on a Rock Pi 4SE, but as you mentioned it's super slow. I also managed to get it working with OpenCL, but it really didn't make any difference.
Did you succeed in running it on an AWS Lambda instance? If so, with what memory size?
KV cache is now cyclic, split into a permuted V variant. The ggml_tensor_print function has been completely reworked to output proper 1-4 dim tensors with data. Example:
```
+======================+======================+======================+======================+
| :0 | V [f32 type]
+----------------------+----------------------+----------------------+----------------------+
| Dimensions | Strides | Layer id | Backend |
| 3 | 4x16x1024 | 0 | CPU |
+----------------------+----------------------+----------------------+----------------------+
| Elements | Src0 | Src1 | Operation |
| 4 x 64 x 2 | 4 x 64 x 2 | N/A | CONT |
+----------------------+----------------------+----------------------+----------------------+
| Transposed: No | Permuted: No | Contiguous: Yes | Size: 0.00 MB |
| Src0 name: | cache_v (view) (permuted) |
+----------------------+----------------------+----------------------+----------------------+
+-------------------------------------------------------------------------------------------+
| Content of src0 "cache_v (view) (permuted)" (3 dim)
+-------------------------------------------------------------------------------------------+
| Content of src0 "cache_v (view) (permuted)" (3 dim) | Total Elements : [ Row:4 Col:64 Layer:2 ]
+-------------------------------------------------------------------------------------------+
| Row 1: [0.302 , 0.010 ] [-0.238 , 0.680 ] [0.305 , 0.206 ] [-0.013 , 0.436 ] [-0.074 , -0.698 ] [-0.153 , -0.067 ]
| Row 2: [0.091 , 0.199 ] [0.253 , 0.151 ] [-0.557 , 0.089 ] [0.298 , -0.272 ] [-0.149 , 0.232 ] [-0.217 , 0.193 ]
| Row 3: [-0.085 , -0.014 ] [0.225 , 0.089 ] [-0.338 , 0.072 ] [0.416 , -0.186 ] [-0.071 , 0.110 ] [0.467 , 0.497 ]
| Row 4: [-0.336 , 0.471 ] [-0.144 , 0.070 ] [-0.062 , 0.520 ] [0.093 , 0.217 ] [-0.332 , -0.205 ] [0.012 , 0.335 ]
+-------------------------------------------------------------------------------------------+
+-------------------------------------------------------------------------------------------+
| Content of dst "V" (3 dim)
+-------------------------------------------------------------------------------------------+
| Content of dst "V" (3 dim) | Total Elements : [ Row:4 Col:64 Layer:2 ]
+-------------------------------------------------------------------------------------------+
| Row 1: [0.302 , 0.010 ] [-0.238 , 0.680 ] [0.305 , 0.206 ] [-0.013 , 0.436 ] [-0.074 , -0.698 ] [-0.153 , -0.067 ]
| Row 2: [0.091 , 0.199 ] [0.253 , 0.151 ] [-0.557 , 0.089 ] [0.298 , -0.272 ] [-0.149 , 0.232 ] [-0.217 , 0.193 ]
| Row 3: [-0.085 , -0.014 ] [0.225 , 0.089 ] [-0.338 , 0.072 ] [0.416 , -0.186 ] [-0.071 , 0.110 ] [0.467 , 0.497 ]
| Row 4: [-0.336 , 0.471 ] [-0.144 , 0.070 ] [-0.062 , 0.520 ] [0.093 , 0.217 ] [-0.332 , -0.205 ] [0.012 , 0.335 ]
+-------------------------------------------------------------------------------------------+
+======================+======================+======================+======================+
```
Hi!
Just a report. I've successfully run the LLaMA 7B model on my 4GB RAM Raspberry Pi 4. It's super slow, at about 10 sec/token, but it looks like we can run powerful cognitive pipelines on cheap hardware. It's awesome. Thank you!
Hardware : BCM2835
Revision : c03111
Serial : 10000000d62b612e
Model : Raspberry Pi 4 Model B Rev 1.1
%Cpu0 : 71.8 us, 14.6 sy, 0.0 ni, 0.0 id, 2.9 wa, 0.0 hi, 10.7 si, 0.0 st
%Cpu1 : 77.4 us, 12.3 sy, 0.0 ni, 0.0 id, 10.4 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 81.0 us, 8.6 sy, 0.0 ni, 0.0 id, 10.5 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 77.1 us, 12.4 sy, 0.0 ni, 1.0 id, 9.5 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3792.3 total, 76.2 free, 3622.9 used, 93.2 buff/cache
MiB Swap: 65536.0 total, 60286.5 free, 5249.5 used. 42.1 avail Mem
2705518 ubuntu 20 0 5231516 3.3g 1904 R 339.6 88.3 84:16.70 main
102 root 20 0 0 0 0 S 14.2 0.0 29:54.42 kswapd0
main: seed = 1678644466
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './models/7B/ggml-model-q4_0.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291
main: prompt: 'The first man on the moon was '
main: number of tokens in prompt = 9
1 -> ''
1576 -> 'The'
937 -> ' first'
767 -> ' man'
373 -> ' on'
278 -> ' the'
18786 -> ' moon'
471 -> ' was'
29871 -> ' '
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
The first man on the moon was 20 years old and looked a lot like me. In fact, when I read about Neil Armstrong during school lessons my fa