Hi all, regarding the Metal inference (Apple Silicon only) feature from #1642: I already tried it, but it doesn't seem to be plug & play for the server. How do I modify the server to enable this feature? Thanks @FSSRepo. Sorry to bother you again, brother :)
Hello, you need to configure the project with the Metal option and rebuild:

```sh
cd llama.cpp/build
# Configure CMake to build with the Metal option:
cmake .. -DLLAMA_METAL=ON
cmake --build . --config Release
```

Then run the server with GPU acceleration:

```sh
./server -m modelfile -ngl 1
```

Change the `-ngl` value to offload a number of layers to the GPU.
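Once the server is up, you can sanity-check it with a request to its `/completion` endpoint. A minimal sketch, assuming the default host and port (`127.0.0.1:8080`) of the llama.cpp example server:

```sh
# Send a completion request to the running server
# (host, port, and JSON fields assume the example server's defaults)
curl --request POST \
  --url http://127.0.0.1:8080/completion \
  --header "Content-Type: application/json" \
  --data '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 128}'
```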
@FSSRepo thanks, I never thought it would be that simple. I'll try it out.

Edit: Wow, it just works!