Closed
Description
I am wondering what has happened and whether we can do something about it. Is this some kind of memory pool that has a larger size? Can we reduce this size if we want to? I have noticed this issue with a model that used to fit into my GPU before, but it now reports out of memory when I offload all layers to the GPU.
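For reference, the invocation I'm using is along these lines (the model path and layer count are just placeholders, not my exact command):

```
# -ngl / --n-gpu-layers controls how many layers are offloaded to the GPU;
# a large value like 99 effectively offloads all of them.
./main -m ./models/my-model.gguf -ngl 99
```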
@slaren, is it possible that this has something to do with the work you have done recently on managing GPU memory?
Will selecting the LLAMA_CUDA_F16 option at compile time decrease GPU memory use during inference?
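For context, by LLAMA_CUDA_F16 I mean the build-time option, e.g. enabled in a CMake build roughly like this (exact flag names may differ between versions):

```
# Assumed CMake configuration with CUDA enabled and the F16 option turned on.
cmake -B build -DLLAMA_CUBLAS=ON -DLLAMA_CUDA_F16=ON
cmake --build build --config Release
```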