Experiencing 2-3 GB GPU memory use increase compared to llama.cpp version a few weeks ago #6909

Closed
@zsogitbe

Description

I am wondering what has happened and whether we can do something about it. Is this some kind of memory pool that now has a larger size? Can we reduce that size if we want to? I noticed this with a model that previously fit into my GPU but now reports out of memory when I offload all layers to the GPU.
@slaren, is it possible that this is related to your recent work on GPU memory management?

Will the selection of the LLAMA_CUDA_F16 option during compilation decrease inference GPU memory use?
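For reference, a minimal sketch of enabling that option, assuming the CMake build used by llama.cpp at the time of this issue (which exposed CUDA support via the `LLAMA_CUBLAS` flag); exact flag names may differ in newer versions:

```shell
# Configure llama.cpp with CUDA support and the FP16 CUDA kernels enabled.
cmake -B build -DLLAMA_CUBLAS=ON -DLLAMA_CUDA_F16=ON
cmake --build build --config Release

# Compare VRAM usage with and without the option by checking before/after a run:
nvidia-smi --query-gpu=memory.used --format=csv
```

Note that `LLAMA_CUDA_F16` primarily switches some CUDA kernels to half-precision intermediates, so any effect on VRAM usage may be modest compared to its effect on speed.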
