Description
I'm interested in implementing code for automatically determining the optimal runtime parameters given some model and memory constraints. I imagine the implementation would use something like a "dummy" parameter which, when set, does not result in any actual memory allocations but enables the creation of `llama_model` and `llama_context` dummies that can be used to determine how much memory would be used for a given choice of `llama_model_params` and `llama_context_params`. By comparing the amount of memory used for the dummies with the amount of memory that is actually available, the implementation could then iteratively optimize parameters such as the context size or the number of GPU layers.
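As a very rough sketch (not working code), the optimization loop could look something like the following. The `dummy` flag in `llama_model_params` and the `llama_dummy_allocated_bytes()` helper are made-up names purely for illustration; only the default-params, load, and free functions are existing llama.cpp API, and even those may differ between versions.

```c
// Sketch only: `mparams.dummy` and llama_dummy_allocated_bytes() are
// hypothetical and just illustrate the idea of measuring memory use
// without performing real allocations.
#include <stdint.h>

#include "llama.h"

// Returns true if a dummy model + context for the given parameters would fit
// into the given amount of device memory.
static bool fits(const char * path, int32_t n_gpu_layers, uint32_t n_ctx, size_t mem_available) {
    struct llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = n_gpu_layers;
    mparams.dummy        = true; // hypothetical: suppresses all real allocations

    struct llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = n_ctx;

    struct llama_model   * model = llama_model_load_from_file(path, mparams);
    struct llama_context * ctx   = model ? llama_init_from_model(model, cparams) : NULL;

    // hypothetical: how much memory the dummies would have allocated on device 0
    const size_t needed = ctx ? llama_dummy_allocated_bytes(ctx, /*device =*/ 0) : SIZE_MAX;

    llama_free(ctx);
    llama_model_free(model);
    return needed <= mem_available;
}

// Naive search: start with full offload and reduce the number of GPU layers
// until the dummy fits. A real implementation would search more cleverly
// (e.g. binary search, or also adjusting n_ctx).
static int32_t max_gpu_layers(const char * path, uint32_t n_ctx, size_t mem_available) {
    for (int32_t ngl = 999; ngl >= 0; ngl--) {
        if (fits(path, ngl, n_ctx, mem_available)) {
            return ngl;
        }
    }
    return -1; // does not fit at all for this n_ctx
}
```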
One roadblock that I have run into is how to make this implementation minimally invasive for the rest of the code. Right now I think the way to do it would be:
- Extend `ggml_backend_device` to track the amount of memory that has been allocated to the device by the current process.
- Add a function like `ggml_backend_dev_get_device_dummy` that returns a dummy instead of the actual device.
- In llama.cpp, conditionally fetch the dummy devices. Some additional logic in `llama-model-load.cpp` will still be needed to avoid temporarily loading data from disk to RAM.
- Extend the logic of `llama_decode` a bit to allow for determining the allocated size of the worst-case graph.
- In the runtime parameter optimization code, simply iterate over the dummy devices and retrieve the amount of memory that was allocated (see the sketch after this list).
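As a rough sketch of that last step, assuming the extensions above: `ggml_backend_dev_get_device_dummy()` is the proposed function (its signature is guessed to mirror `ggml_backend_dev_get()`), and `ggml_backend_dev_allocated_bytes()` stands in for however the per-process allocation counter from the first bullet ends up being exposed; neither exists in ggml yet. The other calls are the existing ggml-backend device API.

```c
// Sketch only: ggml_backend_dev_get_device_dummy() and
// ggml_backend_dev_allocated_bytes() are the *proposed* additions, not
// existing ggml functions; the rest is the current ggml-backend device API.
#include <stdio.h>

#include "ggml-backend.h"

static void report_dummy_usage(void) {
    for (size_t i = 0; i < ggml_backend_dev_count(); i++) {
        // proposed: returns a dummy that counted allocations instead of
        // performing them
        ggml_backend_dev_t dev = ggml_backend_dev_get_device_dummy(i);

        // existing API: memory actually available on the underlying device
        size_t free, total;
        ggml_backend_dev_memory(dev, &free, &total);

        // proposed: bytes the dummy would have allocated for the current
        // choice of llama_model_params/llama_context_params
        const size_t needed = ggml_backend_dev_allocated_bytes(dev);

        printf("%s: would use %zu bytes, %zu free / %zu total\n",
            ggml_backend_dev_name(dev), needed, free, total);
    }
}
```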
I'm very much open to suggestions, particularly from @slaren.