
Eval bug: llama-server memory leak / infinite graph rebuild with LoRA between commits 7692 (works) and 7792 (broken) #19217

@ivanov84


Name and Version

Bug description

There is a severe memory leak / infinite graph rebuild in llama-server when using LoRA adapters.

The issue is 100% reproducible and has been narrowed down to a specific range of builds (the first bad commit has not been isolated yet).


Working vs broken versions

  • Commit <= 7692 — works correctly
  • Commit >= 7792 — broken

Starting somewhere in this range, llama-server begins to:

  • repeatedly rebuild execution graphs
  • continuously reserve memory
  • increase RAM usage without bound
  • eventually exhaust system memory

This happens even with:

  • CPU-only mode
  • single request
  • --parallel 1
  • small context size

Environment

  • OS: Windows 11
  • llama.cpp built from source
  • GPU: RTX 3060 (also reproduced with -ngl 0, CPU-only)
  • CUDA: 12.4 (but issue reproduces without CUDA)

Model / LoRA setup

  • Base model:
    Meta-Llama-3.1-8B-Instruct.Q8_0.gguf
  • LoRA:
    Converted to GGUF (convert_lora_to_gguf.py)
  • LoRA was trained on the same base model (non-quantized)

Command used to reproduce (CPU-only)

llama-server.exe ^
  -m Meta-Llama-3.1-8B-Instruct.Q8_0.gguf ^
  --lora _gestalt-adapter.gguf ^
  -ngl 0 ^
  -c 1024 ^
  --parallel 1 ^
  --host 0.0.0.0 ^
  --port 8080


### Operating systems

Windows

### GGML backends

CPU, CUDA

### Hardware

NVIDIA RTX 5060, RTX 3060

### Models

_No response_

### Problem description & steps to reproduce

See the bug description and the reproduction command above.

### First Bad Commit

_No response_

### Relevant log output

<details>
<summary>Logs</summary>
<!-- Copy-pasted short logs go into the "console" area here -->

```console
```

</details>