
Fix memory management bug in llava and server code #5491

Merged (2 commits) on Feb 15, 2024

Conversation

@Elbios (Contributor) commented on Feb 14, 2024

Fixes this error:

llama_new_context_with_model: graph splits (measure): 3
Available slots:
 -> Slot 0 - max context: 6000
{"timestamp":1707926446,"level":"INFO","function":"main","line":2623,"message":"model loaded"}
all slots are idle and system prompt is empty, clear the KV cache
slot 0 - loaded image
slot 0 is processing [task id: 0]
slot 0 : kv cache rm - [0, end)
slot 0 - encoding image [id: 1]
munmap_chunk(): invalid pointer
Aborted

when running the server binary like this:

./bin/server -m ../models/mistral-7b-q_5_k.gguf --mmproj ../models/mmproj-mistral7b-f16-q6_k.gguf -ngl 50 -c 6000 --host 0.0.0.0 --port 8007 --no-mmap

Tested on:
Linux, WSL (Debian)
GPU: 4090

@Elbios mentioned this pull request on Feb 14, 2024
Comment on lines 63 to 64
CLIP_API void clip_image_u8_batch_free (struct clip_image_u8 * data);
CLIP_API void clip_image_f32_batch_free(struct clip_image_f32 * data);
ggerganov (Owner) commented:

Wouldn't it be better to change these to:

CLIP_API void clip_image_u8_batch_free(struct clip_image_u8_batch * batch) {
    if (batch->size > 0) {
        delete[] batch->data;
    }
    batch->size = 0;
}

@Elbios (Author) replied:

Agreed, I changed it and retested.
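
For reference, a minimal sketch of what the revised pair of wrappers could look like after this change. The data/size field names and the symmetric clip_image_f32_batch variant are assumptions extrapolated from the snippet above, not the exact merged code:

// Sketch only: free the array owned by a batch and reset its size,
// so a second call on the same batch becomes a harmless no-op.
CLIP_API void clip_image_u8_batch_free(struct clip_image_u8_batch * batch) {
    if (batch->size > 0) {
        delete[] batch->data;
        batch->data = nullptr; // also drop the stale pointer (assumed field)
    }
    batch->size = 0;
}

CLIP_API void clip_image_f32_batch_free(struct clip_image_f32_batch * batch) {
    if (batch->size > 0) {
        delete[] batch->data;
        batch->data = nullptr;
    }
    batch->size = 0;
}

Resetting size to 0 is what makes the wrapper safe to call twice: a later free, or a preprocess call that checks size > 0 before freeing, sees an empty batch instead of a dangling pointer.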

@@ -1494,11 +1506,8 @@ bool clip_image_preprocess(struct clip_ctx * ctx, const clip_image_u8 * img, cli
pad_to_square = false;
}
// free the previous res_imgs if any set
if (res_imgs.size > 0 && res_imgs.size < 100) {
@Elbios (Author) commented:

oh, I removed the upper bound because there didn't seem to be any justification for it, but if there is then let me know @cmp-nct and I'll restore it

cmp-nct (Contributor) replied:

> oh, I removed the upper bound because there didn't seem to be any justification for it, but if there is then let me know @cmp-nct and I'll restore it

The reason for the upper bound was a safety check for the case where the passed structure points to uninitialized memory; a garbage size value would almost certainly fall outside that range. So it's only relevant if someone uses the API wrong, and I'm fine either way.

Glad you spotted the double free; it's another remnant of the vector->pointer refactor.
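
A minimal, self-contained illustration of the double-free pattern described above (the batch struct and fill function are invented for the example; this is not the actual llava/server code):

#include <cstddef>

// Hypothetical stand-in for a batch struct after the vector->pointer refactor.
struct batch { float * data = nullptr; size_t size = 0; };

// The callee frees any previous contents before writing new ones,
// mirroring the "free the previous res_imgs if any set" check above.
void fill(batch & b) {
    if (b.size > 0) {
        delete[] b.data;   // first free
    }
    b.data = new float[16];
    b.size = 16;
}

int main() {
    batch b;
    fill(b);
    delete[] b.data;       // caller frees, but forgets to reset b.size
    fill(b);               // b.size is still 16, so the callee frees again: double free
}

Routing every free through a wrapper that also zeroes size, as this PR does, turns the second free into a no-op.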

jpohhhh added a commit to Telosnex/fllama that referenced this pull request Feb 15, 2024
@ggerganov ggerganov merged commit 0d41771 into ggerganov:master Feb 15, 2024
45 of 53 checks passed
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
* Fix memory management in llava and server code
* Make it cleaner by checking size in batch free wrapper
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* Fix memory management in llava and server code
* Make it cleaner by checking size in batch free wrapper