Llava Initial approach to clear images #664

Merged
merged 5 commits into SciSharp:master from LLavaResetOnImageChange on Apr 16, 2024

Conversation

SignalRT
Collaborator

This changes the InteractiveExecutor so that the conversation can continue when new images are introduced. The example (outside the executor) clears the KV_CACHE to discard the previous images.
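
For illustration, a minimal C# sketch of the pattern described above: an InteractiveExecutor with LLaVA weights, where the KV cache is cleared from outside the executor whenever the conversation moves to a new image. This is not the PR's actual example; the model paths, prompt format, Images collection, and KvCacheClear call are assumptions and may differ between LLamaSharp versions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using LLama;
using LLama.Common;

// Load the language model and the LLaVA projection (paths and parameters are placeholders).
var parameters = new ModelParams("<llava-model.gguf>") { ContextSize = 4096, GpuLayerCount = 32 };
using var model = LLamaWeights.LoadFromFile(parameters);
using var clip = LLavaWeights.LoadFromFile("<llava-mmproj.gguf>");
using var context = model.CreateContext(parameters);

var executor = new InteractiveExecutor(context, clip);
var inferenceParams = new InferenceParams { AntiPrompts = new List<string> { "\nUSER:" } };

async Task AskAboutImageAsync(string imagePath, string question)
{
    // Clear the KV cache so tokens from the previous image do not leak into this
    // turn (method name assumed; some versions expose KvCacheRemove or the native
    // llama_kv_cache_clear instead).
    context.NativeHandle.KvCacheClear();

    // Attach the new image (collection name assumed; other versions take image paths).
    executor.Images.Clear();
    executor.Images.Add(await File.ReadAllBytesAsync(imagePath));

    await foreach (var token in executor.InferAsync($"USER: <image>\n{question}\nASSISTANT:", inferenceParams))
        Console.Write(token);
}

await AskAboutImageAsync("scene-1.jpg", "Describe the image.");
await AskAboutImageAsync("scene-2.jpg", "Describe the image.");
```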

@SignalRT
Collaborator Author

This is ready to review.

  • I modified the example to show how to manage the conversation when we want to switch to a different image.
  • I updated the documentation with the examples and the ILLamaExecutor interface change.

As an example:

(attached demo: Llava3)

@zsogitbe
Contributor

zsogitbe commented Apr 14, 2024

SignalRT, 3 short remarks.

  • What is your aim with having several images per prompt? The context size will limit this: each added image enlarges the prompt, so the maximum answer length (the difference between the context size and the prompt size) shrinks. I am not sure this is a feasible approach on the GPU, and it will probably make everything slower on the CPU. I also think it makes the example complicated.
    Are you sure that llama.cpp handles several images per prompt well?
  • I have noticed that llava uses 20% more GPU memory than before. I am not sure whether this comes from the C# code or the C++ code. I was running some tests with IntPtrMax's code and that seems to use 20% less GPU memory. Can you think of any reason for the higher GPU memory usage?
  • I think that using ModelContext.GetState() and ModelContext.LoadState(state) is a better solution (both in llama and llava); see the sketch after this list. I am not sure about the KV_CACHE, and from ggerganov's remark I think he was also not sure about it.
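
For reference, a minimal sketch of the state save/restore alternative mentioned in the last remark, using LLamaContext's SaveState/LoadState (the method names, the file-based variant, and the placeholder paths are assumptions; in-memory GetState()/LoadState(State) variants exist in some versions). As discussed later in the thread, this approach did not work reliably for the llava case.

```csharp
using LLama;
using LLama.Common;

// Reset a context from a saved state instead of clearing the KV cache.
// The file path and the SaveState/LoadState API names are assumptions.
var parameters = new ModelParams("<llava-model.gguf>");
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);

// 1. Snapshot the freshly created context, before any image has been evaluated.
context.SaveState("clean-context.bin");

// ... evaluate image A and chat about it ...

// 2. Restore the snapshot so the next image starts from a clean context.
context.LoadState("clean-context.bin");
```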

@martindevans
Member

This feels a bit like a hack to work around not having proper KV cache management in the executors, to be honest. But without a redesign of the way executors work with sequence IDs etc. (which is what I'm working on, slowly, with BatchedExecutor), this is probably the best option for now.

@AsakusaRinne
Collaborator

But without a redesign of the way executors work with sequence IDs etc. (which is what I'm working on, slowly, with BatchedExecutor), this is probably the best option for now.

@martindevans I've recently been working on such things as well (experimental), inspired by vLLM. There is a lot of work to do, so I think I'll add it to a preview branch or put it under an experimental namespace at first. Since you are also working on this, I'll push a draft ASAP so we can discuss how to build it together. :) I'd also be more than happy to see your ideas. Please feel free to ping me if there is anything I can help with.

@SignalRT
Collaborator Author

@zsogitbe

(quoting the three remarks above)
  1. The goal is to support the capabilities of the model: Multiple images for eval.run_llava haotian-liu/LLaVA#432. You can use one image if you want; I don't see a problem with keeping an interface that can pass more than one image (see the sketch after this list). Other multimodal models also support several images.
  2. With my configuration I cannot see any difference in GPU memory usage between this PR and the previous version. I cannot compare with other architectures. Did you test this with exactly the same llama/llava DLL versions in all scenarios?
  3. I gave your proposal a try, without success: the model began to mix images in its answers. This is the answer it gave for the third image of my previous demo, mixing two different scenes: "This is a color photograph depicting a young woman standing on a wooden bridge overlooking a canal with a cup of coffee. The woman appears to be looking off into the distance, away from her red shirt. In the background, there are trees and a building in the background."
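
To make the interface argument concrete, here is a hedged sketch of attaching more than one image through the executor-level Images collection. The property name, the ILLamaExecutor shape, and the <image> placeholder convention are assumptions and may differ between LLamaSharp/llava versions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using LLama.Abstractions;
using LLama.Common;

// Hypothetical helper (not part of the PR): attach two images to a single prompt.
static class MultiImageDemo
{
    public static async Task DescribeBothAsync(ILLamaExecutor executor, string pathA, string pathB)
    {
        // Attach both images; callers that only need one image simply add one.
        executor.Images.Clear();
        executor.Images.Add(await File.ReadAllBytesAsync(pathA));
        executor.Images.Add(await File.ReadAllBytesAsync(pathB));

        var inferenceParams = new InferenceParams { AntiPrompts = new List<string> { "\nUSER:" } };

        // One prompt that references both image placeholders.
        await foreach (var token in executor.InferAsync(
                           "USER: <image>\n<image>\nDescribe both images.\nASSISTANT:", inferenceParams))
        {
            Console.Write(token);
        }
    }
}
```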

@SignalRT
Collaborator Author

@martindevans, @AsakusaRinne
I'm also trying out a new batch executor (currently outside LLamaSharp, to simplify my tests). I'm focusing on concurrent server usage. I'm gathering some performance metrics for this approach and will share them with you once the comparison is done.

I will integrate this PR "as is". We will probably need to change llava support in the near future (ggerganov/llama.cpp#4216 (comment)).

@zsogitbe
Contributor

zsogitbe commented Apr 16, 2024

Thank you for your answer, SignalRT.
In general the llava code is very well done! Thank you for your effort.

After reading the reference you provided above, and taking into account that the models are trained with only one image per input record, I do not think it makes sense to use multiple images as input for one prompt. I also do not immediately see a good use case for such functionality. I think the model will be confused; in the best case it will describe the multiple images as if they were a single image (for example, a collage), or it will just describe one of them.
I also think the example would be easier to understand with only one input image. It is a bit confusing what is happening in the example now (I had to think very hard to understand it, and that should not be the aim: an example should be simple and easy to understand).
But it is up to you to decide.

I noticed the higher GPU usage because I have an example that no longer runs on the GPU (out of memory) but used to fit into GPU memory. I do not know what happened or where; I only see the result. Maybe it has to do with the GPU memory allocation type (automatic thread pool with overhead). I think the GPU memory allocation type has changed in the C# code or in the C++ code. I hope we will get some option to optimize GPU memory usage, because it is very important for speed (everything is much faster on the GPU).

I have also tried saving/loading the state and it did not work: after loading the context state the model produces garbage in my case. Probably a bug in state management somewhere. Clearing the KV cache does work.

@martindevans, @AsakusaRinne, @SignalRT, does state saving/loading work with the general llama code base? This is important because I was counting on it (I use it to reset the context without recreating it). Or do we need to use KV cache clearing there as well?

@martindevans
Member

I'm not aware of any issues with state load/save at the moment. It's something I'm going to be actively working on soon (load/save conversations in the BatchedExecutor), so if there are any issues they'll be fixed soon :)

SignalRT merged commit 399e81d into SciSharp:master on Apr 16, 2024
3 checks passed
SignalRT deleted the LLavaResetOnImageChange branch on April 20, 2024