Extension LLava with in memory images #653

zsogitbe · 2024-04-06T15:23:02Z

No description provided.

SignalRT · 2024-04-07T16:05:19Z

@zsogitbe, thank you very much for your work. I would like to share my thoughts:

The interface is defined in Llama.Abstraction.ILLamaExecutor. I think that if you change the interface, the change should be made in ILLamaExecutor.
I would prefer not to have a duplicate property holding the image in another format. At this point I think it would be better to break the interface so that we only do it once. My preference would be something like:

    class ImageData
    {
        enum dataType { imagePath, imageBytes, imageURL }
        public dataType DataType { get; set; }
        public object Data { get; set; }
    }

And to change the property ImagePaths to something like:

    public List<ImageData> ImageData { get; set; }

Anyway that's only my opinion.

zsogitbe · 2024-04-07T17:50:00Z

SignalRT, it is a good idea. I did not do it because it means much more code (I have updated the PR, you can see the changes needed). I usually prefer simplicity, but in this case the standardization is worth it.

SignalRT · 2024-04-08T04:29:23Z

Thank you @zsogitbe, I will review the changes as soon as possible.

martindevans · 2024-04-08T13:04:18Z

LLama/LLamaInteractExecutor.cs

-                        throw new NotImplementedException();
+                        using var httpClient = new HttpClient();
+                        var uri = new Uri((string)image.Data);
+                        var imageBytes = httpClient.GetByteArrayAsync(uri).Result;


Don't use Result here (or, as a rule, anywhere). Use await instead.
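As a rough illustration of the point above, the download could be written as an async helper along these lines. This is only a sketch: `ImageDownload` and `DownloadImageAsync` are hypothetical names that do not come from the PR or from LLamaSharp, and the caller is assumed to own the `HttpClient`.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not part of LLamaSharp.
public static class ImageDownload
{
    // Downloads the raw image bytes without blocking the calling thread.
    public static async Task<byte[]> DownloadImageAsync(HttpClient httpClient, string url)
    {
        // Throws UriFormatException for a malformed URL; inside an async
        // method the exception is captured in the returned Task.
        var uri = new Uri(url);

        // await releases the thread while the request is in flight,
        // whereas .Result would block it until the download completes.
        return await httpClient.GetByteArrayAsync(uri).ConfigureAwait(false);
    }
}
```

A caller would then write `var imageBytes = await ImageDownload.DownloadImageAsync(httpClient, url);`, which in turn means the calling method has to be async as well.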

martindevans · 2024-04-08T13:08:28Z

LLama/LLamaInteractExecutor.cs

@@ -154,15 +155,21 @@ private Task PreprocessLlava(string text, InferStateArgs args, bool addBos = tru
                {
                    if (image.Type == ImageData.DataType.ImagePath && image.Data != null)


Can all of this logic be moved out of the executor? Maybe a base interface with separate classes for the different variations.

For example:

    interface IClipImage
    {
        Task<SafeLlavaImageEmbedHandle> GetEmbed();
    }

    class ImageDataFromUrl(string url) : IClipImage
    {
        public async Task<SafeLlavaImageEmbedHandle> GetEmbed()
        {
            return SafeLlavaImageEmbedHandle.CreateFromMemory(await GetImageBytes());
        }

        private async Task<byte[]> GetImageBytes()
        {
            return await DownloadThatUrl(url);
        }
    }

zsogitbe · 2024-04-08

I was thinking the same after adding more and more to the code. Reading or downloading the image should be the task of the user of the library.

zsogitbe · 2024-04-12T11:24:59Z

This is ready for merging.

martindevans · 2024-04-12T13:59:53Z

Looks good to me. I'll leave the final review to @SignalRT since he knows more about llava than me.

SignalRT · 2024-04-12T20:40:34Z

I would clearly have preferred to keep the option of allowing both file paths and in-memory images, but this is blocking another PR with some changes in key management, so I think it can be merged as is.

zsogitbe · 2024-04-13T08:01:20Z

SignalRT, please think about this once more. I think that the best way to support images in the library is to have an in-memory byte array as the core. Users can easily convert any image from anywhere (HD, internet, DB, ...) to this byte array. The in-memory image (byte array) works with all possible image locations! This is why it is the best standardized way of working.
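To make the argument concrete, here is a minimal sketch of how caller-side code might turn images from different locations into the same `byte[]` form. The `ImageBytes` class and its method names are hypothetical and are not part of LLamaSharp; only the .NET calls they wrap are real.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical caller-side helpers; none of these names come from LLamaSharp.
public static class ImageBytes
{
    // Image stored on disk.
    public static Task<byte[]> FromFileAsync(string path)
        => File.ReadAllBytesAsync(path);

    // Image hosted on the internet; the caller owns the HttpClient.
    public static Task<byte[]> FromUrlAsync(HttpClient httpClient, string url)
        => httpClient.GetByteArrayAsync(url);

    // Image stored as a Base64 string, e.g. in a database column.
    public static byte[] FromBase64(string base64)
        => Convert.FromBase64String(base64);
}
```

Whatever the source, the result is a plain byte array, so the library would only need a single in-memory entry point. The trade-off raised in this thread is that callers who only have a file path must now read the file themselves.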

SignalRT · 2024-04-13T10:01:28Z

@zsogitbe, the llava API supports both approaches and I think there is a reason to support this. For now I didn't change the approach. I only added the capability to include new images in the conversation and a reset strategy in the example.

#664

Extension LLava with in memory images

d3c5a42

Standardizing Image Data implementation

e991e63

Download image implementation

44a82b0

martindevans requested changes Apr 8, 2024

Simplifying image handling

f4fad82

martindevans approved these changes Apr 12, 2024

SignalRT merged commit 8dd9101 into SciSharp:master Apr 12, 2024
3 checks passed
