Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extension LLava with in memory images #653

Merged
merged 4 commits into from
Apr 12, 2024
Merged

Conversation

zsogitbe
Copy link
Contributor

@zsogitbe zsogitbe commented Apr 6, 2024

No description provided.

@SignalRT
Copy link
Collaborator

SignalRT commented Apr 7, 2024

@zsogitbe, thank you very much for your work. I would like to share my thoughts:

  • The interface is defined in Llama.Abstraction.ILLamaExecutor. I think that if you change the interface should be done in ILLamaExecutor.
  • I would prefer not to have a duplicate property with the image in another format. At this point I think that would be better to break the interface to do it only once. My preference would be something like:
    class ImageData
    {
        enum dataType { imagePath, imageBytes, imageURL }
        public dataType DataType { get; set; }
        public object Data { get; set; }
    }

And to change the property ImagePaths for something like:

public List<ImageData > ImageData { get; set; }

Anyway that's only my opinion.

@zsogitbe
Copy link
Contributor Author

zsogitbe commented Apr 7, 2024

SignalRT, it is a good idea. I did not do it because it means much more code (I have updated the PR, you can see the changes needed). I usually prefer simplicity, but in this case the standardization worth it.

@SignalRT
Copy link
Collaborator

SignalRT commented Apr 8, 2024

Thank you @zsogitbe, I will review the changes as soon as possible.

throw new NotImplementedException();
using var httpClient = new HttpClient();
var uri = new Uri((string)image.Data);
var imageBytes = httpClient.GetByteArrayAsync(uri).Result;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use Result here (or as a rule, anywhere). use await instead.

@@ -154,15 +155,21 @@ private Task PreprocessLlava(string text, InferStateArgs args, bool addBos = tru
{
if (image.Type == ImageData.DataType.ImagePath && image.Data != null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can all of this logic be moved out of the executor? Maybe a base interface with separate classes for the different variations.

For example:

e.g.

interface IClipImage { Task<SafeLlavaImageEmbedHandle> GetEmbed(); }

class ImageDataFromUrl(string url)
{
    public async Task<SafeLlavaImageEmbedHandle> GetEmbed()
    {
        return SafeLlavaImageEmbedHandle.CreateFromMemory(await GetImageBytes());
    }

    private async Task<byte[]> GetImageBytes()
    {
        return await DownloadThatUrl(url);
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the same after adding more and more to the code. The reading/downloading of the image should be the task of the user of the library.

@zsogitbe
Copy link
Contributor Author

This is ready for merging.

@martindevans
Copy link
Member

Looks good to me. I'll leave the final review to @SignalRT since he knows more about llava than me.

@SignalRT
Copy link
Collaborator

I would have clearly preferred to keep the option to allow paths to files and images in memory, but it is blocking another PR with some change in key management, so I think that can be merge as is.

@SignalRT SignalRT merged commit 8dd9101 into SciSharp:master Apr 12, 2024
3 checks passed
@zsogitbe
Copy link
Contributor Author

SignalRT, , Please think about this once more. I think that the most optimal way to support images in the library is to have in memory byte array as the core. The users can easily convert any image from anywhere (HD, internet, DB, ...) to this byte array. The in memory image (byte array) works with all possible image locations! This is the reason for having this is the best standardized way of working.

@SignalRT
Copy link
Collaborator

@zsogitbe, The llava API support both approach and I think there is a reason to support this. For now I didn´t change the approach. Just include the capability to include new images in the conversation and a reset strategy in the example.

#664

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants