
Model File Manager #789

Open: patrick-hovsepian wants to merge 11 commits into master

Conversation

patrick-hovsepian (Contributor)

Create a simple manager to make administration of local models simpler:

  • Configure a list of directories that will be scanned for gguf files ready to be loaded (a rough sketch of this idea follows below)
  • A convenience method for loading a model with default params
  • Tests
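
For illustration only, a minimal sketch of the directory-scanning idea, assuming a hypothetical `ModelFileRepo` class; the names here are placeholders, not necessarily the API introduced by this PR:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Illustrative sketch only: scan a configured set of directories for *.gguf files.
public class ModelFileRepo
{
    private readonly List<string> _directories;

    public ModelFileRepo(IEnumerable<string> directories)
    {
        _directories = directories.ToList();
    }

    // Enumerate every .gguf file found under the configured directories.
    public IEnumerable<FileInfo> GetAvailableModels()
    {
        return _directories
            .Where(Directory.Exists)
            .SelectMany(dir => Directory.EnumerateFiles(dir, "*.gguf", SearchOption.AllDirectories))
            .Select(path => new FileInfo(path));
    }
}
```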

* seed manageR

* model manager init

* interface

* test

* tests

---------

Co-authored-by: Pat Hov <hov@hov.com>
Co-authored-by: pat_hov <hov@hovbook>
@martindevans (Member) left a comment

I've left a few comments. The main issue is being completely clear about ownership and resource disposal (models are expensive to load and keep loaded, so it's important that resource management is completely unambiguous!)
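
(Context for the shared-lifetime discussion: one common way to make ownership explicit is reference counting, where the expensive native weights are only freed when the last consumer disposes its handle. The sketch below is an assumption about the general shape of that idea, not the code in this PR; `SharedWeights` and `Lease` are hypothetical names.)

```csharp
using System;
using System.Threading;

// Hypothetical sketch: callers rent disposable leases on the weights, and the
// underlying resource is freed only when the last lease is disposed.
public sealed class SharedWeights
{
    private readonly IDisposable _weights; // the expensive native resource
    private int _refCount;

    public SharedWeights(IDisposable weights) => _weights = weights;

    // Hand out a new owner of the same underlying weights.
    public Lease Rent()
    {
        Interlocked.Increment(ref _refCount);
        return new Lease(this);
    }

    private void Return()
    {
        // The last lease to be returned actually unloads the model.
        if (Interlocked.Decrement(ref _refCount) == 0)
            _weights.Dispose();
    }

    public sealed class Lease : IDisposable
    {
        private SharedWeights _owner;

        internal Lease(SharedWeights owner) => _owner = owner;

        public void Dispose()
        {
            // Tolerate double-dispose of the same lease ("handle already disposed").
            Interlocked.Exchange(ref _owner, null)?.Return();
        }
    }
}
```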

patrick-hovsepian and others added 4 commits June 11, 2024 19:25
* seed manageR

* model manager init

* interface

* test

* tests

* no default configurator

* Rename class

* handle already disposed

---------

Co-authored-by: Pat Hov <hov@hov.com>
Co-authored-by: pat_hov <hov@hovbook>
* organization

* disposable and ref counter

---------

Co-authored-by: pat_hov <hov@hovbook>
@AsakusaRinne (Collaborator) left a comment

Thank you for your contribution! I've left some comments; please let me know if any help is needed.

Review threads (outdated, resolved): LLama/Model/ModelCache.cs (×2), LLama/Model/IModelCache.cs, LLama/LLamaWeights.cs (×3)
@AsakusaRinne added the enhancement (New feature or request) and benchmark (Trigger benchmark workflow) labels on Jun 13, 2024
* organization

* disposable and ref counter

* separate concerns a bit more

* check

* tweak

---------

Co-authored-by: pat_hov <hov@hovbook>
Review thread (outdated, resolved): LLama/LLamaWeights.cs
* organization

* disposable and ref counter

* separate concerns a bit more

* check

* tweak

* stash

* note

---------

Co-authored-by: pat_hov <hov@hovbook>
patrick-hovsepian (Contributor, Author) commented

Essentially for demo purposes; I wanted to see how abstract the interface is.

Member

If this is just a test, do we want to keep it in this PR? I think @AsakusaRinne was working on HF integrations for model loading, so you might want to check the status of that work and add something in a separate PR?

@patrick-hovsepian (Contributor, Author) commented on Jun 17, 2024

I'm curious what the thoughts are on that.

Collaborator

Sorry for the late reply. I agree that it could be removed from this PR and added in a separate PR to the LLama.Experimental project. With HuggingfaceHub, it's easy to download a model from Hugging Face. It's easy to implement a remote model manager for gguf files, but the APIs might change in the future if we want to support other formats (.safetensors, .bin) based on GGMLSharp, so I would recommend putting it in LLama.Experimental first.

Comment on lines +34 to +44
/// <summary>
/// Unload and dispose of a model with the given id
/// </summary>
/// <param name="modelId"></param>
/// <returns></returns>
public bool UnloadModel(string modelId);

/// <summary>
/// Unload all currently loaded models
/// </summary>
public void UnloadAllModels();
patrick-hovsepian (Contributor, Author) commented

There are a few challenges with supporting this, and I'm thinking it may be better to just drop these methods and have the caller explicitly call Dispose on the weights (sketched below, after the questions).

A few questions:

  1. Should multiple instances of the same model be allowed to be loaded? I think yes. If so, we'll need a way to ensure that calling unload is specific to one instance.
  2. Should we force model aliases in the cache to be unique? This would help with the case where I load multiple instances of a model: calling dispose would then be guaranteed to operate on that same model. If we don't enforce this restriction, unloading the correct model becomes trickier and might require the original model to be passed in rather than the alias.
  3. Are we better off getting rid of this class altogether? My main goal was to have something like IModelRepo, and some of this quickly went out of scope. If we think it'll be useful I'm happy to leave it, but I'm struggling a bit to justify it, especially given the unload/dispose challenges.
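
(For illustration, the caller-driven alternative mentioned above, where the cache returns the weights and the caller simply disposes them, might look roughly like this. `IModelCache.LoadModelAsync` is a hypothetical shape, not the PR's final API.)

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical shape: the cache hands back an IDisposable handle that the caller owns.
public interface IModelCache
{
    Task<IDisposable> LoadModelAsync(string path);
}

public static class Example
{
    public static async Task RunAsync(IModelCache cache)
    {
        // The caller owns the returned weights, so unloading is just Dispose:
        using var weights = await cache.LoadModelAsync("models/example.gguf");

        // ... run inference with the weights here ...
    } // leaving the scope disposes the weights; no UnloadModel bookkeeping needed
}
```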

Member

> Should multiple instances of the same model be allowed to be loaded? I think yes. If so, we'll need a way to ensure calling unload is specific to a model.

Definitely. You can load a model with different settings (e.g. loras) which affect the output almost as if it were another model.

> Should we force model aliases in the cache to be unique?

Do you mean the modelId string? If so, I would think loading a model with an ID that is already in the cache would either:

  • throw an error
  • return the existing model

I'd lean towards throwing the error out of those choices (a rough sketch of that option follows below).

> Are we better off getting rid of this class altogether?

The model repo idea is interesting, and I think it's something Rinne has looked at as well. But yeah, maybe it would be better to remove it from this PR and make this PR all about the new shared lifetimes (which I think is a pretty big improvement in itself). The model repo can be done in a follow-up PR, fully taking advantage of the new lifetimes.
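
(A rough sketch of the "throw an error on a duplicate id" option; `ModelCache.Add` and `TryGetModel` are placeholder names, not the PR's API.)

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of enforcing unique model ids in the cache.
public class ModelCache
{
    private readonly Dictionary<string, IDisposable> _loaded = new();

    public void Add(string modelId, IDisposable weights)
    {
        // Loading under an id that is already present is an error,
        // rather than silently returning the existing model.
        if (!_loaded.TryAdd(modelId, weights))
            throw new ArgumentException($"A model with id '{modelId}' is already loaded.", nameof(modelId));
    }

    public bool TryGetModel(string modelId, out IDisposable weights)
        => _loaded.TryGetValue(modelId, out weights);
}
```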

patrick-hovsepian (Contributor, Author) commented

If anything, I'd rather keep the model repo and the file system class than this, but I've further reworked it to be more explicit. I haven't been able to finalize what a "good" API for something like this looks like (name aside), but the discussion here has helped add context.

Review thread (outdated, resolved): LLama/LLamaWeights.cs
patrick-hovsepian (Contributor, Author) commented

Do you have any thoughts on this, @AsakusaRinne?

@AsakusaRinne (Collaborator) left a comment

Overall the code looks good. Since this PR adds many new APIs, could you please add a document introducing the usage of the model cache and model repo? Please refer to #747 to see how to add a doc. :)
