Conversation
Everything LGTM with some very minor nits. I've tested it and with a fixed seed it gives the same results and performance as the original. Needing to set the flags (which also required Rust 1.68) for the build script is still a pain but I'm sure there's a solution to that.
Seems to be resilient to all the crazy things I tried, inference can be called multiple times without issue.
In the future it would be nice to have inference with continuous state carried between calls (i.e. an interactive mode).
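As a rough illustration of what such an interactive mode could look like: a session object that keeps its state between calls, so each invocation continues from the previous one. Everything below (the `Session` type, `feed`, the string-based "history") is hypothetical and not part of llama-rs; it only sketches the shape of the API.

```rust
// Hypothetical sketch: a session that keeps state between inference
// calls. In a real implementation, `history` would stand in for the
// model's key/value memory, not a list of strings.
struct Session {
    history: Vec<String>,
}

impl Session {
    fn new() -> Self {
        Session { history: Vec::new() }
    }

    // Feed a prompt; later calls see the context of earlier ones.
    // Returns how many turns the session has accumulated.
    fn feed(&mut self, prompt: &str) -> usize {
        self.history.push(prompt.to_string());
        self.history.len()
    }
}

fn main() {
    let mut session = Session::new();
    assert_eq!(session.feed("Hello"), 1);
    // A second call continues the same conversation state.
    assert_eq!(session.feed("How are you?"), 2);
    println!("turns so far: {}", session.history.len());
}
```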
This was quick! 😄 Thanks a lot for the changes.
Not much to add on top of @mwbryant's review, but I left a couple additional comments.
Amazing work on llamacord btw! 😄 I'm gonna try this out
Thanks for the feedback, everyone! Will action within the next few hours so we can get this merged 👍
The … I've also switched over to … As an aside: should the CLI be …?
I was originally thinking llama-rs-cli. But after seeing the name, I really like |
Amazing work, thanks a lot again for this!
Left a few minor comments, should be quick enough to address 👍
There's also a merge conflict since we merged #10. I'd suggest resolving it with a merge commit rather than a rebase (it's cleaner IMO, and some of us have already pulled this branch).
```rust
/// Each variant represents a step within the process of loading the model.
/// These can be used to report progress to the user.
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub enum LoadProgress<'a> {
```
A very clean solution! 😄 Having this will be especially important if we end up implementing parallel loading as described on #15. Because relying on the order of print messages for the little dots breaks the moment you start multithreading.
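The point about multithreading can be made concrete: when the loader reports structured progress events through a callback instead of printing dots in order, the caller stays correct no matter how the events are produced. A minimal sketch of that pattern follows; the variant names here are illustrative and not the actual `LoadProgress` variants.

```rust
// Sketch of the callback pattern: the loader reports structured events
// and the caller decides how to render them. Variant names are
// illustrative, not the real LoadProgress variants.
#[derive(Clone, PartialEq, Eq, Debug)]
enum Progress {
    Started,
    PartLoaded { current: usize, total: usize },
    Finished,
}

// The loader takes a callback instead of printing directly, so even a
// multithreaded loader can report coherent, self-describing progress.
fn load(mut report: impl FnMut(Progress)) {
    report(Progress::Started);
    for i in 0..3 {
        report(Progress::PartLoaded { current: i + 1, total: 3 });
    }
    report(Progress::Finished);
}

fn main() {
    let mut events = Vec::new();
    load(|p| events.push(p));
    assert_eq!(events.first(), Some(&Progress::Started));
    assert_eq!(events.last(), Some(&Progress::Finished));
    println!("{} progress events", events.len());
}
```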
This is a very rough split of the CLI application into a library and an application. It includes #8 because that was on my branch when I did this; we can figure out what to do with it when it's time.

The main differences are the conversion of `println`s to `log`s, and `print`s to …

Turns out this is basically entirely sufficient for first-pass use as a library. Here's proof.
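The `println`-to-`log` change means the library emits records through a logging layer and the application decides how to surface them. Since the actual `log` crate is an external dependency, here is a dependency-free sketch of the same idea, with a hand-rolled collector standing in for the logging facade (all names here are invented for illustration):

```rust
// Sketch of routing library output through a logging layer instead of
// printing directly. Real code would use the `log` crate's `info!` etc.;
// this stand-in collects messages so the caller controls output.
use std::sync::Mutex;

static MESSAGES: Mutex<Vec<String>> = Mutex::new(Vec::new());

// Stand-in for a logging facade's `info!` macro.
fn log_info(msg: &str) {
    MESSAGES.lock().unwrap().push(msg.to_string());
}

// Library code reports via the logger rather than println!, so an
// application embedding the library decides whether and where it shows.
fn do_library_work() {
    log_info("loading model");
    log_info("model loaded");
}

fn main() {
    do_library_work();
    let msgs = MESSAGES.lock().unwrap();
    assert_eq!(msgs.len(), 2);
    assert_eq!(msgs[0], "loading model");
    println!("captured {} log records", msgs.len());
}
```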