
Librarification #10

Merged: 12 commits into rustformers:main on Mar 16, 2023

Conversation

@philpax (Collaborator) commented Mar 15, 2023

This is a very rough split of the CLI application into a library and application. It includes #8 because that was on my branch when I did this - we can figure out what to do with it when it's time.

The main differences are:

  • converting printlns to logs
  • removing prints
  • adding a callback function for the arrival of new tokens (see the sketch below)

Turns out this is basically entirely sufficient for first-pass use as a library. Here's proof.
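
For illustration, here is a minimal, self-contained sketch of the callback pattern this enables. `Model` and `infer` are stand-ins, not the crate's actual API: the point is that the library hands each new token to a caller-supplied closure instead of printing it itself, so a CLI, a Discord bot, or anything else can decide what to do with the output.

```rust
use std::io::{self, Write};

struct Model;

impl Model {
    /// Stand-in for an inference call: yields each "token" to the callback.
    fn infer(&self, prompt: &str, mut on_token: impl FnMut(&str)) {
        for word in prompt.split_whitespace() {
            on_token(word);
            on_token(" ");
        }
    }
}

fn main() {
    let model = Model;
    // The front-end decides what to do with tokens, e.g. stream them to stdout.
    model.infer("tokens arrive one at a time", |t| {
        print!("{t}");
        io::stdout().flush().unwrap();
    });
    println!();
}
```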

@philpax mentioned this pull request on Mar 15, 2023
@mwbryant (Contributor) left a comment

Everything LGTM, with some very minor nits. I've tested it, and with a fixed seed it gives the same results and performance as the original. Needing to set the flags for the build script (which also required Rust 1.68) is still a pain, but I'm sure there's a solution to that.

It seems to be resilient to all the crazy things I tried; inference can be called multiple times without issue.

In the future it would be nice to have inference with continuous state between calls (i.e. an interactive mode).

llama-rs/src/llama.rs (resolved)
ggml-raw/build.rs (resolved)
llama-rs/src/llama.rs (outdated, resolved)
llama-rs/src/llama.rs (outdated, resolved)
@setzer22 (Collaborator) left a comment

This was quick! 😄 Thanks a lot for the changes.

Not much to add on top of @mwbryant's review, but I left a couple additional comments.

llama-rs/src/llama.rs (resolved)
ggml-raw/build.rs (outdated, resolved)
@setzer22 (Collaborator) commented

Amazing work on llamacord btw! 😄 I'm gonna try this out

llama-cli/src/main.rs (outdated, resolved)
@philpax (Collaborator, Author) commented Mar 15, 2023

Thanks for the feedback, everyone! I'll action it within the next few hours so we can get this merged 👍

@philpax (Collaborator, Author) commented Mar 16, 2023

The llama-rs crate now has no logging of its own; all progress is reported through the callback. Pretty happy with how that turned out!

I've also switched over to thiserror. I've tried to remain consistent with the existing names, but I'd like to agree on #17 before we have any major users.
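
For context, this is the kind of error definition thiserror enables. A minimal sketch only, assuming the `thiserror` dependency; the type and variant names here are illustrative, not the crate's actual error types:

```rust
// Illustrative only: shows the thiserror pattern, not llama-rs's real error enum.
use thiserror::Error;

#[derive(Error, Debug)]
pub enum LoadError {
    /// I/O failures bubble up with context thanks to #[from].
    #[error("failed to read model file: {0}")]
    Io(#[from] std::io::Error),

    /// Domain-specific failures carry their own data and a readable message.
    #[error("invalid magic number {magic:#x} in model file")]
    InvalidMagic { magic: u32 },
}
```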

As an aside: should the CLI be llama-cli or llama-rs-cli?

@setzer22 (Collaborator) commented Mar 16, 2023

> As an aside: should the CLI be llama-cli or llama-rs-cli?

I was originally thinking llama-rs-cli, but after seeing the name I really like llama-cli: it's easier to type, especially if it ends up in someone's path via cargo install. So if you prefer that one, I'm on board as well!

@setzer22 (Collaborator) left a comment

Amazing work, thanks a lot again for this!

Left a few minor comments; they should be quick enough to address 👍

There's also a merge conflict since we merged #10. I'd suggest solving it with a merge rather than a rebase (it's cleaner IMO, and some of us have already pulled that branch).

llama-cli/src/main.rs (outdated, resolved)
llama-rs/src/lib.rs (outdated, resolved)
/// Each variant represents a step within the process of loading the model.
/// These can be used to report progress to the user.
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub enum LoadProgress<'a> {

A collaborator left a comment:
A very clean solution! 😄 Having this will be especially important if we end up implementing parallel loading as described in #15, because relying on the order of print messages for the little dots breaks the moment you start multithreading.
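
To illustrate why structured progress events are more robust than print ordering, here is a small sketch of a front-end consuming such an enum. The variant names and fields are hypothetical, not the actual llama-rs definitions; the point is that each event carries its own data, so a consumer never depends on the order messages happen to arrive in.

```rust
use std::path::Path;

// Hypothetical variants for illustration only.
enum LoadProgress<'a> {
    PartLoading { file: &'a Path },
    TensorLoaded { current: usize, total: usize },
    Loaded,
}

fn report(progress: LoadProgress<'_>) {
    match progress {
        LoadProgress::PartLoading { file } => {
            eprintln!("Loading {} ...", file.display());
        }
        // Self-describing counters mean several loader threads could emit
        // these events concurrently and the UI would still render correctly.
        LoadProgress::TensorLoaded { current, total } => {
            eprintln!("Loaded tensor {current}/{total}");
        }
        LoadProgress::Loaded => eprintln!("Model fully loaded"),
    }
}

fn main() {
    report(LoadProgress::PartLoading { file: Path::new("models/7B/model.bin") });
    report(LoadProgress::TensorLoaded { current: 1, total: 291 });
    report(LoadProgress::Loaded);
}
```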

@setzer22 merged commit 5a8c435 into rustformers:main on Mar 16, 2023