Conversation
Everything LGTM with some very minor nits. I've tested it and with a fixed seed it gives the same results and performance as the original. Needing to set the flags (which also required Rust 1.68) for the build script is still a pain but I'm sure there's a solution to that.
Seems to be resilient to all the crazy things I tried, inference can be called multiple times without issue.
In the future it would be nice to have inference with continuous state carried between calls (i.e. an interactive mode).
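As a rough illustration of what such an interactive mode could look like: a session object that keeps its state between calls, so each invocation continues from the previous one. Everything below (the `Session` type, `feed`, the string-based "history") is hypothetical and not part of llama-rs; it only sketches the shape of the API.

```rust
// Hypothetical sketch: a session that keeps state between inference
// calls. In a real implementation, `history` would stand in for the
// model's key/value memory, not a list of strings.
struct Session {
    history: Vec<String>,
}

impl Session {
    fn new() -> Self {
        Session { history: Vec::new() }
    }

    // Feed a prompt; later calls see the context of earlier ones.
    // Returns how many turns the session has accumulated.
    fn feed(&mut self, prompt: &str) -> usize {
        self.history.push(prompt.to_string());
        self.history.len()
    }
}

fn main() {
    let mut session = Session::new();
    assert_eq!(session.feed("Hello"), 1);
    // A second call continues the same conversation state.
    assert_eq!(session.feed("How are you?"), 2);
    println!("turns so far: {}", session.history.len());
}
```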
This was quick! 😄 Thanks a lot for the changes.
Not much to add on top of @mwbryant's review, but I left a couple additional comments.
Amazing work on llamacord btw! 😄 I'm gonna try this out
Thanks for the feedback, everyone! Will action within the next few hours so we can get this merged 👍
The … I've also switched over to … As an aside: should the CLI be …?
I was originally thinking llama-rs-cli. But after seeing the name, I really like |
Amazing work, thanks a lot again for this!
Left a few minor comments, should be quick enough to address 👍
There's also a merge conflict since we merged #10. I'd suggest resolving it with a merge commit rather than a rebase (it's cleaner IMO, and some of us have already pulled this branch).
```rust
/// Each variant represents a step within the process of loading the model.
/// These can be used to report progress to the user.
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub enum LoadProgress<'a> {
```
A very clean solution! 😄 Having this will be especially important if we end up implementing parallel loading as described on #15. Because relying on the order of print messages for the little dots breaks the moment you start multithreading.
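The point about multithreading can be made concrete: when the loader reports structured progress events through a callback instead of printing dots in order, the caller stays correct no matter how the events are produced. A minimal sketch of that pattern follows; the variant names here are illustrative and not the actual `LoadProgress` variants.

```rust
// Sketch of the callback pattern: the loader reports structured events
// and the caller decides how to render them. Variant names are
// illustrative, not the real LoadProgress variants.
#[derive(Clone, PartialEq, Eq, Debug)]
enum Progress {
    Started,
    PartLoaded { current: usize, total: usize },
    Finished,
}

// The loader takes a callback instead of printing directly, so even a
// multithreaded loader can report coherent, self-describing progress.
fn load(mut report: impl FnMut(Progress)) {
    report(Progress::Started);
    for i in 0..3 {
        report(Progress::PartLoaded { current: i + 1, total: 3 });
    }
    report(Progress::Finished);
}

fn main() {
    let mut events = Vec::new();
    load(|p| events.push(p));
    assert_eq!(events.first(), Some(&Progress::Started));
    assert_eq!(events.last(), Some(&Progress::Finished));
    println!("{} progress events", events.len());
}
```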
This is a very rough split of the CLI application into a library and an application. It includes #8 because that was on my branch when I did this; we can figure out what to do with it when it's time.

The main differences are the conversion of `println`s to `log`s, and `print`s to …

Turns out this is basically entirely sufficient for first-pass use as a library. Here's proof.
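The `println`-to-`log` change means the library emits records through a logging layer and the application decides how to surface them. Since the actual `log` crate is an external dependency, here is a dependency-free sketch of the same idea, with a hand-rolled collector standing in for the logging facade (all names here are invented for illustration):

```rust
// Sketch of routing library output through a logging layer instead of
// printing directly. Real code would use the `log` crate's `info!` etc.;
// this stand-in collects messages so the caller controls output.
use std::sync::Mutex;

static MESSAGES: Mutex<Vec<String>> = Mutex::new(Vec::new());

// Stand-in for a logging facade's `info!` macro.
fn log_info(msg: &str) {
    MESSAGES.lock().unwrap().push(msg.to_string());
}

// Library code reports via the logger rather than println!, so an
// application embedding the library decides whether and where it shows.
fn do_library_work() {
    log_info("loading model");
    log_info("model loaded");
}

fn main() {
    do_library_work();
    let msgs = MESSAGES.lock().unwrap();
    assert_eq!(msgs.len(), 2);
    assert_eq!(msgs[0], "loading model");
    println!("captured {} log records", msgs.len());
}
```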