Implement Sync / Send for llama structs #481

Open
thewh1teagle opened this issue Aug 30, 2024 · 3 comments

@thewh1teagle (Contributor)

So I can use it across threads with a Mutex.
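
Roughly the usage in mind, as a hedged sketch (generic over the model type, since the llama structs are not Send today; nothing here is the crate's API):

use std::sync::{Arc, Mutex};
use std::thread;

fn share_across_threads<M: Send + 'static>(model: M) {
    let shared = Arc::new(Mutex::new(model));

    let handles: Vec<_> = (0..2)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                // Each thread takes the lock before touching the model/context.
                let _guard = shared.lock().unwrap();
                // ... run inference here ...
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}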

@MarcusDunn (Contributor)

I don't know if it is safe to implement either of those.

A PR would be welcome if there's both a problem that cannot be solved without implementing them and an argument for why Send and Sync are safe to implement (I'm not nearly a good enough Rust or C++ developer to be confident).

I've had success with statics + message passing while writing a web server based on this library; it's not super elegant, but it certainly gets the job done. I'd be interested to know the use case you're running into.
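
A rough sketch of that pattern with hypothetical names (not code from this crate): one dedicated thread owns the non-Send context and serves requests arriving over a channel.

use std::sync::mpsc;
use std::thread;

enum Request {
    Generate {
        prompt: String,
        reply: mpsc::Sender<String>,
    },
}

fn start_worker() -> mpsc::Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        // The backend, model and context would be created here and never
        // leave this thread, so nothing llama-related has to be Send or Sync.
        for req in rx {
            match req {
                Request::Generate { prompt, reply } => {
                    // Run decoding with the thread-local context; echo as a placeholder.
                    let _ = reply.send(format!("echo: {prompt}"));
                }
            }
        }
    });
    tx
}

fn main() {
    let worker = start_worker();
    let (reply_tx, reply_rx) = mpsc::channel();
    worker
        .send(Request::Generate { prompt: "hello".into(), reply: reply_tx })
        .unwrap();
    println!("{}", reply_rx.recv().unwrap());
}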

@samuelint

I have a concrete use case: async_stream.

As I understand it, any future run on a multi-threaded executor must be Send, since execution can move to another thread whenever it is suspended at an .await and later resumed:

async fn example() {
    // ...
    some().await;    // thread 1
    // ...
    other().await;   // thread 2
    // ...
    another().await; // thread 1 again
}

Using LlamaContext in async code leads to a `LlamaContext<'_>` which is not `Send` error, even when the ctx is created inside the async_stream block (since execution can move to another thread when the stream is suspended).

let stream = async_stream::stream! {
    let model = model_factory.create(&model_path).unwrap();
    let mut ctx = match model.new_context(&self.backend, LlamaContextParams::default()) {
        //  ------- has type `LlamaContext<'_>` which is not `Send`
        Ok(ctx) => ctx,
        Err(err) => {
            yield Err(err.into());
            return;
        }
    };

    // ...

    while n_cur <= prompt_and_response_length {
        // ...

        let mut output_string = String::with_capacity(32);
        let _decode_result =
            decoder.decode_to_string(&output_bytes, &mut output_string, false);

        yield Ok(output_string);

        // ...
    }
};

Box::pin(stream)

To make LlamaContext Send, the reference to the LlamaModel should not be a borrowed lifetime but an Arc, so that it is thread-safe. The restriction comes from the lifetime-bound field:

pub model: &'a LlamaModel,

That would avoid creating a bunch of wrappers and allow using this library directly with threads.
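
A simplified sketch of the proposed change (illustrative struct bodies, not the crate's actual definitions): replacing the borrowed model reference with an Arc removes the lifetime parameter, which is a prerequisite for making LlamaContext Send; the raw context pointer would still need its own Send justification.

use std::ptr::NonNull;
use std::sync::Arc;

struct LlamaModel { /* ... */ }
struct RawContext; // stand-in for llama_cpp_sys_2::llama_context

// Today (simplified): the context borrows the model, so it carries a lifetime.
struct LlamaContextBorrowed<'a> {
    model: &'a LlamaModel,
    context: NonNull<RawContext>,
}

// Proposed (simplified): the context holds an Arc to the model, so the
// lifetime parameter disappears.
struct LlamaContextOwned {
    model: Arc<LlamaModel>,
    context: NonNull<RawContext>,
}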


@MarcusDunn does that make sense? Do you see any pitfalls in making that change?

@MarcusDunn (Contributor) commented Oct 11, 2024

I think (although I'd have to reread some of the C++) that the change to an Arc<LlamaModel> would be safe.

The context: NonNull<llama_cpp_sys_2::llama_context> field, however, is !Send and !Sync, and given how it's used in llama.cpp I don't think it's Sync; it might be Send. I'm not a good enough Rust programmer to trust myself to make that judgment.
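
For reference, opting in would look like the following one-line unsafe claim, whose soundness is exactly the open question (hypothetical wrapper, not the crate's code):

use std::ptr::NonNull;

struct RawContext; // stand-in for llama_cpp_sys_2::llama_context

// NonNull<T> is !Send and !Sync by default, so a wrapper has to opt in explicitly.
struct ContextHandle(NonNull<RawContext>);

// SAFETY: sound only if llama.cpp permits a llama_context to be used from a
// thread other than the one that created it. No Sync is claimed, so the handle
// could move between threads but not be shared concurrently.
unsafe impl Send for ContextHandle {}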

I'm happy to be convinced otherwise. PRs + a multithreaded / async example showing it doesn't all blow up are welcome.

As mentioned in #483, I've used this library to create a very high-performance inference server. Multithreading, I think, is a red herring when what you really want is batched decoding. It was an async web server, and I streamed output by turning a Receiver<LlamaToken> into a Stream.
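
A condensed sketch of that streaming setup (assumes tokio + tokio-stream; LlamaToken here is a stand-in type and the worker body is a placeholder): a plain thread owns the non-Send context and feeds a channel, and the receiving end is exposed to async code as a Stream.

use tokio::sync::mpsc;
use tokio_stream::wrappers::ReceiverStream;
use tokio_stream::Stream;

struct LlamaToken(i32); // stand-in for the crate's token type

fn spawn_generation(prompt: String) -> impl Stream<Item = LlamaToken> {
    let (tx, rx) = mpsc::channel(64);

    // A plain OS thread owns the (non-Send) model and context and pushes
    // tokens into the channel as they are decoded.
    std::thread::spawn(move || {
        let _ = prompt; // build the context and run decoding here
        for i in 0..3 {
            // blocking_send is fine here: this is not a runtime worker thread.
            if tx.blocking_send(LlamaToken(i)).is_err() {
                break; // receiver dropped; stop generating
            }
        }
    });

    // The async side only ever sees a Stream of tokens.
    ReceiverStream::new(rx)
}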
