Implement Sync / Send for llama structs #481

Open
thewh1teagle opened this issue Aug 30, 2024 · 3 comments

@thewh1teagle (Contributor)

So I can use it across threads with a Mutex.
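
Roughly the usage in mind, as a hedged sketch (generic over the model type, since the llama structs are not Send today; nothing here is the crate's API):

use std::sync::{Arc, Mutex};
use std::thread;

fn share_across_threads<M: Send + 'static>(model: M) {
    let shared = Arc::new(Mutex::new(model));

    let handles: Vec<_> = (0..2)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                // Each thread takes the lock before touching the model/context.
                let _guard = shared.lock().unwrap();
                // ... run inference here ...
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}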

@MarcusDunn (Contributor)

I don't know if it is safe to implement either of those.

A PR would be welcome if there's both a problem that cannot be solved without implementing them and an argument for why Send and Sync are safe to implement (I'm not nearly a good enough Rust or C++ developer to be confident).

I've had success with statics + message passing while writing a web server based on this library; it's not super elegant, but it certainly gets the job done. I'd be interested to know the use case you're running into.
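
A rough sketch of that pattern with hypothetical names (not code from this crate): one dedicated thread owns the non-Send context and serves requests arriving over a channel.

use std::sync::mpsc;
use std::thread;

enum Request {
    Generate {
        prompt: String,
        reply: mpsc::Sender<String>,
    },
}

fn start_worker() -> mpsc::Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        // The backend, model and context would be created here and never
        // leave this thread, so nothing llama-related has to be Send or Sync.
        for req in rx {
            match req {
                Request::Generate { prompt, reply } => {
                    // Run decoding with the thread-local context; echo as a placeholder.
                    let _ = reply.send(format!("echo: {prompt}"));
                }
            }
        }
    });
    tx
}

fn main() {
    let worker = start_worker();
    let (reply_tx, reply_rx) = mpsc::channel();
    worker
        .send(Request::Generate { prompt: "hello".into(), reply: reply_tx })
        .unwrap();
    println!("{}", reply_rx.recv().unwrap());
}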

@samuelint

I have a concrete use case: async_stream.

As I understand it, any future run on a multi-threaded executor must be Send, since execution can move to another thread whenever it is suspended at an .await and later resumed:

async fn example() {
    // ...
    some().await;    // thread 1
    // ...
    other().await;   // thread 2
    // ...
    another().await; // thread 1 again
}

Using LlamaContext in async code leads to a `LlamaContext<'_>` which is not `Send` error, even when the ctx is created inside the async_stream block (since execution can move to another thread when the stream is suspended).

let stream = async_stream::stream! {
    let model = model_factory.create(&model_path).unwrap();
    let mut ctx = match model.new_context(&self.backend, LlamaContextParams::default()) {
        //  ------- has type `LlamaContext<'_>` which is not `Send`
        Ok(ctx) => ctx,
        Err(err) => {
            yield Err(err.into());
            return;
        }
    };

    // ...

    while n_cur <= prompt_and_response_length {
        // ...

        let mut output_string = String::with_capacity(32);
        let _decode_result =
            decoder.decode_to_string(&output_bytes, &mut output_string, false);

        yield Ok(output_string);

        // ...
    }
};

Box::pin(stream)

To make LlamaContext Send, the reference to the LlamaModel should not be a borrowed lifetime but an Arc, so that it is thread-safe. The restriction comes from the lifetime-bound field:

pub model: &'a LlamaModel,

That would avoid creating a bunch of wrappers and allow using this library directly with threads.
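
A simplified sketch of the proposed change (illustrative struct bodies, not the crate's actual definitions): replacing the borrowed model reference with an Arc removes the lifetime parameter, which is a prerequisite for making LlamaContext Send; the raw context pointer would still need its own Send justification.

use std::ptr::NonNull;
use std::sync::Arc;

struct LlamaModel { /* ... */ }
struct RawContext; // stand-in for llama_cpp_sys_2::llama_context

// Today (simplified): the context borrows the model, so it carries a lifetime.
struct LlamaContextBorrowed<'a> {
    model: &'a LlamaModel,
    context: NonNull<RawContext>,
}

// Proposed (simplified): the context holds an Arc to the model, so the
// lifetime parameter disappears.
struct LlamaContextOwned {
    model: Arc<LlamaModel>,
    context: NonNull<RawContext>,
}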


@MarcusDunn does that make sense? Do you see any pitfalls in making that change?

@MarcusDunn (Contributor) commented Oct 11, 2024

I think (although I'd have to reread some of the C++) that the change to an Arc<LlamaModel> would be safe.

The context: NonNull<llama_cpp_sys_2::llama_context> field, however, is !Send and !Sync, and given how it's used in llama.cpp I don't think it's Sync; it might be Send. I'm not a good enough Rust programmer to trust myself to make that judgment.
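
For reference, opting in would look like the following one-line unsafe claim, whose soundness is exactly the open question (hypothetical wrapper, not the crate's code):

use std::ptr::NonNull;

struct RawContext; // stand-in for llama_cpp_sys_2::llama_context

// NonNull<T> is !Send and !Sync by default, so a wrapper has to opt in explicitly.
struct ContextHandle(NonNull<RawContext>);

// SAFETY: sound only if llama.cpp permits a llama_context to be used from a
// thread other than the one that created it. No Sync is claimed, so the handle
// could move between threads but not be shared concurrently.
unsafe impl Send for ContextHandle {}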

I'm happy to be convinced otherwise. PRs + a multithreaded / async example showing it doesn't all blow up are welcome.

As mentioned in #483, I've used this library to create a very high-performance inference server. Multithreading, I think, is a red herring when what you really want is batched decoding. It was an async web server, and I streamed output by turning a Receiver<LlamaToken> into a Stream.
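
A condensed sketch of that streaming setup (assumes tokio + tokio-stream; LlamaToken here is a stand-in type and the worker body is a placeholder): a plain thread owns the non-Send context and feeds a channel, and the receiving end is exposed to async code as a Stream.

use tokio::sync::mpsc;
use tokio_stream::wrappers::ReceiverStream;
use tokio_stream::Stream;

struct LlamaToken(i32); // stand-in for the crate's token type

fn spawn_generation(prompt: String) -> impl Stream<Item = LlamaToken> {
    let (tx, rx) = mpsc::channel(64);

    // A plain OS thread owns the (non-Send) model and context and pushes
    // tokens into the channel as they are decoded.
    std::thread::spawn(move || {
        let _ = prompt; // build the context and run decoding here
        for i in 0..3 {
            // blocking_send is fine here: this is not a runtime worker thread.
            if tx.blocking_send(LlamaToken(i)).is_err() {
                break; // receiver dropped; stop generating
            }
        }
    });

    // The async side only ever sees a Stream of tokens.
    ReceiverStream::new(rx)
}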
