LlamaCpp doesn't work with generate.fsm for custom FSMs #965
Labels
bug
llama.cpp
Related to the `llama.cpp` integration
structured generation
Linked to structured generation
Comments
It will be a bit before this is merged into
Works on my end, please let me know if you run into any issues!
rlouf pushed a commit that referenced this issue on Jun 30, 2024:
….py (#998)

A lot of these fixes were intended for #966, but that's blocked until there's a new `transformers` release. These improvements are general to all models and will enable PRs resolving #806 and #965.

# Structure of `OutlinesLogitsProcessor`

The goal is to create a base class that allows a logits processor to be implemented once and used with any `outlines.models` inference library.

To accomplish this we must normalize the input array: it must have a consistent type (`torch.Tensor`) and consistent dimensionality (2). We can normalize both simply, and without any copy operations. `mlx.core.array`, `numpy.array`, and `torch.Tensor` all support [Python's array-standard `__dlpack__`](https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack__.html). This standard allows casting between array types without copying. `torch.Tensor` is the only input type that cannot always be cast to any other type, because torch tensors may live in GPU memory. Therefore, we cast all arrays to `torch.Tensor`, implement logits processors using torch methods, and convert back to the original array type in `OutlinesLogitsProcessor`. See the docstring of `OutlinesLogitsProcessor.__call__()` for more details.

# Detailed Changes

- Rename `BaseLogitsProcessor` to `OutlinesLogitsProcessor`.
- Ensure `OutlinesLogitsProcessor.process_logits()` is always passed a 2D batch request with `torch.Tensor` logits and `List` input_ids. Also clean up the code in `OutlinesLogitsProcessor.__call__()` to be more readable.
- Ensure `FSMLogitsProcessor` allows unstable sequence ordering (beam search in transformers and vLLM changes the order of sequences).
- Update `tests/generate/test_generate.py` to cover more permutations of
  - regex / text
  - batch / single
  - greedy / multinomial / beam search
  - `stream()` / `generate()`
- Ensure performance stability with different array libraries through `benchmark_processors.py`.
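The zero-copy cast the commit message describes can be sketched with a toy round trip. This is a minimal illustration of the DLPack protocol, not the actual `OutlinesLogitsProcessor` code; `normalize_logits` is a hypothetical helper name, and the sketch assumes `numpy` >= 1.22 and a recent `torch`, both of which implement `__dlpack__`.

```python
import numpy as np
import torch


def normalize_logits(logits):
    """Cast a DLPack-compatible array to a 2D torch.Tensor without copying.

    Hypothetical helper mirroring the normalization idea in the commit:
    consistent type (torch.Tensor) and consistent dimensionality (2).
    """
    tensor = torch.from_dlpack(logits)  # zero-copy view via __dlpack__
    if tensor.ndim == 1:
        tensor = tensor.unsqueeze(0)  # promote a single sequence to a batch of 1
    return tensor


logits_np = np.array([[0.5, 1.5, 2.5]], dtype=np.float32)
logits_pt = normalize_logits(logits_np)
logits_pt[0, 1] = float("-inf")  # mask a token on the torch side
back = np.from_dlpack(logits_pt)  # round-trip to the caller's array type
```

Because the conversion shares memory rather than copying, the masking applied through the torch view is visible in `logits_np` and in `back` alike, which is what makes the cast free for CPU-resident arrays.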
Describe the issue as clearly as possible:
The example with a custom FSM from the documentation doesn't work for LlamaCpp as
Steps/code to reproduce the bug:
Expected result:
Error message:
No response
Outlines/Python version information:
Version information
latest
Context for the issue:
No response