
Prompt formatter API and canary transcribe tensor input support #9206

Merged · 22 commits · Jun 1, 2024

Conversation

@pzelasko (Collaborator) commented May 15, 2024

What does this PR do?

Generic prompt formatter for text modality with several out-of-the-box prompt format definitions. See the class documentation for more details.

It also enables support for tensor/array inputs in Canary. Example snippet:

import os
from tempfile import NamedTemporaryFile

import lhotse

from nemo.collections.asr.models.aed_multitask_models import EncDecMultiTaskModel


def main():
    model = EncDecMultiTaskModel.from_pretrained("nvidia/canary-1b")

    path = ...
    rec = lhotse.Recording.from_file(path)
    audio = rec.load_audio()[0]

    TARGET_LANG = "es"

    # Array list input, legacy API for Canary-1B
    results = model.transcribe(
        audio=[audio],
        batch_size=1,
        num_workers=0,
        source_lang="en",
        target_lang=TARGET_LANG,
        task="asr",
        pnc="yes",
    )

    print(results)

    # Array list input, explicit single-turn prompt   
    results = model.transcribe(
        audio=[audio],
        batch_size=1,
        num_workers=0,
        role="user",
        slots={
            "source_lang": "en",
            "target_lang": TARGET_LANG,
            "task": "asr",
            "pnc": "yes",
        },
    )

    print(results)

    # Array list input, explicit multi-turn prompt   
    results = model.transcribe(
        audio=[audio],
        batch_size=1,
        num_workers=0,
        turns=[{
            "role": "user",
            "slots": {
                "source_lang": "en",
                "target_lang": TARGET_LANG,
                "task": "asr",
                "pnc": "yes",
            },
        }],
    )

    print(results)

    # Audio path input, explicit single-turn prompt   
    results = model.transcribe(
        audio=path,
        batch_size=1,
        num_workers=0,
        role="user",
        slots={
            "source_lang": "en",
            "target_lang": TARGET_LANG,
            "task": "asr",
            "pnc": "yes",
        },
    )

    print(results)

    # Legacy JSON manifest with slot values API for Canary-1B 
    with NamedTemporaryFile("w", suffix=".json") as f:
        lhotse.serialization.save_to_jsonl(
            [
                {
                    "audio_filepath": path,
                    "text": "irrelevant",
                    "duration": rec.duration,
                    "task": "asr",
                    "pnc": "yes",
                    "source_lang": "en",
                    "target_lang": TARGET_LANG,
                }
            ],
            f.name,
        )
        f.flush()
        os.fsync(f.fileno())

        results = model.transcribe(
            audio=f.name,
            batch_size=1,
            num_workers=0,
        )

        print(results)


if __name__ == "__main__":
    main()

These values can now also be provided dynamically to transcribe_speech.py from the CLI. Examples:

# Legacy Canary-1B format
python ~/code/NeMo/examples/asr/transcribe_speech.py \
  audio_dir=wavs \
  output_filename=out.json \
  batch_size=1 \
  pretrained_name=nvidia/canary-1b \
  +prompt.source_lang=en \
  +prompt.target_lang=es \
  +prompt.task=asr \
  +prompt.pnc=yes

# Explicit single-turn format
python ~/code/NeMo/examples/asr/transcribe_speech.py \
  audio_dir=wavs \
  output_filename=out.json \
  batch_size=1 \
  pretrained_name=nvidia/canary-1b \
  +prompt.role=user \
  +prompt.slots.source_lang=en \
  +prompt.slots.target_lang=es \
  +prompt.slots.task=asr \
  +prompt.slots.pnc=yes

# Explicit multi-turn format
python ~/code/NeMo/examples/asr/transcribe_speech.py \
  audio_dir=wavs \
  output_filename=out.json \
  batch_size=1 \
  pretrained_name=nvidia/canary-1b \
  +prompt.turns='[{role:user,slots:{source_lang:en,target_lang:es,task:asr,pnc:yes}}]'
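The three CLI forms above mirror the three transcribe() calling styles: legacy flat slots, explicit single-turn, and explicit multi-turn. As a rough self-contained sketch of how such inputs could normalize to one canonical multi-turn list — a hypothetical `normalize_prompt` helper written for illustration, not NeMo's actual `parse_multitask_prompt` implementation — consider:

```python
# Hypothetical sketch (not the NeMo implementation) of normalizing the three
# prompt-argument styles shown above into one multi-turn representation.

def normalize_prompt(prompt):
    """Return a list of {'role': ..., 'slots': {...}} turns."""
    if not prompt:
        return []
    if "turns" in prompt:
        # Already multi-turn: pass through as-is.
        return prompt["turns"]
    if "role" in prompt or "slots" in prompt:
        # Explicit single-turn form.
        return [{"role": prompt.get("role", "user"), "slots": prompt.get("slots", {})}]
    # Legacy flat kwargs (e.g. source_lang/target_lang/task/pnc) become
    # the slots of a single user turn.
    return [{"role": "user", "slots": dict(prompt)}]


legacy = {"source_lang": "en", "target_lang": "es", "task": "asr", "pnc": "yes"}
single = {"role": "user", "slots": legacy}
multi = {"turns": [{"role": "user", "slots": legacy}]}
```

All three inputs reduce to the same single-turn list, which is the equivalence the examples above rely on.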

Collection: ASR

Changelog

  • Adds a generic PromptFormatter API with several out-of-the-box prompt format definitions.
  • Enables tensor/array inputs and explicit prompt slots in Canary's transcribe() and transcribe_speech.py.

Usage

  • See the example snippet and CLI examples above.

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

pzelasko added 3 commits May 15, 2024 14:08
pzelasko added 3 commits May 21, 2024 13:52
@titu1994 (Collaborator) left a comment:

Initial comments

Review threads: nemo/collections/asr/parts/utils/streaming_utils.py, nemo/collections/common/prompts/canary.py
pzelasko added 2 commits May 22, 2024 12:58
pzelasko added 4 commits May 23, 2024 17:34
@pzelasko pzelasko marked this pull request as ready for review May 23, 2024 22:57
@pzelasko pzelasko changed the title from "Prompt formatter API and canary tensor dataset" to "Prompt formatter API and canary transcribe tensor input support" May 23, 2024

Text = "text"

def matches(self, value: Any) -> bool:

Code scanning / CodeQL — Note: "Explicit returns mixed with implicit (fall through) returns." Mixing implicit and explicit returns may indicate an error, as implicit returns always return None.

nemo/collections/common/prompts/formatter.py (fixed)

tokens, prompts = [], []
prompts_with_answers, prompts = [], []
for cut in cuts:
if isinstance(cut, MixedCut):
cut = cut._first_non_padding_cut
assert isinstance(cut, MonoCut), "Expected MonoCut."
@stevehuang52 (Collaborator) commented May 29, 2024:

Better to raise a TypeError saying something like "expected the input audio to have a single channel", since users might not know what "MonoCut" means.

Collaborator: +1

prompt = prompt.replace(_mangled(slot), value)
return self._apply_tokenizer(prompt, lang=slot_values.get(self.PROMPT_LANGUAGE_SLOT))

def encode_dialog(self, turns: list[dict]) -> dict[str, torch.Tensor]:
Collaborator: If I understand correctly, this PR is for encoder-decoder models like Canary/BESTOW, where all of the (multi-turn) dialogue is text. Can we think a bit about also supporting the audio modality in slot values? (Maybe we should keep audio slots untokenized and replace them with "audio features" later; one option is for the prompt formatter to return something like a list of lists.)

Collaborator: +1 to this; Piotr told me he is planning this as v2. Maybe we can resume the discussion then.

@krishnacpuvvada (Collaborator) commented May 30, 2024:

Sounds good. @pzelasko, if possible, let's try to put the skeleton in place, e.g. if a slot value needs to be re-defined as a (value, modality) tuple, the return needs to be a list of lists/tuples, etc.

@pzelasko (Collaborator, Author) commented May 30, 2024:

@krishnacpuvvada I'm thinking that for multimodal we'll add a method that returns a "formatted prompt" as a sequence of embeddings instead. The benefit of using embeddings rather than token IDs is that we can support models with non-discrete latent spaces in addition to discretized ones. There are a few options:

  1. Initialize it with (or register post-init) a dict of {modality: nn.Module} that is used internally to convert "raw" modality input into a sequence of embeddings; the prompt formatter is then used at the beginning of the forward step so that these modules can be trained.
  2. Provide a sequence of embeddings directly; but even then you still need to use the formatter in the forward step, as it's unlikely you'll embed audio/images/video fast enough in the dataloader process on a CPU.

In terms of skeletons, I've already put in the Modality type with a single type, text, that's used in the slot schema definition and in validating that a value "is" from a given modality. I'm 90% confident it'll be sufficient to extend to other modalities in V2.
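The {modality: nn.Module} registry idea from option 1 can be sketched in plain Python. Everything below is hypothetical (plain callables stand in for nn.Module, and `MultimodalFormatterSketch` is an invented name); it only shows the shape of the idea, not any NeMo API:

```python
# Hypothetical sketch of option 1: a per-modality embedder registry that a
# prompt formatter could use to turn raw slot values into embedding sequences.
# Plain callables stand in for nn.Module; nothing here is NeMo API.
from typing import Any, Callable, Sequence

Embedder = Callable[[Any], Sequence[Sequence[float]]]


class MultimodalFormatterSketch:
    def __init__(self) -> None:
        self._embedders: dict[str, Embedder] = {}

    def register(self, modality: str, embedder: Embedder) -> None:
        # Registered post-init, so in a real model these modules could be
        # trained when the formatter runs at the start of forward().
        self._embedders[modality] = embedder

    def encode(self, items):
        # Each item is a (modality, raw_value) pair; the formatted prompt is
        # the concatenation of per-item embedding sequences.
        out = []
        for modality, value in items:
            out.extend(self._embedders[modality](value))
        return out


fmt = MultimodalFormatterSketch()
# Toy "embedders": one 1-d vector per character / per audio sample.
fmt.register("text", lambda s: [[float(ord(c))] for c in s])
fmt.register("audio", lambda samples: [[float(x)] for x in samples])
embs = fmt.encode([("text", "hi"), ("audio", [0.1, 0.2, 0.3])])
```

The design point is that text and audio slots flow through one interface, so a non-discrete latent space only requires swapping the registered embedder.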

Collaborator: Sounds good. Agreed; any audio encoder (especially our 600M ones) has to run on a GPU.

@zhehuaichen (Collaborator) left a comment:

Nice work! LGTM

Collaborator: A thought: could we also add a sample.py/simple.py with the simplest possible template, plus a few comments about which routines need to be defined? (This is mainly for users who want to create their own custom template; I know there are plenty of examples already.)

Collaborator: I like this; yeah, a canonical template to copy-paste and directly modify.

@pzelasko (Author): OK
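Following up on the sample.py/simple.py idea, a toy stand-alone template might look like the following. This is a hypothetical illustration only (`ToyPromptTemplate` and its marker syntax are invented here), loosely modeled on the slot substitution visible in this PR's diff (`prompt.replace(_mangled(slot), value)`); it is not the NeMo PromptFormatter API:

```python
# Toy stand-alone prompt template: declares a template string, its slots, and
# a fill routine that substitutes mangled slot markers like "|source_lang|".
# Hypothetical illustration only, not the NeMo PromptFormatter API.

class ToyPromptTemplate:
    TEMPLATE = "<|startoftranscript|>|source_lang||task||target_lang||pnc|"
    SLOTS = ("source_lang", "task", "target_lang", "pnc")

    @staticmethod
    def _mangled(slot: str) -> str:
        # Wrap a slot name in pipes to form its marker in TEMPLATE.
        return f"|{slot}|"

    def fill(self, slot_values: dict) -> str:
        missing = [s for s in self.SLOTS if s not in slot_values]
        if missing:
            raise ValueError(f"Missing slot values: {missing}")
        prompt = self.TEMPLATE
        for slot in self.SLOTS:
            prompt = prompt.replace(self._mangled(slot), slot_values[slot])
        return prompt


# Token-style slot values are invented for the example.
prompt = ToyPromptTemplate().fill(
    {"source_lang": "<|en|>", "task": "<|transcribe|>",
     "target_lang": "<|es|>", "pnc": "<|pnc|>"}
)
```

A real formatter would additionally define per-role templates and handle tokenization; the sketch only shows the two routines a custom template minimally needs, the schema (SLOTS) and the fill/encode step.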

@titu1994 (Collaborator) left a comment:

It looks really good now; minor comments from me. Let's address the rest and merge.


Collaborator (on the same MonoCut assertion quoted above): +1

@@ -134,6 +131,12 @@ def __init__(self, cfg: DictConfig, trainer: Trainer = None):

super().__init__(cfg=cfg, trainer=trainer)

prompt_cls = PromptFormatter.resolve(self.prompt_format)
self.prompt = prompt_cls(
Collaborator: Not important for this PR, but I was thinking of serializing the keys of the prompt format into the config for user visibility.

@@ -977,3 +1002,78 @@ def predict_step(self, batch, batch_idx=0, dataloader_idx=0, has_processed_signa

text = [self.decoding.strip_special_tokens(t) for t in text]
return text


def parse_multitask_prompt(prompt: dict | None) -> list[dict]:
Collaborator: Very nice!


Review threads: nemo/collections/common/prompts/formatter.py (resolved)
pzelasko added 2 commits May 31, 2024 10:10
@titu1994 (Collaborator) left a comment:

Amazing work!

@stevehuang52 (Collaborator) left a comment:

Great work! LGTM

@titu1994 titu1994 merged commit 28ccec7 into main Jun 1, 2024
133 checks passed
@titu1994 titu1994 deleted the prompt-formatter-and-canary-tensor-dataset branch June 1, 2024 04:40
BoxiangW pushed a commit to BoxiangW/NeMo that referenced this pull request Jun 5, 2024
(NVIDIA#9206)

* Apply CanaryPromptFormatter in dataset/inference
* Working inference with CanaryPromptFormatter
* Minimum working example of Canary.transcribe() with tensors
* training fix
* Update to the new 'chat' based prompt formatting API
* Prompt formatters for popular models and partial unit test coverage
* Updated documentation
* Improved test coverage + proper preamble support
* Fix usage of PromptFormatter for MT-AED class + fix tokenization/formatting issues
* Move some canary hacks to canary prompt formatter, improve validation, add tests for aggtok
* aed_model.transcribe(**slots) support, rename all slots to lowercase and drop pipes everywhere except template definition.
* truly generic version
* making transcribe_speech.py work prompt slots + syntactic sugar
* update streaming_utils.py
* fix
* code review: partial
* Accept multi-turn, single-turn, and legacy prompt format in transcribe() and transcribe_speech.py
* Address code reviews
* Add support for SPE special tokens bos/eos in prompt templates and ensure Llama2 format gives identical results with the reference implementation
* Fix tests and add llama2 prompt formatter tests
* Fix tests

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Boxiang Wang <boxiangw@nvidia.com>
janekl pushed a commit that referenced this pull request Jun 12, 2024
(same commit list as above)
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
(same commit list as above)
@ko3n1g ko3n1g mentioned this pull request Jul 18, 2024
XuesongYang pushed a commit to paarthneekhara/NeMo that referenced this pull request Jan 18, 2025
(same commit list as above)
5 participants