
Fix phi3 chat template confusion with zephyr #7449

Merged: 8 commits merged into ggerganov:master on May 23, 2024
Conversation

@tristandruyen (Contributor) commented May 22, 2024

Reported in & Resolves #7432

Problem

Zephyr and Phi-3 use similar tokens such as <|assistant|> and <|user|>, but they use different EOS tokens. Because the zephyr template appears first in the if/else chain, Phi-3 is misdetected as zephyr, which puts the wrong <|endoftext|> tokens (instead of <|end|>) into the prompt, as reported in the referenced issue.

Root Cause

Phi-3's template seems to have changed sometime after my PR #6857 was merged. Initially it did not explicitly include <|user|>; instead it spliced the role name into the tag.

OLD:

{{ bos_token }}{% for message in messages %}{{'<|' + message['role'] + '|>' + ' ' + message['content'] + '<|end|> ' }}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|> ' }}{% else %}{{ eos_token }}{% endif %}

NEW:

{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') %}{{'<|user|>' + ' ' + message['content'] + '<|end|>' + ' ' + '<|assistant|>' + ' '}}{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|end|>' + ' '}}{% endif %}{% endfor %}

The explicit <|user|> token in the new template is what causes it to be mismatched as zephyr.
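
To illustrate the effect, here is a minimal sketch of the kind of substring-based if/else detection described above (the function name and exact branch conditions are illustrative, not the actual llama.cpp code): because the zephyr branch only requires <|user|>, the new phi3 template lands there before the phi3 branch is ever reached.

```cpp
// Illustrative sketch only: not the actual llama.cpp detection code.
// It mimics the substring-based if/else chain described above to show
// why the new phi3 template is caught by the zephyr branch.
#include <string>

static std::string detect_template_before_fix(const std::string & tmpl) {
    if (tmpl.find("<|im_start|>") != std::string::npos) {
        return "chatml";
    } else if (tmpl.find("<|user|>") != std::string::npos) {
        // The new phi3 template contains a literal <|user|>, so it ends up
        // here and gets the zephyr EOS token <|endoftext|> instead of <|end|>.
        return "zephyr";
    } else if (tmpl.find("<|assistant|>") != std::string::npos &&
               tmpl.find("<|end|>")       != std::string::npos) {
        // Never reached for the new phi3 template with the ordering above.
        return "phi3";
    }
    return "unknown";
}
```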

Solution

It's pretty simple: stop the zephyr branch from matching every model with <|user|> in its template, by reordering the if/else checks and matching phi3 by its <|end|> token.
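
As a rough sketch of that idea (again with illustrative names, not the literal diff), the phi3 branch can be keyed on the distinctive <|end|> terminator and checked ahead of the generic <|user|> match:

```cpp
// Illustrative sketch of the fix, not the literal llama.cpp diff:
// phi3 is recognised by its distinctive <|end|> token before the
// generic <|user|> check, so phi3 templates no longer fall into zephyr.
#include <string>

static std::string detect_template_after_fix(const std::string & tmpl) {
    if (tmpl.find("<|im_start|>") != std::string::npos) {
        return "chatml";
    } else if (tmpl.find("<|assistant|>") != std::string::npos &&
               tmpl.find("<|end|>")       != std::string::npos) {
        return "phi3";   // matched by <|end|>, checked ahead of zephyr
    } else if (tmpl.find("<|user|>") != std::string::npos) {
        return "zephyr"; // phi3 templates never reach this branch anymore
    }
    return "unknown";
}
```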

@github-actions github-actions bot added the testing Everything test related label May 22, 2024
@tristandruyen tristandruyen changed the title Fix phi3 template matching vs zephyr Fix phi3 chat template confusion with zephyr May 22, 2024
@tristandruyen tristandruyen marked this pull request as ready for review May 22, 2024 01:54

github-actions bot commented May 22, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 529 iterations 🚀

Details:
  • Concurrent users: 8, duration: 10m
  • HTTP request: avg=8849.99ms p(95)=21246.52ms fails=, finish reason: stop=473 truncated=56
  • Prompt processing (pp): avg=102.35tk/s p(95)=462.85tk/s
  • Token generation (tg): avg=32.41tk/s p(95)=46.3tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=master commit=3574d6369018002662b6650c5dc0c29657e71d7a

(Benchmark charts omitted: prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, and requests_processing, plotted over the 10-minute / 529-iteration run.)

@mofosyne added the "model (Model specific)" and "Review Complexity : Low" (trivial changes to code that most beginner devs, or those who want a break, can tackle, e.g. UI fixes) labels May 22, 2024
@ngxson (Collaborator) left a comment:

LGTM overall. It would be nice to have all 4 templates in the test.
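
As a rough, self-contained sketch of what such a regression check could look like (the helper and layout below are assumptions for illustration, not the repo's actual test file), each phi3 template variant quoted in the PR description should be recognised by its <|end|> token:

```cpp
// Standalone sketch of a phi3-template regression test; the helper below is
// an assumption for illustration, not the project's real detection function.
#include <cassert>
#include <string>
#include <vector>

static bool looks_like_phi3(const std::string & tmpl) {
    // phi3 is identified by its <|end|> turn terminator plus <|assistant|>
    return tmpl.find("<|end|>") != std::string::npos &&
           tmpl.find("<|assistant|>") != std::string::npos;
}

int main() {
    // Old- and new-style phi3 templates quoted in the PR description above.
    const std::vector<std::string> phi3_variants = {
        "{{ bos_token }}{% for message in messages %}{{'<|' + message['role'] + '|>' + ' ' + message['content'] + '<|end|> ' }}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|> ' }}{% else %}{{ eos_token }}{% endif %}",
        "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') %}{{'<|user|>' + ' ' + message['content'] + '<|end|>' + ' ' + '<|assistant|>' + ' '}}{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|end|>' + ' '}}{% endif %}{% endfor %}",
    };
    for (const auto & tmpl : phi3_variants) {
        assert(looks_like_phi3(tmpl));
    }
    return 0;
}
```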

tristandruyen and others added 2 commits May 23, 2024 14:20 (co-authored by Xuan Son Nguyen <thichthat@gmail.com>)
@ngxson (Collaborator) commented May 23, 2024

OK I'll merge when CI passes

@tristandruyen (Contributor, Author) commented:

The test failures seem unrelated to the changes; this should be good to merge.

@ngxson ngxson merged commit 007489e into ggerganov:master May 23, 2024
58 of 70 checks passed
teleprint-me pushed a commit to teleprint-me/llama.cpp that referenced this pull request May 23, 2024
* Fix phi3 template matching vs zephyr

* Add regression test for new phi3 chat template

* Implement review suggestions

* Fix phi3 jinja test templates & match by <|end|>

* Apply suggestion

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Add all phi3 template variants in tests

* Remove unneeded message trimming

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Fix tests to not expect trimmed messages

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Successfully merging this pull request may close these issues.

[server] phi-3 uses <|endoftext|> instead of <|end|> when applying chat template in /chat/completions