
Fix phi3 chat template confusion with zephyr #7449

Merged: 8 commits merged into ggerganov:master on May 23, 2024
Conversation

@tristandruyen (Contributor) commented May 22, 2024

Reported in & Resolves #7432

Problem

Zephyr and Phi-3 use similar tokens such as <|assistant|> and <|user|>, but they use different EOS tokens. Because the zephyr template appears first in the if/else chain, Phi-3 is misdetected as zephyr, which puts the wrong <|endoftext|> tokens (instead of <|end|>) into the prompt, as reported in the referenced issue.

Root Cause

Phi-3's template seems to have changed sometime after my PR #6857 was merged. Initially it did not explicitly include <|user|>; instead it spliced the role name into the tag.

OLD:

{{ bos_token }}{% for message in messages %}{{'<|' + message['role'] + '|>' + ' ' + message['content'] + '<|end|> ' }}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|> ' }}{% else %}{{ eos_token }}{% endif %}

NEW:

{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') %}{{'<|user|>' + ' ' + message['content'] + '<|end|>' + ' ' + '<|assistant|>' + ' '}}{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|end|>' + ' '}}{% endif %}{% endfor %}

The explicit <|user|> token in the new template is what causes it to be mismatched as zephyr.
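
To illustrate the effect, here is a minimal sketch of the kind of substring-based if/else detection described above (the function name and exact branch conditions are illustrative, not the actual llama.cpp code): because the zephyr branch only requires <|user|>, the new phi3 template lands there before the phi3 branch is ever reached.

```cpp
// Illustrative sketch only: not the actual llama.cpp detection code.
// It mimics the substring-based if/else chain described above to show
// why the new phi3 template is caught by the zephyr branch.
#include <string>

static std::string detect_template_before_fix(const std::string & tmpl) {
    if (tmpl.find("<|im_start|>") != std::string::npos) {
        return "chatml";
    } else if (tmpl.find("<|user|>") != std::string::npos) {
        // The new phi3 template contains a literal <|user|>, so it ends up
        // here and gets the zephyr EOS token <|endoftext|> instead of <|end|>.
        return "zephyr";
    } else if (tmpl.find("<|assistant|>") != std::string::npos &&
               tmpl.find("<|end|>")       != std::string::npos) {
        // Never reached for the new phi3 template with the ordering above.
        return "phi3";
    }
    return "unknown";
}
```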

Solution

It's pretty simple: stop the zephyr branch from matching every model with <|user|> in its template, by reordering the if/else checks and matching phi3 by its <|end|> token.
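
As a rough sketch of that idea (again with illustrative names, not the literal diff), the phi3 branch can be keyed on the distinctive <|end|> terminator and checked ahead of the generic <|user|> match:

```cpp
// Illustrative sketch of the fix, not the literal llama.cpp diff:
// phi3 is recognised by its distinctive <|end|> token before the
// generic <|user|> check, so phi3 templates no longer fall into zephyr.
#include <string>

static std::string detect_template_after_fix(const std::string & tmpl) {
    if (tmpl.find("<|im_start|>") != std::string::npos) {
        return "chatml";
    } else if (tmpl.find("<|assistant|>") != std::string::npos &&
               tmpl.find("<|end|>")       != std::string::npos) {
        return "phi3";   // matched by <|end|>, checked ahead of zephyr
    } else if (tmpl.find("<|user|>") != std::string::npos) {
        return "zephyr"; // phi3 templates never reach this branch anymore
    }
    return "unknown";
}
```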

@github-actions github-actions bot added the testing Everything test related label May 22, 2024
@tristandruyen tristandruyen changed the title Fix phi3 template matching vs zephyr Fix phi3 chat template confusion with zephyr May 22, 2024
@tristandruyen tristandruyen marked this pull request as ready for review May 22, 2024 01:54

github-actions bot commented May 22, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 529 iterations 🚀

Details:
  • Concurrent users: 8, duration: 10m
  • HTTP request: avg=8849.99ms p(95)=21246.52ms fails=, finish reason: stop=473 truncated=56
  • Prompt processing (pp): avg=102.35tk/s p(95)=462.85tk/s
  • Token generation (tg): avg=32.41tk/s p(95)=46.3tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=master commit=3574d6369018002662b6650c5dc0c29657e71d7a

(Benchmark charts omitted: prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, and requests_processing, plotted over the 10-minute / 529-iteration run.)

@mofosyne added the "model (Model specific)" and "Review Complexity : Low" (trivial changes to code that most beginner devs, or those who want a break, can tackle, e.g. UI fixes) labels May 22, 2024
@ngxson (Collaborator) left a comment:

LGTM overall. It would be nice to have all 4 templates in the test.
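
As a rough, self-contained sketch of what such a regression check could look like (the helper and layout below are assumptions for illustration, not the repo's actual test file), each phi3 template variant quoted in the PR description should be recognised by its <|end|> token:

```cpp
// Standalone sketch of a phi3-template regression test; the helper below is
// an assumption for illustration, not the project's real detection function.
#include <cassert>
#include <string>
#include <vector>

static bool looks_like_phi3(const std::string & tmpl) {
    // phi3 is identified by its <|end|> turn terminator plus <|assistant|>
    return tmpl.find("<|end|>") != std::string::npos &&
           tmpl.find("<|assistant|>") != std::string::npos;
}

int main() {
    // Old- and new-style phi3 templates quoted in the PR description above.
    const std::vector<std::string> phi3_variants = {
        "{{ bos_token }}{% for message in messages %}{{'<|' + message['role'] + '|>' + ' ' + message['content'] + '<|end|> ' }}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|> ' }}{% else %}{{ eos_token }}{% endif %}",
        "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') %}{{'<|user|>' + ' ' + message['content'] + '<|end|>' + ' ' + '<|assistant|>' + ' '}}{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|end|>' + ' '}}{% endif %}{% endfor %}",
    };
    for (const auto & tmpl : phi3_variants) {
        assert(looks_like_phi3(tmpl));
    }
    return 0;
}
```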

tristandruyen and others added 2 commits May 23, 2024 14:20 (co-authored by Xuan Son Nguyen <thichthat@gmail.com>)
@ngxson (Collaborator) commented May 23, 2024

OK I'll merge when CI passes

@tristandruyen (Contributor, Author) commented:

The test failures seem unrelated to the changes; this should be good to merge.

@ngxson ngxson merged commit 007489e into ggerganov:master May 23, 2024
58 of 70 checks passed
teleprint-me pushed a commit to teleprint-me/llama.cpp that referenced this pull request May 23, 2024
* Fix phi3 template matching vs zephyr

* Add regression test for new phi3 chat template

* Implement review suggestions

* Fix phi3 jinja test templates & match by <|end|>

* Apply suggestion

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Add all phi3 template variants in tests

* Remove unneeded message trimming

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Fix tests to not expect trimmed messages

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Successfully merging this pull request may close these issues.

[server] phi-3 uses <|endoftext|> instead of <|end|> when applying chat template in /chat/completions