@slaren commented May 10, 2024

Fixes Mistral models being detected as 8B.

@slaren merged commit 25c6e82 into master May 10, 2024
@slaren deleted the sl/fix-mistral-8b branch May 10, 2024 12:28
@mofosyne added the "Review Complexity : Low" (trivial changes to code that most beginner devs, or those who want a break, can tackle, e.g. UI fix) and "bugfix" (fixes an issue or bug) labels May 10, 2024
@github-actions

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 552 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8465.6ms p(95)=20058.77ms fails=, finish reason: stop=489 truncated=63
  • Prompt processing (pp): avg=98.84tk/s p(95)=462.47tk/s
  • Token generation (tg): avg=32.32tk/s p(95)=46.1tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=sl/fix-mistral-8b commit=ae3305391f50edc5b0eca10c2709701834549f84
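
As a quick sanity check on the summary above, the finish-reason counts should add up to the reported iteration count. A minimal Python sketch, using only the figures from this comment (the variable names are illustrative and not part of the benchmark tooling):

```python
# Figures copied from the benchmark summary in this comment.
iterations = 552
finish_reasons = {"stop": 489, "truncated": 63}

# Every completed request ends with exactly one finish reason,
# so the counts are expected to sum to the iteration count.
total = sum(finish_reasons.values())
truncated_share = finish_reasons["truncated"] / total

print(f"total finished requests: {total} (matches iterations: {total == iterations})")
print(f"truncated share: {truncated_share:.1%}")
```

With these numbers, roughly one request in nine hit the pp+tg=2048 limit rather than stopping on its own.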

prompt_tokens_seconds

[Chart: "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 552 iterations"; y-axis: llamacpp:prompt_tokens_seconds; x-axis: 1715397860 --> 1715398488]
predicted_tokens_seconds

[Chart: "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 552 iterations"; y-axis: llamacpp:predicted_tokens_seconds; x-axis: 1715397860 --> 1715398488]

Details

kv_cache_usage_ratio

[Chart: "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 552 iterations"; y-axis: llamacpp:kv_cache_usage_ratio; x-axis: 1715397860 --> 1715398488]
requests_processing

[Chart: "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 552 iterations"; y-axis: llamacpp:requests_processing; x-axis: 1715397860 --> 1715398488]
