Server: add tests for batch size, different seeds
JohannesGaessler committed Apr 30, 2024
1 parent 4dba7e8 commit 9ff8d4d
Showing 2 changed files with 156 additions and 80 deletions.
examples/server/tests/features/results.feature: 88 changes (56 additions, 32 deletions)
@@ -7,44 +7,16 @@ Feature: Results
     And a model file tinyllamas/split/stories15M-00001-of-00003.gguf from HF repo ggml-org/models
     And a model file test-model-00001-of-00003.gguf
     And 128 as batch size
-    And 256 KV cache size
+    And 1024 KV cache size
     And 128 max tokens to predict
+    And continuous batching
 
-  Scenario Outline: Multi users completion
+  Scenario Outline: consistent results with same seed
     Given <n_slots> slots
-    And continuous batching
     Then the server is starting
     Then the server is healthy
 
-    Given 42 as seed
-    And a prompt:
-      """
-      Write a very long story about AI.
-      """
-
-    Given 42 as seed
-    And a prompt:
-      """
-      Write a very long story about AI.
-      """
-
-    Given 42 as seed
-    And a prompt:
-      """
-      Write a very long story about AI.
-      """
-
-    Given 42 as seed
-    And a prompt:
-      """
-      Write a very long story about AI.
-      """
-
-    Given 42 as seed
-    And a prompt:
-      """
-      Write a very long story about AI.
-      """
+    Given 4 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 42
 
     Given concurrent completion requests
     Then the server is busy
@@ -55,3 +27,55 @@ Feature: Results
       | n_slots |
       | 1       |
       | 2       |
+
+  Scenario Outline: different results with different seed
+    Given <n_slots> slots
+    Then the server is starting
+    Then the server is healthy
+
+    Given 1 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 42
+    Given 1 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 43
+    Given 1 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 44
+    Given 1 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 45
+
+    Given concurrent completion requests
+    Then the server is busy
+    Then the server is idle
+    And all slots are idle
+    Then all predictions are different
+    Examples:
+      | n_slots |
+      | 1       |
+      | 2       |
+
+  Scenario Outline: consistent results with same seed and varying batch size
+    Given 4 slots
+    And <temp> temperature
+    # And 0 as draft
+    Then the server is starting
+    Then the server is healthy
+
+    Given 1 prompts "Write a very long story about AI." with seed 42
+    And concurrent completion requests
+    # Then the server is busy # Not all slots will be utilized.
+    Then the server is idle
+    And all slots are idle
+
+    Given <n_parallel> prompts "Write a very long story about AI." with seed 42
+    And concurrent completion requests
+    # Then the server is busy # Not all slots will be utilized.
+    Then the server is idle
+    And all slots are idle
+
+    Then all predictions are equal
+    Examples:
+      | n_parallel | temp |
+      | 1          | 0.0  |
+      | 2          | 0.0  |
+      | 4          | 0.0  |
+      | 1          | 1.0  |
+      # FIXME: These tests fail on master. The problem seems to be the unified KV cache.
+      # See https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
+      # and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574 .
+      # | 2          | 1.0  |
+      # | 4          | 1.0  |
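The new scenarios end with the steps `Then all predictions are equal` and `Then all predictions are different`. Their step definitions are not part of this diff, so the following is only a minimal Python sketch of what such checks could look like, assuming each prediction has already been reduced to its completion text. The helper names `all_predictions_equal` and `all_predictions_different` are hypothetical and are not the repository's functions.

```python
from itertools import combinations
from typing import List


def all_predictions_equal(predictions: List[str]) -> bool:
    """True if every completion text is identical to the first one."""
    return all(p == predictions[0] for p in predictions)


def all_predictions_different(predictions: List[str]) -> bool:
    """True if no two completion texts are identical (pairwise distinct)."""
    return all(a != b for a, b in combinations(predictions, 2))


if __name__ == "__main__":
    same_seed = ["Once upon a time", "Once upon a time"]
    different_seeds = ["Once upon a time", "In a galaxy far away", "Once upon a time"]
    print(all_predictions_equal(same_seed))            # True
    print(all_predictions_different(different_seeds))  # False: first and third match
```

This sketch reads "different" as pairwise distinct, which is stricter than "not all equal"; whether the real step applies the same rule is not visible from this commit.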
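The varying-batch-size scenario sends the same prompt with seed 42, first alone and then as `<n_parallel>` concurrent copies, at temperature 0.0 and 1.0, and expects identical output. The toy sampler below is a hypothetical illustration, not llama.cpp's sampling code. It shows the two properties the scenario relies on: at temperature 0 decoding is greedy and the seed never enters, while at a nonzero temperature the same logits plus the same seed reproduce the same tokens. It assumes fixed logits at every step and has no batching or KV cache, so it cannot reproduce the failure noted in the FIXME.

```python
import math
import random
from typing import List


def sample_token(logits: List[float], temperature: float, rng: random.Random) -> int:
    """Return a token index: greedy at temperature 0, softmax sampling otherwise."""
    if temperature == 0.0:
        # Greedy decoding: the RNG is never consulted, so the seed has no effect.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Scale by temperature and subtract the max before exponentiating for stability.
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    weights = [math.exp(x - top) for x in scaled]
    # random.choices normalizes the weights itself.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(logits: List[float], temperature: float, seed: int, n_tokens: int) -> List[int]:
    """Draw n_tokens from the same toy logits using an RNG seeded once per request."""
    rng = random.Random(seed)
    return [sample_token(logits, temperature, rng) for _ in range(n_tokens)]


if __name__ == "__main__":
    toy_logits = [2.0, 1.5, 1.0, 0.5]
    # Same seed, temperature 1.0: identical token sequences.
    print(generate(toy_logits, 1.0, 42, 10) == generate(toy_logits, 1.0, 42, 10))  # True
    # Temperature 0.0: the result does not depend on the seed at all.
    print(generate(toy_logits, 0.0, 42, 10) == generate(toy_logits, 0.0, 43, 10))  # True
```

In this toy model the RNG stream is the only source of variation, so equal seeds always give equal outputs. In the real server the logits themselves must also match across runs, and that is the property the batch-size scenario exercises.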