Random seed possible problems. #8593

0wwafa · 2024-07-19T19:01:20Z

I ran llama.cpp (latest version) with these parameters:

prompt="""
Tell me a long story.
"""

llama-cli --seed 1721414715 -c 4096 -m /content/$m -t $(nproc) -ngl 999 -p "User: Hi\nBot:Hi\nUser: {prompt}\nBot:"

and in the log I read the seed was: 1721414715

so at the next run I used --seed 1721414715 but the story was a different one.

why?

0wwafa · 2024-07-19T19:07:08Z

The second time I ran llama.cpp with the same seed, it told me the same story.

So I don't understand why, when I did not specify the seed, the log showed the seed as "main: seed = 1721414715",

yet when I entered that seed manually it told me a different story.

Then I ran it again with the same seed manually, and it told the same story.

I see 2 possibilities:

when not specified, the seed is shown "wrong"
when entered manually the seed is interpreted differently.

Rotatingxenomorph · 2024-07-19T20:23:52Z

The CUDA version introduces some randomness even with the same seed.

0wwafa · 2024-07-19T21:15:24Z

> The CUDA version introduces some randomness even with the same seed.

I am using CPU ONLY.

Rotatingxenomorph · 2024-07-19T21:18:42Z

> > The CUDA version introduces some randomness even with the same seed.
>
> I am using CPU ONLY.

Why the -ngl 999 then?

compilade · 2024-07-19T21:21:10Z

> I see 2 possibilities:
>
> when not specified, the seed is shown "wrong"
>
> when entered manually the seed is interpreted differently.

This is weird because neither of these possibilities seems to be what's happening, which means it might be hard to debug.

llama.cpp/examples/main/main.cpp

Lines 188 to 194 in 87e397d

    if (params.seed == LLAMA_DEFAULT_SEED) {
        params.seed = time(NULL);
    }

    LOG_TEE("%s: seed  = %u\n", __func__, params.seed);

    std::mt19937 rng(params.seed);

> Then I ran it again with the same seed manually, and it told the same story.

This rules out non-determinism of the backend.

EDIT: I can also reproduce this problem on my machine (with CPU-only inference). It's a very weird behavior.

compilade · 2024-07-19T21:42:23Z

AHA! The sampling seed in params.sparams.seed is set by --seed, but not when choosing a default seed in main.cpp.

This seems to fix it:

diff --git a/examples/main/main.cpp b/examples/main/main.cpp
index a0d817b1..ceed4ce5 100644
--- a/examples/main/main.cpp
+++ b/examples/main/main.cpp
@@ -187,6 +187,7 @@ int main(int argc, char ** argv) {
 
     if (params.seed == LLAMA_DEFAULT_SEED) {
         params.seed = time(NULL);
+        sparams.seed = params.seed;
     }
 
     LOG_TEE("%s: seed  = %u\n", __func__, params.seed);

> I see 2 possibilities:
>
> when not specified, the seed is shown "wrong"
>
> when entered manually the seed is interpreted differently.

It seems like BOTH of these guesses were true after all.
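For readers following along, here is a minimal, self-contained sketch of the mismatch described above. It is not the llama.cpp code: the struct and function names (params_sketch, apply_seed_flag, choose_default_seed, effective_sampling_seed) are hypothetical, and the sentinel value for "seed not specified" is an assumption. It only models the two seeds discussed in this thread: the one that gets logged and the one the sampler uses.

```cpp
#include <cstdint>
#include <ctime>
#include <random>

// Hypothetical stand-ins for the real structures; simplified for illustration.
constexpr uint32_t DEFAULT_SEED = 0xFFFFFFFF; // assumed "not specified" sentinel

struct sampling_params_sketch {
    uint32_t seed = DEFAULT_SEED;   // the seed the sampler actually uses
};

struct params_sketch {
    uint32_t seed = DEFAULT_SEED;   // the seed that gets printed in the log
    sampling_params_sketch sparams;
};

// Models --seed as described in the thread: both seeds are set.
inline void apply_seed_flag(params_sketch & p, uint32_t s) {
    p.seed = s;
    p.sparams.seed = s;
}

// Models the default-seed path; with_fix toggles the one-line patch above.
inline void choose_default_seed(params_sketch & p, bool with_fix) {
    if (p.seed == DEFAULT_SEED) {
        p.seed = static_cast<uint32_t>(std::time(nullptr));
        if (with_fix) {
            p.sparams.seed = p.seed; // keep the sampling seed in sync
        }
    }
}

// Models the sampler: an unspecified sampling seed becomes a random one.
inline uint32_t effective_sampling_seed(const params_sketch & p) {
    if (p.sparams.seed == DEFAULT_SEED) {
        return std::random_device{}();
    }
    return p.sparams.seed;
}
```

Under these assumptions, a run without --seed logs p.seed (derived from the clock) while sampling with an unrelated random value, so replaying the logged seed gives a different story. Passing --seed sets both values, which is why two runs with the same manual seed matched each other.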

JohannesGaessler · 2024-07-20T06:20:57Z

> The CUDA version introduces some randomness even with the same seed.

The CUDA backend is deterministic as in the results for the same input parameters will have the same output logits. However, if you use >1 slots or prompt caching on the server then the input parameters can vary and thus the outputs will vary too.

Rotatingxenomorph · 2024-07-20T08:21:20Z

> The CUDA backend is deterministic as in the results for the same input parameters will have the same output logits. However, if you use >1 slots or prompt caching on the server then the input parameters can vary and thus the outputs will vary too.

That's good to learn! Thank you.

0wwafa · 2024-07-20T17:52:51Z

@compilade

> It seems like BOTH of these guesses were true after all.

:D so what was the seed when not specified? 0?

compilade · 2024-07-21T07:11:19Z

> so what was the seed when not specified? 0?

When not specified, the sampling seed is random.

llama.cpp/common/sampling.cpp

Line 82 in 22f281a

    seed = std::random_device{}();

0wwafa · 2024-07-21T07:55:21Z

> > so what was the seed when not specified? 0?
>
> When not specified, the sampling seed is random.
>
> llama.cpp/common/sampling.cpp
>
> Line 82 in 22f281a
>
>     seed = std::random_device{}();

@compilade So I don't understand: what was happening before? Why didn't the printed seed work when it had been chosen randomly?

> AHA! The sampling seed in params.sparams.seed is set by --seed, but not when choosing a default seed in main.cpp.

So why did it work the second time? Luck?

SharifIsmail · 2024-07-24T16:04:47Z

@JohannesGaessler

> The CUDA backend is deterministic as in the results for the same input parameters will have the same output logits. However, if you use >1 slots or prompt caching on the server then the input parameters can vary and thus the outputs will vary too.

I tried to figure out why using >1 slot does not produce deterministic results when doing parallel requests. Do you know why it is not possible to get deterministic output when making parallel requests?

JohannesGaessler · 2024-07-24T16:09:58Z

Because floating point arithmetic is not associative. You only get bit-for-bit identical results if you do the exact same operations in the exact same order. But the whole reason why >1 slots is faster is that you do not do that but instead change the kernels depending on how many slots are currently in use. Also the positions of individual sequences within the unified KV cache will be different.
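The order dependence described above can be seen in a few lines of plain C++. This is only an illustration with hand-picked values, not anything taken from the llama.cpp kernels: the helper names (sum_forward, sum_backward, example_values) are made up for this sketch, and it assumes standard IEEE 754 single-precision arithmetic without fast-math style compiler options.

```cpp
#include <cstddef>
#include <vector>

// Sum left to right: ((v[0] + v[1]) + v[2]) + ...
inline float sum_forward(const std::vector<float> & v) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) {
        acc += v[i];
    }
    return acc;
}

// Sum right to left: same values, opposite order of operations.
inline float sum_backward(const std::vector<float> & v) {
    float acc = 0.0f;
    for (std::size_t i = v.size(); i-- > 0; ) {
        acc += v[i];
    }
    return acc;
}

// Illustrative values: near 1e8 a float cannot represent a step of 1,
// so whether the 1.0f survives depends on when it is added.
inline std::vector<float> example_values() {
    return {1e8f, -1e8f, 1.0f};
}
```

Summing forward cancels the two large values first and keeps the 1.0f, while summing backward absorbs the 1.0f into -1e8f through rounding before the cancellation happens. A matrix multiplication that splits or batches its accumulation differently changes the order in the same way, just across many more terms.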

compilade · 2024-07-24T16:16:53Z

> I tried to figure out why using >1 slot does not produce deterministic results when doing parallel requests. Do you know why it is not possible to get deterministic output when making parallel requests?

See also ggerganov/whisper.cpp#1941 (comment).

But when the order is exactly the same, the output between runs can still be exactly the same, even with parallel sequences, as I've seen in #6122 (comment).

SharifIsmail · 2024-07-24T16:27:43Z

I see. Thanks @compilade @JohannesGaessler

So, running higher-precision models with a higher-precision KV cache would alleviate this effect, right?

JohannesGaessler · 2024-07-24T16:31:29Z

No, even with 16-bit precision you will still run into this issue because the condition numbers of the weight matrices can be arbitrarily large.

SharifIsmail · 2024-07-24T18:54:13Z

I did some quick tests for the sake of curiosity with "Phi-3-mini-4k-instruct-fp16.gguf" vs "Phi-3-mini-4k-instruct-q4.gguf".

Bottom Line: As you stated, JohannesGaessler, both are nondeterministic for the vast majority of cases. Even with cherry-picked settings attempting to minimize non-determinism (i.e., "-b 1 -ub 1 -nocb" with cache_prompt=false), I only managed to get a few prompts on the fp16 model to return deterministic output. I used "-np 10", i.e. 10 slots and 10 parallel requests.

yaleeyang · 2024-09-04T05:56:38Z

> > The CUDA version introduces some randomness even with the same seed.
>
> The CUDA backend is deterministic as in the results for the same input parameters will have the same output logits. However, if you use >1 slots or prompt caching on the server then the input parameters can vary and thus the outputs will vary too.

Hey Johannes, are there any test cases for CUDA bit-exact determinism in the project?

JohannesGaessler · 2024-09-04T08:14:33Z

There are multiple in the server tests. But they're commented out since they're failing on master.

github-actions · 2024-10-19T01:07:19Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

Jul 21, 2024: ngxson mentioned this issue in "Bug: The output content is different" #8585 (closed).

Aug 24, 2024: github-actions bot added the stale label.

Sep 5, 2024: github-actions bot removed the stale label.

Oct 3, 2024: d-kleine mentioned this issue in "Make llama.cpp's cache_prompt parameter configurable" ollama/ollama#5760 (open).

Oct 5, 2024: github-actions bot added the stale label.

Oct 19, 2024: github-actions bot closed this as completed.
