Documented CUDA reproducibility, added warning #1346

Merged: 1 commit, May 8, 2023

JohannesGaessler (Collaborator) commented May 6, 2023

Fixes #1340.

While working on CUDA acceleration, I noticed that the text I generated differed from master even though I set the same seed. It seems that, because multiple CUDA streams are used, cuBLAS matrix multiplication is not guaranteed to be reproducible. This does not seem to affect perplexity scores, but it can needlessly cost developers time when they search for bugs that do not exist.
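
For readers wondering how the same inputs can produce different outputs at all: this thread does not confirm the exact mechanism inside cuBLAS, but one general reason is that floating-point addition is not associative, so any change in the order in which partial results are accumulated can change the final value. A minimal CPU-side sketch of that effect follows. The helper names `sum_forward` and `sum_backward` are hypothetical and not part of this PR, and the values assume IEEE 754 single precision without fast-math optimizations.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only; they are not part of the PR.

// Accumulates left to right: ((v[0] + v[1]) + v[2]) + ...
float sum_forward(const std::vector<float>& v) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) {
        acc += v[i];
    }
    return acc;
}

// Accumulates right to left: ((v[n-1] + v[n-2]) + v[n-3]) + ...
float sum_backward(const std::vector<float>& v) {
    float acc = 0.0f;
    for (std::size_t i = v.size(); i-- > 0;) {
        acc += v[i];
    }
    return acc;
}
```

Calling both functions on `{1e8f, -1e8f, 1.0f}` gives 1 for `sum_forward` and 0 for `sum_backward`: in the backward order, `1.0f` is added to `-1e8f` first and is lost to rounding, because adjacent floats near 1e8 are 8 apart. A matrix multiplication sums many such products, so small differences of this kind can eventually change which token is sampled.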

This PR documents the non-reproducible nature of cuBLAS. It adds warnings to the README and ggml-cuda.cu. Additionally, when cuBLAS is used and a seed is set, a warning is printed stating that reproducibility is not guaranteed.
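
The runtime warning described above could look roughly like the sketch below. This is not the code from the PR: the function name `reproducibility_warning`, its parameters, and the message text are all assumptions made for illustration. In a real build, the first argument would more likely come from a compile-time switch than from a runtime flag.

```cpp
#include <string>

// Hypothetical helper, not taken from the PR.
// Returns a warning message when the build uses cuBLAS and the user has
// explicitly set a seed; returns an empty string otherwise.
std::string reproducibility_warning(bool built_with_cublas, bool seed_set_by_user) {
    if (built_with_cublas && seed_set_by_user) {
        // Assumed wording; the actual message in the PR may differ.
        return "warning: cuBLAS is enabled, so results are not guaranteed "
               "to be reproducible even with a fixed seed\n";
    }
    return "";
}
```

A caller would write the returned string to standard error at startup if it is not empty. Returning the message instead of printing it directly keeps the decision logic easy to test.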

JohannesGaessler force-pushed the cuda-stream-documentation branch from 6f121b1 to 5326ec3 (May 6, 2023 19:14)

slaren (Member) commented May 6, 2023

You missed ggml_cuda_mul_mat_f16, but I would prefer that the comments in these functions be removed; there are enough warnings already.

JohannesGaessler force-pushed the cuda-stream-documentation branch from 5326ec3 to 4b18cdf (May 6, 2023 19:24)
Successfully merging this pull request may close these issues.

Generation with cuBLAS not deterministic for long prompts