perplexity : support using multiple sequences to allow larger batch sizes #5946

slaren · 2024-03-08T23:48:44Z

Allows increasing the batch size with perplexity. The batch size must be a multiple of n_ctx.

There is a small performance improvement, since the batching API allows extracting only the logits that are actually used, which reduces the amount of data that needs to be copied back from the GPU. Increasing the batch size can also help with quantized models when using a small context. The main goal, however, is to allow larger batch sizes with pipeline parallelism when using multiple GPUs.
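To illustrate the constraint and the logits saving described above, here is a minimal standalone sketch. It does not call the llama.cpp API. `TokenSlot`, `sequences_per_batch`, `build_batch_layout`, and `count_requested_logits` are hypothetical names invented for this example, and the rule that only tokens from position `first_scored` onward are scored is an assumption, not something stated in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one entry of a batch; not a llama.cpp type.
struct TokenSlot {
    int32_t pos;         // position of the token inside its own sequence
    int32_t seq_id;      // which independent sequence (context window) it belongs to
    bool    want_logits; // whether logits for this token should be copied back
};

// Number of context windows that fit in one batch.
// The batch size must be a multiple of n_ctx, as the PR description requires.
int sequences_per_batch(int n_batch, int n_ctx) {
    assert(n_ctx > 0 && n_batch > 0 && n_batch % n_ctx == 0);
    return n_batch / n_ctx;
}

// Lay out n_seq windows of n_ctx tokens back to back in a single batch.
// Assumption: only tokens at positions >= first_scored are scored, so only
// those request logits.
std::vector<TokenSlot> build_batch_layout(int n_batch, int n_ctx, int first_scored) {
    const int n_seq = sequences_per_batch(n_batch, n_ctx);
    std::vector<TokenSlot> slots;
    slots.reserve(n_batch);
    for (int seq = 0; seq < n_seq; ++seq) {
        for (int pos = 0; pos < n_ctx; ++pos) {
            slots.push_back({pos, seq, pos >= first_scored});
        }
    }
    return slots;
}

// Count how many tokens in the batch would have their logits copied back.
int count_requested_logits(const std::vector<TokenSlot> & slots) {
    int n = 0;
    for (const auto & s : slots) {
        n += s.want_logits ? 1 : 0;
    }
    return n;
}
```

Under these assumptions, `build_batch_layout(2048, 512, 256)` packs 4 sequences into one batch and flags 1024 of the 2048 tokens for logits, so only half of the logits rows would need to be copied back.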

examples/perplexity/perplexity.cpp

sorasoras · 2024-03-09T16:29:46Z

In theory, this should apply to imatrix as well.

slaren · 2024-03-09T18:50:37Z

Probably won't help with imatrix unless using very small context sizes. As it is, imatrix will also not benefit from pipeline parallelism because reading the activations forces a synchronization.

…izes (ggerganov#5946)
* perplexity : support using multiple sequences to allow larger batch sizes ggml-ci
* set cparams.n_parallel to the number of sequences
* print tested n_ctx, add assert

perplexity : support using multiple sequences to allow larger batch s…

7d99955

…izes ggml-ci

slaren force-pushed the sl/ppl-batching branch from babfe9e to 7d99955 Compare March 9, 2024 00:10

compilade reviewed Mar 9, 2024

examples/perplexity/perplexity.cpp (outdated, resolved)

set cparams.n_parallel to the number of sequences

ac07f7d

compilade reviewed Mar 9, 2024

examples/perplexity/perplexity.cpp (outdated, resolved)

examples/perplexity/perplexity.cpp (outdated, resolved)

ggerganov approved these changes Mar 9, 2024

print tested n_ctx, add assert

23dbcfa

slaren merged commit d894f35 into master Mar 9, 2024
56 of 61 checks passed

slaren deleted the sl/ppl-batching branch March 9, 2024 18:55

compilade mentioned this pull request Sep 10, 2024

imatrix : use GGUF to store importance matrices #9400

Draft

8 tasks
