
Fix generation for bsz > 1 #1250

Closed
joecummings opened this issue Aug 2, 2024 · 5 comments · Fixed by #1424
Labels: bug (Something isn't working)

Comments

@joecummings
Contributor

Our modules only work for generation in one of two cases: batch_size = 1, or every single sample in a batch has the same length. The main culprit is this line of code:

self.causal_mask = torch.tril(

For a batch that looks like the following:

My, name, is, Joe
Hello, world, <PAD>, <PAD>
Bye, <PAD>, <PAD>, <PAD>

A proper mask would look like:

1 0 0 0
1 1 0 0 
1 1 1 0
1 1 1 1

1 0 0 0
1 1 0 0
0 0 0 0
0 0 0 0

1 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0

of size [b x s x s], which is [3 x 4 x 4]
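
For concreteness, here is a minimal sketch (my own, not the existing torchtune code) of how this padding-aware mask could be built from a boolean padding mask with `torch.tril` and broadcasting:

```python
import torch

# padding_mask[b, j] is True for real tokens and False for <PAD>, matching the
# right-padded batch above.
padding_mask = torch.tensor(
    [
        [True, True, True, True],     # My, name, is, Joe
        [True, True, False, False],   # Hello, world, <PAD>, <PAD>
        [True, False, False, False],  # Bye, <PAD>, <PAD>, <PAD>
    ]
)

seq_len = padding_mask.shape[1]
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))  # [s, s]

# Causal mask with pad queries (rows) and pad keys (columns) zeroed out: [b, s, s]
mask = causal[None] & padding_mask[:, :, None] & padding_mask[:, None, :]
print(mask.int())  # reproduces the [3 x 4 x 4] mask above
```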

This will be a fairly involved change that touches several utils and modules. The general changes needed will be:

  • Delete causal mask from KV cache, instead opting for this to come in via the mask param
  • Update generate utils to pass a mask into its call to model.forward()
  • Modify eleuther eval recipe to construct a proper causal mask to pass to the model during generation (rough sketch below)
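
As a rough illustration of the last two bullets (a sketch under assumptions, not the final design), the caller would build the padding-aware mask itself and hand it to the model:

```python
import torch

def greedy_next_token(model, tokens, padding_mask, input_pos):
    """Hypothetical helper: the caller constructs the padding-aware causal mask
    and passes it in, instead of the model pulling a causal mask out of its KV
    cache. Assumes a forward signature that accepts `mask` and `input_pos`
    keyword arguments, in the spirit of torchtune's TransformerDecoder.
    """
    _, seq_len = tokens.shape  # tokens: [b, s]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=tokens.device))
    mask = causal[None] & padding_mask[:, :, None] & padding_mask[:, None, :]  # [b, s, s]
    logits = model(tokens, mask=mask, input_pos=input_pos)  # [b, s, vocab]
    return logits[:, -1].argmax(dim=-1)  # greedy token at the last position of each sample
```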

This was originally found and reported by @iankur

@SalmanMohammadi
Collaborator

Was this the kind of thing you had in mind? https://github.com/pytorch/torchtune/blob/1129f9e3a246628c991c246d81dbead62d3437a3/torchtune/modules/rlhf/_generation.py

Granted, there are a couple of changes I've been meaning to make (only generating the full mask once and extending it for each token in the batch, and you'll probably have a more intelligent way of generating the masks themselves).

@joecummings
Contributor Author

Was this the kind of thing you had in mind? 1129f9e/torchtune/modules/rlhf/_generation.py

Yep, this is pretty much it! I take it that you're not utilizing the KV Cache for this generation though, right?

@SalmanMohammadi
Collaborator

Yep, this is pretty much it! I take it that you're not utilizing the KV Cache for this generation though, right?

Nah. It was also on my TODO list of possible optimizations, and I briefly spoke to Rafi about it, but we agreed it would be kind of a pain in the ass to set up caching for custom masks.

@joecummings joecummings self-assigned this Aug 2, 2024
@joecummings joecummings added the bug Something isn't working label Aug 2, 2024
@SalmanMohammadi SalmanMohammadi mentioned this issue Aug 2, 2024
@joecummings
Contributor Author

joecummings commented Aug 21, 2024

Left padded:

My, name, is, Joe
<PAD>, <PAD>, Hello, world
<PAD>, <PAD>, <PAD>, Bye

Left padded mask:

1 0 0 0
1 1 0 0
1 1 1 0
1 1 1 1

1 0 0 0
0 1 0 0
0 0 1 0 
0 0 1 1

1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
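
Same sketch as before, adjusted for left padding (my assumption about the construction, not necessarily the merged implementation): pad positions keep their diagonal entry so no query row is all zeros.

```python
import torch

# padding_mask for the left-padded batch above (True = real token).
padding_mask = torch.tensor(
    [
        [True, True, True, True],     # My, name, is, Joe
        [False, False, True, True],   # <PAD>, <PAD>, Hello, world
        [False, False, False, True],  # <PAD>, <PAD>, <PAD>, Bye
    ]
)

seq_len = padding_mask.shape[1]
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
diagonal = torch.eye(seq_len, dtype=torch.bool)

# Causal attention restricted to real tokens, plus the diagonal so pad rows
# attend only to themselves: [b, s, s]
mask = (causal[None] & padding_mask[:, :, None] & padding_mask[:, None, :]) | diagonal[None]
print(mask.int())  # reproduces the [3 x 4 x 4] left-padded mask above
```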

@SalmanMohammadi
Collaborator

Our modules only work for generation in one of two cases: batch_size = 1, or every single sample in a batch has the same length. The main culprit is this line of code:

I assume batched generation in the eleuther eval recipe satisfies the latter?
I've just got iterative decoding + KV caching working for my batched RLHF generation utils - seeing > 10x speedups w/o compile (PPO go brrrr). Can chat about it later today if it's of interest.
