Hi @iankur - good question! We integrate with Eleuther for our evaluation, so we have to fit certain API contracts. Eleuther has a pretty strong integration with Hugging Face transformers, and therefore most of their APIs are fit to the parameters that those models expect.
In this case, Hugging Face models, and therefore Eleuther, expect an attention mask. However, since we actually handle calling the model forward ourselves, we don't need this mask. It would definitely be best practice to pass one, though :)
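For reference, a minimal sketch of the kind of 2D attention mask Hugging Face models expect for a left-padded batch: 1 for real tokens, 0 for padding. The token ids and `pad_id` below are made up purely for illustration.

```python
import torch

pad_id = 0
# A left-padded batch of two sequences (bsz=2, seq_len=5)
tokens = torch.tensor(
    [
        [pad_id, pad_id, 11, 12, 13],
        [21, 22, 23, 24, 25],
    ]
)
attention_mask = (tokens != pad_id).long()
# tensor([[0, 0, 1, 1, 1],
#         [1, 1, 1, 1, 1]])
```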
I could not find where we mask the left padding (here) in the case of batch decoding. If we are masking it, could you point to where that is done? If we are indeed not masking, then shouldn't we? Also, I see lm-eval numbers improve by 0.5-1% on the old Open LLM Leaderboard tasks if I change the batch size from 4 to 1.
@iankur On a closer look at our causal generation mask, I see that it is indeed broken for bsz > 1. I will work on fixing this, but as it's not a straightforward change, I've called this out in our eleuther script and filed issue #1250.
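As a rough sketch only (not the actual fix, and not necessarily the mask shape torchtune's attention layers expect), a padding-aware causal mask for bsz > 1 could combine the usual lower-triangular causal mask with a per-sequence padding mask so that left-pad positions are never attended to. The helper name and `pad_id` here are hypothetical.

```python
import torch

def padded_causal_mask(tokens: torch.Tensor, pad_id: int) -> torch.Tensor:
    # tokens: [bsz, seq_len] of token ids, left-padded with pad_id
    bsz, seq_len = tokens.shape
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    padding = tokens != pad_id  # [bsz, seq_len], False at pad positions
    # A query position may only attend to keys that are both non-pad and non-future.
    return causal[None, :, :] & padding[:, None, :]  # [bsz, seq_len, seq_len]

tokens = torch.tensor([[0, 0, 11, 12], [21, 22, 23, 24]])
mask = padded_causal_mask(tokens, pad_id=0)
```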
Should we be returning a proper attention mask here, which will be required for batch decoding?