
Add features to beam search that are supported in other libraries #5205

Closed
danieldeutsch opened this issue May 17, 2021 · 4 comments · Fixed by #5216

@danieldeutsch
Contributor

Is your feature request related to a problem? Please describe.
Other libraries, such as transformers and fairseq, implement several beam search options that AllenNLP lacks and that would be useful to have. For instance:

  1. Minimum-length requirements
  2. Scoring the final sequences by the average log probability per token instead of the summed log probability.
  3. Normalizing the final scores by some length penalty
  4. Blocking repeated n-grams from the output

All 4 of these are used in the BART paper, but the generic AllenNLP beam search code does not support them. I have not run the exact training config for the CNN/DailyMail BART model, but I have run one almost identical to it, and I could not reproduce the BART paper's results without implementing 1-3 myself. 4 is more complicated and I haven't implemented it yet.
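
To make 1-3 concrete, here is a rough, framework-agnostic sketch (plain PyTorch, not AllenNLP's actual API; all function names are hypothetical) of the three operations:

```python
import torch


def enforce_min_length(log_probs: torch.Tensor, end_index: int, timestep: int, min_steps: int) -> torch.Tensor:
    # (1) Block the end-of-sequence token until at least `min_steps` tokens
    # have been generated, forcing the beam to keep extending sequences.
    if timestep < min_steps:
        log_probs[:, end_index] = float("-inf")
    return log_probs


def average_log_prob(summed_log_probs: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
    # (2) Re-rank finished sequences by the per-token average log probability
    # rather than the raw sum, which otherwise favors shorter outputs.
    return summed_log_probs / lengths.float()


def length_penalized(summed_log_probs: torch.Tensor, lengths: torch.Tensor, alpha: float = 0.6) -> torch.Tensor:
    # (3) The length penalty from Wu et al. (2016): alpha = 0.0 leaves scores
    # unchanged; larger alpha increasingly favors longer sequences.
    penalty = ((5.0 + lengths.float()) / 6.0) ** alpha
    return summed_log_probs / penalty
```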

Describe the solution you'd like
I think there are workarounds for all 4 problems that could be implemented in the model code, but they all seem generically useful enough to belong in the beam search code.

If you would like, I can create a separate issue for each of the above requests (although I think 2 and 3 should be solved together) and submit my own PRs.

@danieldeutsch
Contributor Author

Looks like #5113 is asking for 3 as well.

@epwalsh
Member

epwalsh commented May 17, 2021

These would all be great! Looks like you've got 1-3 covered (I'll finish reviewing shortly), and I think 4 could be implemented as a Sampler.

@danieldeutsch
Contributor Author

I do think 4 could be a Sampler, but blocking repeated n-grams and picking which sampling technique you use seem like orthogonal decisions to me. I was imagining it could be implemented as an abstract Constraint class that would be able to zero out predictions on each step. Each constraint could have its own state, which, in this case, would keep track of which n-grams have already appeared. It might not be necessary to make it an abstract class, since I can't immediately think of other commonly enforced constraints that would fit this interface (although min_steps could also be implemented this way).
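
A minimal sketch of the interface I have in mind (everything here is hypothetical, not an existing AllenNLP class), with repeated n-gram blocking as the concrete example:

```python
from typing import Any, Dict, List

import torch


class Constraint:
    """Hypothetical base class: mask out disallowed predictions at each step."""

    def init_state(self, batch_size: int) -> List[Dict[str, Any]]:
        raise NotImplementedError

    def apply(self, state: List[Dict[str, Any]], log_probs: torch.Tensor) -> torch.Tensor:
        raise NotImplementedError

    def update_state(self, state: List[Dict[str, Any]], last_prediction: torch.Tensor) -> List[Dict[str, Any]]:
        raise NotImplementedError


class RepeatedNGramBlockingConstraint(Constraint):
    def __init__(self, ngram_size: int) -> None:
        self.ngram_size = ngram_size

    def init_state(self, batch_size: int) -> List[Dict[str, Any]]:
        # Per sequence: the set of n-grams generated so far, plus the current
        # (n-1)-token suffix that any new n-gram would extend.
        return [{"seen_ngrams": set(), "prefix": []} for _ in range(batch_size)]

    def apply(self, state, log_probs):
        for i, beam in enumerate(state):
            prefix = tuple(beam["prefix"])
            for ngram in beam["seen_ngrams"]:
                if ngram[:-1] == prefix:
                    # Choosing ngram[-1] next would repeat a previous n-gram.
                    log_probs[i, ngram[-1]] = float("-inf")
        return log_probs

    def update_state(self, state, last_prediction):
        for i, beam in enumerate(state):
            token = last_prediction[i].item()
            if len(beam["prefix"]) == self.ngram_size - 1:
                beam["seen_ngrams"].add(tuple(beam["prefix"]) + (token,))
            beam["prefix"].append(token)
            if len(beam["prefix"]) > self.ngram_size - 1:
                beam["prefix"].pop(0)
        return state
```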

@epwalsh
Member

epwalsh commented May 17, 2021

Actually I really like this idea of having a Constraint abstract class. This would allow people to use our BeamSearch for constrained generation tasks like semantic parsing, where the generated tokens have to form valid SQL or whatever.
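
As a purely illustrative example of that (the `grammar` object and its `valid_next_tokens` method are assumptions, not anything in AllenNLP), a grammar-based constraint could reuse the Constraint sketch above:

```python
class GrammarConstraint(Constraint):
    # Illustrative only: keep just the tokens the grammar allows next, so
    # every completed sequence is valid by construction (e.g. well-formed SQL).
    def __init__(self, grammar) -> None:
        self.grammar = grammar  # assumed to expose valid_next_tokens(tokens)

    def init_state(self, batch_size):
        return [{"tokens": []} for _ in range(batch_size)]

    def apply(self, state, log_probs):
        mask = torch.full_like(log_probs, float("-inf"))
        for i, beam in enumerate(state):
            allowed = list(self.grammar.valid_next_tokens(beam["tokens"]))
            mask[i, allowed] = 0.0
        return log_probs + mask

    def update_state(self, state, last_prediction):
        for i, beam in enumerate(state):
            beam["tokens"].append(last_prediction[i].item())
        return state
```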
