[BART] Do not add start/end tokens multiple times #3714

emilydinan · 2021-06-14T15:14:30Z

Patch description
Found an issue with BART: if we cache the text_vec, start/end tokens are added multiple times. I was relying on this caching behavior to score candidates with BART when there were too many candidates to fit in memory.

I adopted the same solution we use for BERT:

ParlAI/parlai/agents/bert_ranker/bi_encoder_ranker.py, line 147 in c4c2669:

def _set_text_vec(self, *args, **kwargs):
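
For reviewers who want the failure mode in isolation, here is a minimal standalone sketch. It is not the ParlAI code: the token ids, the `added_start_end_tokens` flag name, and both helper functions are assumptions made up for illustration, and the observation is a plain dict rather than a ParlAI Message. It only shows why re-vectorizing a cached text_vec duplicates the tokens and how a "done once" flag on the observation avoids that.

```python
START_IDX = 0  # hypothetical token ids, for illustration only
END_IDX = 2


def add_start_end(vec):
    """Wrap a list of token ids in one start token and one end token."""
    return [START_IDX] + list(vec) + [END_IDX]


def set_text_vec_buggy(obs):
    """Always wraps text_vec, so a cached observation gets wrapped again."""
    if "text_vec" in obs:
        obs["text_vec"] = add_start_end(obs["text_vec"])
    return obs


def set_text_vec_guarded(obs):
    """Wraps text_vec once and records that fact on the observation."""
    if "text_vec" in obs and not obs.get("added_start_end_tokens", False):
        obs["text_vec"] = add_start_end(obs["text_vec"])
        obs["added_start_end_tokens"] = True
    return obs


buggy = {"text_vec": [11, 12, 13]}
set_text_vec_buggy(buggy)
set_text_vec_buggy(buggy)   # second pass over the cached observation
print(buggy["text_vec"])    # [0, 0, 11, 12, 13, 2, 2]

guarded = {"text_vec": [11, 12, 13]}
set_text_vec_guarded(guarded)
set_text_vec_guarded(guarded)  # second pass is a no-op
print(guarded["text_vec"])  # [0, 11, 12, 13, 2]
```

Keeping the flag on the observation itself, rather than on the agent, means the guard travels with the cached vector wherever it is reused.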

CC @adamlerer

Testing steps
I added a test.

klshuster

seems harmless enough

Emily Dinan added 2 commits June 14, 2021 10:59

bart don't add extra start end tokens

1d087e1

add a test

24b3a40

facebook-github-bot added the CLA Signed label Jun 14, 2021

emilydinan requested review from klshuster and stephenroller June 14, 2021 15:14

klshuster approved these changes Jun 14, 2021


emilydinan merged commit 9f9121d into master Jun 15, 2021

emilydinan deleted the bartstart branch June 15, 2021 14:02
