From the code (qt/src/encoders.py, line 37 in c136ac0), the way of creating H, the number of sentence heads of the encoder, is to add a linear + norm layer that transforms the CLS-token input from (batch_size, nsents, 1, model_d) into (batch_size, nsents, sentence_output_nheads, new_model_d). I wonder why we need this extra layer, instead of feeding the original (batch_size, nsents, 1, model_d) tensor directly into the quantization layer?
I suspect you experimented with this and chose the design deliberately, and I would like to hear more about the reasoning. Thanks in advance for your response.
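For reference, here is a minimal PyTorch sketch of the transformation I am describing; the class and argument names are my own guesses based on the shapes above, not the repo's actual code:

```python
# Minimal sketch (my own reconstruction, not the repo's code) of the
# "linear + norm" head-splitting step I am asking about.
import torch
import torch.nn as nn

class SentenceHeadSplit(nn.Module):
    """Project each sentence's CLS vector into several smaller 'sentence heads'."""

    def __init__(self, model_d: int, nheads: int, new_model_d: int):
        super().__init__()
        self.nheads = nheads
        self.new_model_d = new_model_d
        self.proj = nn.Linear(model_d, nheads * new_model_d)
        self.norm = nn.LayerNorm(new_model_d)

    def forward(self, cls_out: torch.Tensor) -> torch.Tensor:
        # cls_out: (batch_size, nsents, model_d) -- one CLS vector per sentence
        batch_size, nsents, _ = cls_out.shape
        h = self.proj(cls_out)                                   # (B, S, nheads * new_model_d)
        h = h.view(batch_size, nsents, self.nheads, self.new_model_d)
        return self.norm(h)                                      # (B, S, nheads, new_model_d)

# The alternative I am asking about: skip the projection and feed the
# CLS vectors to the quantizer as a single head.
# direct = cls_out.unsqueeze(2)  # (batch_size, nsents, 1, model_d)
```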