From the code (qt/src/encoders.py, line 37 in c136ac0), the way of creating H, the number of sentence heads of the encoder, is to add a linear + norm layer that transforms the CLS-token input from (batch_size, nsents, 1, model_d) into (batch_size, nsents, sentence_output_nheads, new_model_d). I wonder why we need this extra layer, instead of feeding the original (batch_size, nsents, 1, model_d) tensor directly into the quantization layer?
I suspect you experimented with this and chose the design deliberately, and I would like to hear more about the reasoning. Thanks in advance for your response.
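For reference, here is a minimal PyTorch sketch of the transformation I am describing; the class and argument names are my own guesses based on the shapes above, not the repo's actual code:

```python
# Minimal sketch (my own reconstruction, not the repo's code) of the
# "linear + norm" head-splitting step I am asking about.
import torch
import torch.nn as nn

class SentenceHeadSplit(nn.Module):
    """Project each sentence's CLS vector into several smaller 'sentence heads'."""

    def __init__(self, model_d: int, nheads: int, new_model_d: int):
        super().__init__()
        self.nheads = nheads
        self.new_model_d = new_model_d
        self.proj = nn.Linear(model_d, nheads * new_model_d)
        self.norm = nn.LayerNorm(new_model_d)

    def forward(self, cls_out: torch.Tensor) -> torch.Tensor:
        # cls_out: (batch_size, nsents, model_d) -- one CLS vector per sentence
        batch_size, nsents, _ = cls_out.shape
        h = self.proj(cls_out)                                   # (B, S, nheads * new_model_d)
        h = h.view(batch_size, nsents, self.nheads, self.new_model_d)
        return self.norm(h)                                      # (B, S, nheads, new_model_d)

# The alternative I am asking about: skip the projection and feed the
# CLS vectors to the quantizer as a single head.
# direct = cls_out.unsqueeze(2)  # (batch_size, nsents, 1, model_d)
```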