about transformer position_wise_feed_forward #98

Closed · Continue7777 opened this issue Dec 1, 2018 · 1 comment

Comments
@Continue7777

Recently, I did some experiments with BERT and the Transformer on text classification. I noticed that the position-wise feed-forward network always consists of two linear transformations with a ReLU activation in between, but you use conv. Is there some special thinking behind this change?
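
For reference, the formulation in "Attention Is All You Need" is FFN(x) = max(0, xW1 + b1)W2 + b2. A minimal sketch of that dense-layer version in the same TF 1.x style (function and argument names here are illustrative, not from this repo):

import tensorflow as tf

def position_wise_ffn_dense(x, d_model, d_ff):
    # x: [batch, sequence_length, d_model]; tf.layers.dense acts on the last
    # axis, so each position is transformed independently.
    hidden = tf.layers.dense(x, d_ff, activation=tf.nn.relu, name="ffn_hidden")   # d_model -> d_ff
    return tf.layers.dense(hidden, d_model, activation=None, name="ffn_output")  # d_ff -> d_model

The conv version in this repo is quoted below: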

def position_wise_feed_forward_fn(self):
    """
    x: [batch, sequence_length, d_model]
    :return: [batch, sequence_length, d_model]
    """
    # 1. first conv: a conv2d with kernel [1, d_model] over the expanded input
    #    [batch, sequence_length, d_model, 1] acts as a per-position linear map
    #    d_model -> d_ff, followed by ReLU.
    input = tf.expand_dims(self.x, axis=3)  # [batch, sequence_length, d_model, 1]
    output_conv1 = tf.layers.conv2d(
        input, filters=self.d_ff, kernel_size=[1, self.d_model], padding="VALID",
        name='conv1', kernel_initializer=self.initializer, activation=tf.nn.relu
    )  # [batch, sequence_length, 1, d_ff]
    output_conv1 = tf.transpose(output_conv1, [0, 1, 3, 2])  # [batch, sequence_length, d_ff, 1]
    # 2. second conv: per-position linear map d_ff -> d_model, no activation.
    output_conv2 = tf.layers.conv2d(
        output_conv1, filters=self.d_model, kernel_size=[1, self.d_ff], padding="VALID",
        name='conv2', kernel_initializer=self.initializer, activation=None
    )  # [batch, sequence_length, 1, d_model]
    output = tf.squeeze(output_conv2, axis=2)  # squeeze only axis 2, so batch=1 is safe
    return output  # [batch, sequence_length, d_model]

@brightmart
Owner

Hi, I did see the above setting in BERT. I use conv following the Transformer implementation in tensor2tensor. We thought it might have fewer parameters.
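
For comparison, a per-position convolution with kernel size 1 computes exactly the same transformation as the dense version; a minimal sketch, again with illustrative names and assuming TF 1.x:

import tensorflow as tf

def position_wise_ffn_conv(x, d_model, d_ff):
    # x: [batch, sequence_length, d_model]; kernel_size=1 means the convolution
    # only mixes the feature dimension and never looks across positions.
    hidden = tf.layers.conv1d(x, filters=d_ff, kernel_size=1, activation=tf.nn.relu, name="ffn_conv1")
    return tf.layers.conv1d(hidden, filters=d_model, kernel_size=1, activation=None, name="ffn_conv2")

Note that with kernel size 1 the conv and dense variants have identical parameter counts (d_model*d_ff + d_ff for the first layer, d_ff*d_model + d_model for the second); a wider kernel would mix neighboring positions and increase the parameter count.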
