Problem with GRU simulation #20535

kokhazade opened this issue Nov 22, 2024 · 2 comments
@kokhazade

Hello,

I am trying to implement the GRU computations myself and compare the results with the output of a Keras GRU layer. To do this, I feed the same input to both, extract the layer's weights, and pass them to my my_GRU_simulation function. With the following code for the inner gate calculations, I get two different outputs for the two cases.
Are there any extra calculations (beyond the following) that I have missed?
How can I produce exactly the same output as the GRU layer?

        import tensorflow as tf
        from keras import backend

        # GRU_layer_input has shape (batch, 1, features); drop the time axis for a single step
        GRU_layer_input = tf.squeeze(GRU_layer_input, axis=1)

        # input projected by all gate matrices at once: kernel = gru_layer_weights[0]
        matrix_x = backend.dot(tf.convert_to_tensor(GRU_layer_input), tf.convert_to_tensor(gru_layer_weights[0]))
        matrix_x = backend.bias_add(matrix_x, gru_layer_weights[2][0])  # input bias row
        x_z, x_r, x_h = tf.split(matrix_x, 3, axis=1)

        # previous hidden state projected by the recurrent kernel: gru_layer_weights[1]
        matrix_y = backend.dot(tf.convert_to_tensor(previous_gru_output), tf.convert_to_tensor(gru_layer_weights[1]))
        matrix_y = backend.bias_add(matrix_y, gru_layer_weights[2][1])  # recurrent bias row
        recurrent_z, recurrent_r, recurrent_h = tf.split(matrix_y, 3, axis=1)

        z = tf.sigmoid(x_z + recurrent_z)    # update gate
        r = tf.sigmoid(x_r + recurrent_r)    # reset gate
        hh = tf.tanh(x_h + r * recurrent_h)  # candidate state

        # h_tm1 is the previous hidden state (same as previous_gru_output here)
        h = z * h_tm1 + (1 - z) * hh
        new_state = [h] if tf.nest.is_nested(previous_gru_output) else h

previous_gru_output is all zeros, with the same size as the GRU units.
I am using TensorFlow and Keras version 2.11.0.
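
In case it helps, the surrounding setup is roughly the following (a sketch only; the layer configuration, input shape, and variable names are illustrative, chosen to match the snippet above):

        import numpy as np
        import tensorflow as tf

        units = 4
        gru_layer = tf.keras.layers.GRU(units)  # TF 2.x defaults: use_bias=True, reset_after=True

        # one batch element, one time step, a few features (shapes are just for illustration)
        GRU_layer_input = np.random.rand(1, 1, 3).astype("float32")
        reference_output = gru_layer(GRU_layer_input)  # output of the real GRU layer to compare against

        # [kernel (features, 3*units), recurrent_kernel (units, 3*units), bias (2, 3*units)]
        gru_layer_weights = gru_layer.get_weights()

        # initial hidden state: all zeros, one row per batch element, `units` columns
        previous_gru_output = np.zeros((1, units), dtype="float32")
        h_tm1 = previous_gru_output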

@kokhazade

Hi @sachinprasadhs,
Can you help me find the source of the difference?


mattdangerw commented Nov 27, 2024

Just by eye, your implementation looks like it could be correct given certain settings on the GRU layer.

It's probably easiest to go op by op through the call graph to debug where things diverge. Keras 2.11 is quite old at this point (many years old), but you can find the implementation on the 2.11 tag of the Keras repo on GitHub. Here's the GRU code in 2.11 (though there are other paths for other GRU layer init args).

if 0.0 < self.dropout < 1.0:
    inputs = inputs * dp_mask[0]

# inputs projected by all gate matrices at once
matrix_x = backend.dot(inputs, self.kernel)
if self.use_bias:
    # biases: bias_z_i, bias_r_i, bias_h_i
    matrix_x = backend.bias_add(matrix_x, input_bias)

x_z, x_r, x_h = tf.split(matrix_x, 3, axis=-1)

if self.reset_after:
    # hidden state projected by all gate matrices at once
    matrix_inner = backend.dot(h_tm1, self.recurrent_kernel)
    if self.use_bias:
        matrix_inner = backend.bias_add(
            matrix_inner, recurrent_bias
        )
else:
    # hidden state projected separately for update/reset and new
    matrix_inner = backend.dot(
        h_tm1, self.recurrent_kernel[:, : 2 * self.units]
    )

recurrent_z, recurrent_r, recurrent_h = tf.split(
    matrix_inner, [self.units, self.units, -1], axis=-1
)

z = self.recurrent_activation(x_z + recurrent_z)
r = self.recurrent_activation(x_r + recurrent_r)

if self.reset_after:
    recurrent_h = r * recurrent_h
else:
    recurrent_h = backend.dot(
        r * h_tm1, self.recurrent_kernel[:, 2 * self.units :]
    )

hh = self.activation(x_h + recurrent_h)

# previous and candidate state mixed by update gate
h = z * h_tm1 + (1 - z) * hh
new_state = [h] if tf.nest.is_nested(states) else h
return h, new_state

If I were trying to track this down, I'd give myself an isolated example and an environment where I could hack up the GRU layer code. Maybe start by running eagerly, add a bunch of prints for intermediate computation values, see if that helps triangulate the problem, and go from there.
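
For example, a single-step comparison could look something like this. This is a sketch only, assuming use_bias=True, reset_after=True, and the default activations; the variable names are illustrative, not from your code:

import numpy as np
import tensorflow as tf

# To add prints inside the layer code itself, this forces eager execution of compiled functions.
tf.config.run_functions_eagerly(True)

units = 4
layer = tf.keras.layers.GRU(units)  # defaults: use_bias=True, reset_after=True

x = np.random.rand(1, 1, 3).astype("float32")   # (batch, time=1, features)
layer_out = layer(x)                             # reference output from the layer

kernel, recurrent_kernel, bias = layer.get_weights()
input_bias, recurrent_bias = bias[0], bias[1]    # reset_after=True -> two bias rows

h_tm1 = np.zeros((1, units), dtype="float32")    # initial state is all zeros
inputs = x[:, 0, :]                              # the single time step

# manual step, mirroring the reset_after=True branch above
matrix_x = inputs @ kernel + input_bias
x_z, x_r, x_h = np.split(matrix_x, 3, axis=-1)

matrix_inner = h_tm1 @ recurrent_kernel + recurrent_bias
rec_z, rec_r, rec_h = np.split(matrix_inner, 3, axis=-1)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

z = sigmoid(x_z + rec_z)
r = sigmoid(x_r + rec_r)
hh = np.tanh(x_h + r * rec_h)
h = z * h_tm1 + (1 - z) * hh

print(np.max(np.abs(h - layer_out.numpy())))     # should be ~0 if the step matches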
