Problem with GRU simulation #20535

kokhazade opened this issue Nov 22, 2024 · 2 comments
@kokhazade

Hello,

I am trying to implement the GRU computations myself and compare the results with the output of a Keras GRU layer. To do this, I feed the same input to both, extract the layer's weights, and pass them to my my_GRU_simulation function. With the following code for the inner gate calculations, I get two different outputs for the two cases.
Are there any extra calculations (beyond the following) that I have missed?
How can I produce exactly the same output as the GRU layer?

        import tensorflow as tf
        from keras import backend

        # GRU_layer_input has shape (batch, 1, features); drop the time axis for a single step
        GRU_layer_input = tf.squeeze(GRU_layer_input, axis=1)

        # input projected by all gate matrices at once: kernel = gru_layer_weights[0]
        matrix_x = backend.dot(tf.convert_to_tensor(GRU_layer_input), tf.convert_to_tensor(gru_layer_weights[0]))
        matrix_x = backend.bias_add(matrix_x, gru_layer_weights[2][0])  # input bias row
        x_z, x_r, x_h = tf.split(matrix_x, 3, axis=1)

        # previous hidden state projected by the recurrent kernel: gru_layer_weights[1]
        matrix_y = backend.dot(tf.convert_to_tensor(previous_gru_output), tf.convert_to_tensor(gru_layer_weights[1]))
        matrix_y = backend.bias_add(matrix_y, gru_layer_weights[2][1])  # recurrent bias row
        recurrent_z, recurrent_r, recurrent_h = tf.split(matrix_y, 3, axis=1)

        z = tf.sigmoid(x_z + recurrent_z)    # update gate
        r = tf.sigmoid(x_r + recurrent_r)    # reset gate
        hh = tf.tanh(x_h + r * recurrent_h)  # candidate state

        # h_tm1 is the previous hidden state (same as previous_gru_output here)
        h = z * h_tm1 + (1 - z) * hh
        new_state = [h] if tf.nest.is_nested(previous_gru_output) else h

previous_gru_output is all zeros, with the same size as the GRU units.
I am using TensorFlow and Keras version 2.11.0.
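
In case it helps, the surrounding setup is roughly the following (a sketch only; the layer configuration, input shape, and variable names are illustrative, chosen to match the snippet above):

        import numpy as np
        import tensorflow as tf

        units = 4
        gru_layer = tf.keras.layers.GRU(units)  # TF 2.x defaults: use_bias=True, reset_after=True

        # one batch element, one time step, a few features (shapes are just for illustration)
        GRU_layer_input = np.random.rand(1, 1, 3).astype("float32")
        reference_output = gru_layer(GRU_layer_input)  # output of the real GRU layer to compare against

        # [kernel (features, 3*units), recurrent_kernel (units, 3*units), bias (2, 3*units)]
        gru_layer_weights = gru_layer.get_weights()

        # initial hidden state: all zeros, one row per batch element, `units` columns
        previous_gru_output = np.zeros((1, units), dtype="float32")
        h_tm1 = previous_gru_output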

@kokhazade

Hi @sachinprasadhs,
Can you help me find the source of the difference?


mattdangerw commented Nov 27, 2024

Just by eye, your implementation looks like it could be correct given certain settings on the GRU layer.

It's probably easiest to go op by op through the call graph to debug where things diverge. Keras 2.11 is quite old at this point (many years old), but you can find the implementation on the 2.11 tag of the Keras repo on GitHub. Here's the GRU code in 2.11 (though there are other paths for other GRU layer init args).

if 0.0 < self.dropout < 1.0:
    inputs = inputs * dp_mask[0]

# inputs projected by all gate matrices at once
matrix_x = backend.dot(inputs, self.kernel)
if self.use_bias:
    # biases: bias_z_i, bias_r_i, bias_h_i
    matrix_x = backend.bias_add(matrix_x, input_bias)

x_z, x_r, x_h = tf.split(matrix_x, 3, axis=-1)

if self.reset_after:
    # hidden state projected by all gate matrices at once
    matrix_inner = backend.dot(h_tm1, self.recurrent_kernel)
    if self.use_bias:
        matrix_inner = backend.bias_add(
            matrix_inner, recurrent_bias
        )
else:
    # hidden state projected separately for update/reset and new
    matrix_inner = backend.dot(
        h_tm1, self.recurrent_kernel[:, : 2 * self.units]
    )

recurrent_z, recurrent_r, recurrent_h = tf.split(
    matrix_inner, [self.units, self.units, -1], axis=-1
)

z = self.recurrent_activation(x_z + recurrent_z)
r = self.recurrent_activation(x_r + recurrent_r)

if self.reset_after:
    recurrent_h = r * recurrent_h
else:
    recurrent_h = backend.dot(
        r * h_tm1, self.recurrent_kernel[:, 2 * self.units :]
    )

hh = self.activation(x_h + recurrent_h)

# previous and candidate state mixed by update gate
h = z * h_tm1 + (1 - z) * hh
new_state = [h] if tf.nest.is_nested(states) else h
return h, new_state

If I were trying to track this down, I'd give myself an isolated example and an environment where I could hack up the GRU layer code. Maybe start by running eagerly, add a bunch of prints for intermediate computation values, see if that helps triangulate the problem, and go from there.
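
For example, a single-step comparison could look something like this. This is a sketch only, assuming use_bias=True, reset_after=True, and the default activations; the variable names are illustrative, not from your code:

import numpy as np
import tensorflow as tf

# To add prints inside the layer code itself, this forces eager execution of compiled functions.
tf.config.run_functions_eagerly(True)

units = 4
layer = tf.keras.layers.GRU(units)  # defaults: use_bias=True, reset_after=True

x = np.random.rand(1, 1, 3).astype("float32")   # (batch, time=1, features)
layer_out = layer(x)                             # reference output from the layer

kernel, recurrent_kernel, bias = layer.get_weights()
input_bias, recurrent_bias = bias[0], bias[1]    # reset_after=True -> two bias rows

h_tm1 = np.zeros((1, units), dtype="float32")    # initial state is all zeros
inputs = x[:, 0, :]                              # the single time step

# manual step, mirroring the reset_after=True branch above
matrix_x = inputs @ kernel + input_bias
x_z, x_r, x_h = np.split(matrix_x, 3, axis=-1)

matrix_inner = h_tm1 @ recurrent_kernel + recurrent_bias
rec_z, rec_r, rec_h = np.split(matrix_inner, 3, axis=-1)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

z = sigmoid(x_z + rec_z)
r = sigmoid(x_r + rec_r)
hh = np.tanh(x_h + r * rec_h)
h = z * h_tm1 + (1 - z) * hh

print(np.max(np.abs(h - layer_out.numpy())))     # should be ~0 if the step matches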
