
GRU fix #250

Open · wants to merge 1 commit into master
Conversation

Jerry-Master

Looking at the formulas in your article, I see that your GRU implementation does not match the code you provide. I don't intend for you to merge this fork, since it would break compatibility, but I'm leaving it here in case you want to discuss the performance of this fixed ConvGRU implementation. It seems you are reusing the hidden state as the forward activation. That is a valid approach, but it seems more reasonable to me to keep the hidden state and the forward activation separate.
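For concreteness, here is a minimal PyTorch sketch of the separation I mean. This is not the code in this PR; the extra `to_output` convolution and all names here are hypothetical:

```python
import torch
from torch import nn

class SplitConvGRUCell(nn.Module):
    """GRU-like cell that keeps the recurrent hidden state separate from the
    forward activation sent to the next layer (hypothetical variant)."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        padding = kernel_size // 2
        # Update (z) and reset (r) gates, computed jointly from [x, h].
        self.gates = nn.Conv2d(2 * channels, 2 * channels, kernel_size, padding=padding)
        # Candidate activation, computed from [x, r * h].
        self.candidate = nn.Conv2d(2 * channels, channels, kernel_size, padding=padding)
        # Extra projection producing the forward activation from the hidden state.
        self.to_output = nn.Conv2d(channels, channels, kernel_size, padding=padding)

    def forward(self, x: torch.Tensor, h: torch.Tensor):
        z, r = self.gates(torch.cat([x, h], dim=1)).sigmoid().chunk(2, dim=1)
        c = self.candidate(torch.cat([x, r * h], dim=1)).tanh()
        h_new = z * h + (1 - z) * c       # hidden state for the next time step
        o = self.to_output(h_new).tanh()  # separate activation for the next layer
        return o, h_new
```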

@PeterL1n
Owner

Unlike the LSTM, the GRU by design does not have a separate hidden state and forward output; they are one and the same. See this diagram.
[GRU cell diagram]

My use of (1 - z) is swapped relative to the paper's notation, but the two forms are equivalent.
So I believe my original implementation was correct.
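For reference, a minimal sketch of a textbook ConvGRU cell (not the ConvGRU in this repository, just the standard formulation), in which the new hidden state doubles as the layer output:

```python
import torch
from torch import nn

class ConvGRUCell(nn.Module):
    """Textbook ConvGRU cell: the new hidden state is also the layer output."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        padding = kernel_size // 2
        # Update (z) and reset (r) gates, computed jointly from [x, h].
        self.gates = nn.Conv2d(2 * channels, 2 * channels, kernel_size, padding=padding)
        # Candidate activation, computed from [x, r * h].
        self.candidate = nn.Conv2d(2 * channels, channels, kernel_size, padding=padding)

    def forward(self, x: torch.Tensor, h: torch.Tensor):
        z, r = self.gates(torch.cat([x, h], dim=1)).sigmoid().chunk(2, dim=1)
        c = self.candidate(torch.cat([x, r * h], dim=1)).tanh()
        # Convex combination of previous state and candidate. Writing
        # z * h + (1 - z) * c instead of (1 - z) * h + z * c only relabels
        # the gate, so the two conventions are equivalent.
        h_new = z * h + (1 - z) * c
        # The same tensor is both the output to the next layer and the
        # recurrent state for the next time step.
        return h_new, h_new
```

In this form, `output, state = cell(x, h_prev)` returns the same tensor twice, which is what "they share the same" means in practice.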

@Jerry-Master
Author

I mean, in the article you say that o_t is the output of the layer and h is the hidden state, so it would make sense to pass the output to the next layer and the hidden state to the next time step. I was wondering whether you have tried both, or have any intuition about which option performs better, since computationally they are very similar.
