
CPU gradient definitions #69

Open
xkr1 opened this issue May 23, 2020 · 3 comments

Hi,

Firstly, thanks so much for making this code available. It's a really useful resource for learning more about RNN-T.

I have two questions, if I may:

i.) In cpu_rnnt.h the gradients are defined as:

ProbT g = alphas[idx(t, u)] + betas[idx(t+1, u)];
grad[idx(t, u, blank_)] = -std::exp(log_probs[idx(t, u) * 2] + g - loglike);

and

ProbT g = alphas[idx(t, u)] + betas[idx(t, u+1)];
grad[idx(t, u, labels[u])] = -std::exp(log_probs[idx(t, u) * 2 + 1] + g - loglike); 

Following rnnt_notes.pdf, as far as I understand, the formula used here is equation 9. From this, I can account for the following part of the above code:

ProbT g = alphas[idx(t, u)] + betas[idx(t+1, u)];
grad[idx(t, u, blank_)] = -std::exp(g - loglike);

but I'm not sure where the log_probs[] term comes in here?
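
To make the question concrete, here is how I read those two lines for a single (t, u) cell, as a standalone snippet with made-up numbers (my own simplification with hypothetical variable names, not the library code); the logp_* terms are the part I don't follow:

#include <cmath>
#include <cstdio>

int main() {
    // Made-up values for one (t, u) cell, all in the log domain.
    double alpha_t_u  = -1.2;  // alphas[idx(t, u)]
    double beta_t1_u  = -0.8;  // betas[idx(t+1, u)]
    double beta_t_u1  = -0.9;  // betas[idx(t, u+1)]
    double logp_blank = -0.4;  // log_probs[idx(t, u) * 2]     (blank)
    double logp_label = -1.1;  // log_probs[idx(t, u) * 2 + 1] (labels[u])
    double loglike    = -2.0;  // total log-likelihood of the target sequence

    // Same structure as the two lines quoted above.
    double grad_blank = -std::exp(logp_blank + alpha_t_u + beta_t1_u - loglike);
    double grad_label = -std::exp(logp_label + alpha_t_u + beta_t_u1 - loglike);

    std::printf("grad_blank = %f\ngrad_label = %f\n", grad_blank, grad_label);
    return 0;
}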

ii.) Following on from my first question: it's probably obvious that I'm not an expert in calculus, so I was wondering if you could point me to any resources, or give any hints, on how to derive the gradient of the loss from eq. 5, 7 and 8 in rnnt_loss.pdf. Even the smallest hint would be greatly appreciated!

Thanks again!

HawkAaron (Owner) commented

Well, the gradients here are with respect to the log probability: \frac{\partial L}{\partial \log p} = \frac{\partial L}{\partial p} \frac{\partial \exp(\log p)}{\partial \log p} = p \frac{\partial L}{\partial p}.
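
Spelled out for the blank case (my shorthand, using the note's forward/backward variables \alpha, \beta and writing P for P(\mathbf{y}^* \mid \mathbf{x})), eq. 9 gives the derivative with respect to the raw probability, and multiplying by the probability moves it to the log probability:

\frac{\partial L}{\partial \log p(\varnothing \mid t, u)} = p(\varnothing \mid t, u)\, \frac{\partial L}{\partial p(\varnothing \mid t, u)} = -\frac{p(\varnothing \mid t, u)\, \alpha(t, u)\, \beta(t+1, u)}{P}

In the log domain that product and quotient become the sum inside the exp, i.e. log_probs[idx(t, u) * 2] + alphas[idx(t, u)] + betas[idx(t+1, u)] - loglike, which is exactly the first grad line; the label case is the same with log_probs[idx(t, u) * 2 + 1] and betas[idx(t, u+1)].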

xkr1 (Author) commented May 24, 2020

Thank you! That makes a lot more sense to me now :-)

Referring now to the component g - loglike (in the log domain): I understand how g is derived, but it has so far been a mystery to me why the -loglike is there.

Would I be right in saying that this is because the loss, L, is defined in the log domain, i.e. L = -ln P(y*|x), and so we end up dividing through by this P(y*|x) term (loglike in the log domain) as a result of applying the chain rule when taking the derivative of the log?
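
Writing out just that step to check myself (notation as above, with P the total likelihood, and using \partial P / \partial p(\varnothing \mid t, u) = \alpha(t, u)\, \beta(t+1, u) from the forward-backward recursions): since L = -\ln P,

\frac{\partial L}{\partial p(\varnothing \mid t, u)} = -\frac{1}{P}\, \frac{\partial P}{\partial p(\varnothing \mid t, u)} = -\frac{\alpha(t, u)\, \beta(t+1, u)}{P}

so the 1/P produced by differentiating the log would be what shows up as the - loglike inside the exp. Please correct me if that reasoning is off.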

q121q commented May 30, 2020

How is this different from the CTC loss?
