
[tfjs-core] fix gather gradient when batchDims is 1 #7942

Merged · 7 commits · Sep 12, 2023

Conversation

paradite
Contributor

@paradite paradite commented Sep 5, 2023

This PR fixes #7494, where an error is encountered when computing the gradient of tf.gather with batchDims=1.

I am opening a PR with a naive, unoptimized fix first, to see if the CI passes and to gather feedback on the direction of the fix.

If the CI passes, I will work on optimizing the logic if needed.

I am not really familiar with the math behind the original implementation in derX of gatherGradConfig, and I couldn't get the fix to work within the derX function. I am open to suggestions for a better, more optimized way to fix the issue.
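For readers unfamiliar with what batchDims=1 means, here is a plain-array sketch of the forward pass (no tfjs; `batchedGather` is a hypothetical helper, not the library API): each batch row selects from its own slice of x using its own row of indices.

```typescript
// Plain-array sketch of gather with batchDims = 1: batch b of the output
// is gathered from batch b of x using batch b of the indices.
function batchedGather(x: number[][], indices: number[][]): number[][] {
  return x.map((row, b) => indices[b].map((idx) => row[idx]));
}

console.log(batchedGather([[10, 20, 30], [40, 50, 60]], [[2, 0], [1, 1]]));
// [[30, 10], [50, 50]]
```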

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.

@paradite paradite changed the title [tfjs-core][draft] fix gather gradient when batchDims is 1 [tfjs-core] fix gather gradient when batchDims is 1 Sep 5, 2023
@mattsoulanille
Member

/gcbrun

@mattsoulanille
Member

Thanks for the contribution! It looks like you're splitting the batched tensor, applying the gradient function to each batch, and then stacking them together again. This seems reasonable to me.

The original math for derX is also not really clear to me. For y = tf.gather(x, indices, axis, batchDim), I'd expect dy/dx to just copy the selected values specified by the indices variable from the dy variable and be zero everywhere else. However, it seems to be using unsortedSegmentSum. I think this is to account for repeated indices. There's also some complicated transpose logic to make the input work for unsortedSegmentSum. It might be possible to get the indices of these two ops to work correctly for arbitrary batch dimensions, but I'm not sure. I'm fine with your approach, and we can revisit it if we need better performance in the future.
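To illustrate why a summing scatter (what unsortedSegmentSum provides) is needed rather than a plain copy, here is a minimal plain-array sketch of an unbatched 1-D gather gradient (`gatherGrad1d` is a hypothetical helper, not the tfjs implementation): when an index repeats, both output positions were read from the same input element, so their gradients must accumulate.

```typescript
// Gradient of y = gather(x, indices) in 1-D: scatter-add dy back into
// x's shape. Repeated indices accumulate their contributions.
function gatherGrad1d(dy: number[], indices: number[], xLength: number): number[] {
  const dx = new Array<number>(xLength).fill(0);
  indices.forEach((idx, i) => {
    dx[idx] += dy[i]; // "+=" is the crucial part: a plain copy would drop one contribution
  });
  return dx;
}

// y = gather(x, [1, 1, 3]) with dy = [10, 20, 30]:
// index 1 appears twice, so dx[1] = 10 + 20.
console.log(gatherGrad1d([10, 20, 30], [1, 1, 3], 4));
// [0, 30, 0, 30]
```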

There's probably a way to apply your approach for generic batchDims, but I'm not going to block this PR because of that.
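The split/apply/stack approach described above can be sketched with plain arrays (hypothetical helpers, not the actual tfjs code): split dy and indices along the batch dimension, compute the unbatched gradient for each batch, and stack the per-batch results.

```typescript
// Unbatched 1-D gather gradient: scatter-add dy into x's shape.
function gatherGradUnbatched(dy: number[], indices: number[], xLen: number): number[] {
  const dx = new Array<number>(xLen).fill(0);
  indices.forEach((idx, i) => { dx[idx] += dy[i]; });
  return dx;
}

// batchDims = 1: apply the unbatched gradient per batch, then stack.
function gatherGradBatched(dy: number[][], indices: number[][], xLen: number): number[][] {
  return dy.map((dyBatch, b) => gatherGradUnbatched(dyBatch, indices[b], xLen));
}

// Two batches over an x of length 3; batch 1 has a repeated index.
console.log(gatherGradBatched([[1, 2], [3, 4]], [[0, 2], [1, 1]], 3));
// [[1, 0, 2], [0, 7, 0]]
```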

Also, it looks like CI is failing due to code linting. I'll push a patch to fix that.

@mattsoulanille
Member

/gcbrun

Member

@mattsoulanille mattsoulanille left a comment


LGTM

@paradite
Contributor Author

Thank you @mattsoulanille for the help and review. I will take note of the lint issue for future PRs.

Hi @pyu10055, can I get your blessing and help to merge this?

@pyu10055 pyu10055 merged commit f44e224 into tensorflow:master Sep 12, 2023
Successfully merging this pull request may close these issues.

tf.gather causes reshape error during gradient computation