
[Autodiff] Deterministic gradient compute #7321

Merged 11 commits into apache:main on Jan 28, 2021
Conversation

@hzfan (Contributor) commented Jan 21, 2021

te.gradient may generate different (but equivalent) backward compute for the same forward compute, which can cause an Ansor cache miss. This PR makes sure te.gradient always generates the same backward compute.

cc @yzhliu @comaniac
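
To make the issue concrete, here is a minimal sketch of the property this PR establishes; the matmul compute, shapes, and names are illustrative and not taken from the PR:

import tvm
from tvm import te

n = 32
X = te.placeholder((n, n), name="X")
W = te.placeholder((n, n), name="W")
k = te.reduce_axis((0, n), name="k")
Y = te.compute((n, n), lambda i, j: te.sum(X[i, k] * W[k, j], axis=k), name="Y")
head = te.placeholder((n, n), name="head")  # adjoint of the output Y

[dX] = te.gradient(Y, [X], head=head)
dag = str(tvm.auto_scheduler.ComputeDAG([dX]))
for _ in range(10):
    [dX2] = te.gradient(Y, [X], head=head)
    # With this PR, repeated calls yield a textually identical backward DAG.
    assert str(tvm.auto_scheduler.ComputeDAG([dX2])) == dag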

Comment on lines 353 to 358
# R, X, and ones are defined earlier in the test.
dag = tvm.auto_scheduler.ComputeDAG(grads)
repeat = 100
for i in range(repeat):
    grads = te.gradient(R, [X], head=ones)
    new_dag = tvm.auto_scheduler.ComputeDAG(grads)
    assert str(dag) == str(new_dag)
A reviewer (Contributor) commented:

  1. Since auto_scheduler guarantees the DAG will be the same for the same compute, you don't need to involve auto_scheduler in this test.
  2. I'm not even sure we need this test, because it seems unable to expose the real problem. IIUC, the non-deterministic behavior comes from the use of unordered_set, so you may still pass this test when you're lucky, even if you break something. If that happens, this test becomes flaky. But I'd like to hear opinions from others. cc @yzhliu
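
A minimal Python analog of the unordered_set issue described above (a sketch, not code from this PR): with the interpreter's default hash randomization, iteration order over a set of strings can change between runs, so anything built by iterating such a set is nondeterministic, and sorting first restores a canonical order.

# Iteration order of a set of strings varies across interpreter runs
# (hash randomization), mirroring C++ unordered_set behavior.
terms = {"x0*y1", "x1*y0", "x0*y0"}
unstable = list(terms)   # run-dependent order
stable = sorted(terms)   # canonical order, deterministic across runs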

@hzfan (Contributor, author) replied:

> Since auto_scheduler guarantees the DAG will be the same for the same compute, you don't need to involve auto_scheduler in this test.

Yes, I agree. I used auto_scheduler here only because it provides a hash_key for TE-level IR. Any ideas about how to compare two Tensors?

> I'm not even sure we need this test, because it seems unable to expose the real problem. IIUC, the non-deterministic behavior comes from the use of unordered_set, so you may still pass this test when you're lucky, even if you break something. If that happens, this test becomes flaky. But I'd like to hear opinions from others.

Agree. I'm fine with removing the test.
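
One auto_scheduler-free way to compare two TE tensors, sketched here as a suggestion (same_compute is a hypothetical helper, not part of TVM): lower each with a default schedule and compare the printed TIR.

import tvm
from tvm import te

def same_compute(t1, args1, t2, args2):
    # Lower each tensor with a default schedule; equal printed TIR
    # means the two backward computes are identical.
    m1 = tvm.lower(te.create_schedule(t1.op), args1, simple_mode=True)
    m2 = tvm.lower(te.create_schedule(t2.op), args2, simple_mode=True)
    return str(m1) == str(m2)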

@comaniac comaniac changed the title [TE] stable gradient [Autodiff] Deterministic gradient compute Jan 25, 2021
@comaniac (Contributor) left a comment:

The changes LGTM, but I'm not that familiar with this module, so I'd let @yzhliu approve this PR.

@yzhliu (Member) left a comment:

Please retrigger the CI; the failure seems to be flaky.

@yzhliu yzhliu merged commit 00257f3 into apache:main Jan 28, 2021
@yzhliu (Member) commented Jan 28, 2021

Thanks @hzfan @comaniac

alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 11, 2021
* fix unstable compute

* fix

* fix

* lint

* sort linear equation

* sort inequalities

* fix

* fix find

* lint

* fix find

* lint
electriclilies pushed a commit to electriclilies/tvm that referenced this pull request Feb 18, 2021
Lokiiiiii pushed a commit to Lokiiiiii/tvm that referenced this pull request Mar 2, 2021
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Mar 2, 2021