[Relax] Batch norm correctness on eval mode #17752
Conversation
cc: @MasterJH5574 this is ready for review

@tvm-bot rerun
MasterJH5574 left a comment
Looks good. Thank you @hugolatendresse for the enhancement!
########## Neural Network ##########

- def _batch_norm_legit_no_training(self, node: fx.Node) -> relax.Var:
+ def _batch_norm(self, node: fx.Node, training) -> relax.Var:
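For context, a minimal sketch of how a single handler can serve both the eval-mode and training-mode ATen ops by binding the flag at registration time. The class, method body, and convert-map style here are illustrative assumptions, not the PR's verbatim code:

```python
from functools import partial

class _ImporterSketch:
    """Toy stand-in for the exported-program translator; illustrative only."""

    def _batch_norm(self, node, training: bool):
        # A real handler would emit relax.op.nn.batch_norm here; this toy
        # version only shows that the mode arrives via the bound flag.
        mode = "training" if training else "eval"
        print(f"lowering {node} with batch_norm in {mode} mode")

    def build_convert_map(self):
        # One handler, two ATen entry points: the flag is bound when the
        # map is built, so the ingested PyTorch program never needs to be
        # edited to get eval semantics.
        return {
            "_native_batch_norm_legit_no_training": partial(self._batch_norm, training=False),
            "batch_norm": partial(self._batch_norm, training=True),
        }
```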
Good to add a type annotation in a follow-up PR.
- def _batch_norm(self, node: fx.Node, training) -> relax.Var:
+ def _batch_norm(self, node: fx.Node, training: bool) -> relax.Var:
Got it, will do, thanks
batch_norm is a different operator in training and eval mode. The previous interface defaulted to training mode and required changing the ingested PyTorch program itself to get eval mode. This is suboptimal, especially since torch.export explicitly communicates whether batch_norm should run in training or eval mode in a given torch program.
This PR automates the selection of training/eval mode in the exported program translator and achieves correctness for eval mode.
Future TODO: something is wrong with batch_norm in training mode. It does not pass a correctness test even when taken straight from the main branch (there is an issue with tensor dimensions). I added a note to address this later, since training mode is probably not high priority.
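As a concrete illustration of the torch.export point (a standalone demo, not part of this PR; requires a recent PyTorch with torch.export): exporting a module in eval mode lowers batch norm to the no-training ATen op, matching the `_batch_norm_legit_no_training` handler name in the diff above, so the translator can recover the mode from the op itself.

```python
import torch
import torch.nn as nn

# Export a batch-norm module in eval mode and inspect the graph: the call
# appears as aten._native_batch_norm_legit_no_training, so training/eval
# is recoverable from the exported program without editing the model.
bn = nn.BatchNorm2d(4)
bn.eval()
ep = torch.export.export(bn, (torch.randn(2, 4, 8, 8),))
print(ep.graph_module.graph)
# Expect a node like:
#   torch.ops.aten._native_batch_norm_legit_no_training.default(...)
```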