
Conversation

@shmsong shmsong commented Sep 9, 2021

This PR just adds two benchmark graphs modeled on the fused graphs from BERT model training.

Current benchmark result

-----------------------------------------------------------------------------------------------------------------
Benchmark                                                                       Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------
ShapeInferenceBenchmark_FusedGraph0_1Segment                                 80.9 us         78.3 us         8277
ShapeInferenceBenchmark_FusedGraph0_1Segment_NoShapeInferenceBaseline        18.9 us         18.9 us        40090
ShapeInferenceBenchmark_FusedGraph1_2Segments                                 172 us          172 us         3709
ShapeInferenceBenchmark_FusedGraph1_2Segments_NoShapeInferenceBaseline       48.8 us         48.8 us        15285

The delta between each benchmark and its NoShapeInferenceBaseline is the latency overhead of dynamic shapes (roughly 62 us for FusedGraph0 and 123 us for FusedGraph1 in the current results).

The goal is to get meaningfully close to the following numbers over the next few PRs:

-----------------------------------------------------------------------------------------------------------------
Benchmark                                                                       Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------
ShapeInferenceBenchmark_FusedGraph0_1Segment                                 16.5 us         16.5 us        42539
ShapeInferenceBenchmark_FusedGraph0_1Segment_NoShapeInferenceBaseline        9.56 us         9.56 us        69840
ShapeInferenceBenchmark_FusedGraph1_2Segments                                36.2 us         36.2 us        19080
ShapeInferenceBenchmark_FusedGraph1_2Segments_NoShapeInferenceBaseline       21.7 us         21.7 us        31567
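
For context on structure: these benchmarks follow the usual Google Benchmark pattern over an nvfuser `FusionExecutorCache`. Below is a minimal sketch of that shape, not the actual BERT-derived DAGs added here; the toy fusion, the `makeSymbolicTensor` helper, and the input shapes are illustrative assumptions.

```cpp
// Minimal sketch of a shape-inference-style benchmark, assuming the nvfuser
// Fusion / FusionExecutorCache APIs and Google Benchmark. The fusion below is
// a toy pointwise + reduction chain, not the BERT-derived DAGs from this PR.
#include <benchmark/benchmark.h>

#include <torch/csrc/jit/codegen/cuda/arith.h>
#include <torch/csrc/jit/codegen/cuda/fusion.h>
#include <torch/csrc/jit/codegen/cuda/kernel_cache.h>

using namespace torch::jit::fuser::cuda;

static void ShapeInferenceBenchmark_Sketch(benchmark::State& benchmark_state) {
  auto fusion_ptr = std::make_unique<Fusion>();
  FusionGuard fg(fusion_ptr.get());

  // Symbolic inputs, so extents have to be resolved on every run.
  auto tv0 = makeSymbolicTensor(2); // assumed benchmark/test helper
  auto tv1 = makeSymbolicTensor(2);
  fusion_ptr->addInput(tv0);
  fusion_ptr->addInput(tv1);

  auto tv2 = add(tv0, tv1);
  auto tv3 = sum(tv2, {1});
  fusion_ptr->addOutput(tv3);

  FusionExecutorCache executor_cache(std::move(fusion_ptr));

  auto options = at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0);
  at::Tensor t0 = at::randn({benchmark_state.range(0), 1024}, options);
  at::Tensor t1 = at::randn({benchmark_state.range(0), 1024}, options);

  for (auto _ : benchmark_state) {
    // Each call goes through input binding / shape inference before launch,
    // which is the overhead this benchmark is meant to expose.
    auto outputs = executor_cache.runFusionWithInputs({t0, t1});
    benchmark::DoNotOptimize(outputs);
  }
}

BENCHMARK(ShapeInferenceBenchmark_Sketch)
    ->Arg(128)
    ->Unit(benchmark::kMicrosecond);
```

Keeping the inputs symbolic is what forces extent binding and shape inference on every iteration, which is the cost the `_NoShapeInferenceBaseline` variants are meant to subtract out.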

@shmsong shmsong changed the base branch from 20_12_3_devel to combine_reduction_fix September 9, 2021 22:51
@csarofeen csarofeen (Owner) left a comment

Seems fine to me. Are these DAGs based on something specific? Could we use something from the operators interface we have instead of doing something quite so manual?

Maybe one of the backward tests from #1073 could be used as a baseline for this test instead of specifying a long chain of operations.

Approving either way as there's nothing fundamentally wrong with the PR.

@shmsong shmsong (Author) commented Sep 13, 2021

> Seems fine to me. Are these DAGs based on something specific? Could we use something from the operators interface we have instead of doing something quite so manual?
>
> Maybe one of the backward tests from #1073 could be used as a baseline for this test instead of specifying a long chain of operations.
>
> Approving either way as there's nothing fundamentally wrong with the PR.

These DAGs are copied from fused graphs captured during BERT training; the graphs that are slowest under dynamic shapes are similar to these two.

There should be a way to construct these DAGs using the layer norm operator. I could try to simplify the definition and maybe also use the case from #1073.
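
For reference, this is roughly what the "manual" style looks like when a layer-norm-style block is spelled out from the primitive arith ops; a composite layer norm helper from the operators interface would collapse most of it into a single call. A rough sketch only, with illustrative shapes, epsilon, and scalar-construction style:

```cpp
// Rough sketch: a layer-norm-like normalization built from nvfuser primitive
// ops (arith.h). Shapes, epsilon, and the scalar-construction style are
// illustrative assumptions, not the actual DAG definition used in this PR.
Fusion fusion;
FusionGuard fg(&fusion);

auto x = makeSymbolicTensor(2); // [N, H], assumed helper
fusion.addInput(x);

auto h = x->axis(1)->extent(); // symbolic hidden-dim extent

// mean over H, broadcast back to [N, H]
auto x_mean = broadcast(div(sum(x, {1}), h), {false, true});

// centered input and variance
auto x_centered = sub(x, x_mean);
auto var = broadcast(div(sum(mul(x_centered, x_centered), {1}), h), {false, true});

// normalize: (x - mean) * rsqrt(var + eps)
auto inv_std = rsqrt(add(var, new Double(1e-5)));
auto y = mul(x_centered, inv_std);
fusion.addOutput(y);
```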

@csarofeen csarofeen (Owner)

I think it's fine for now, merging. We can come back and try to clean it up later or find another use case.

@csarofeen csarofeen merged this pull request into combine_reduction_fix Sep 16, 2021
@csarofeen csarofeen deleted the shapeinference_benchmark branch January 22, 2022 16:26