
Conversation

@shmsong shmsong commented Sep 9, 2021

This PR just adds two benchmark graphs modeled on the fused graphs from BERT model training.

Current benchmark result

-----------------------------------------------------------------------------------------------------------------
Benchmark                                                                       Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------
ShapeInferenceBenchmark_FusedGraph0_1Segment                                 80.9 us         78.3 us         8277
ShapeInferenceBenchmark_FusedGraph0_1Segment_NoShapeInferenceBaseline        18.9 us         18.9 us        40090
ShapeInferenceBenchmark_FusedGraph1_2Segments                                 172 us          172 us         3709
ShapeInferenceBenchmark_FusedGraph1_2Segments_NoShapeInferenceBaseline       48.8 us         48.8 us        15285

The delta between each benchmark and its NoShapeInferenceBaseline is the latency overhead of dynamic shapes (roughly 62 us for FusedGraph0 and 123 us for FusedGraph1 in the current results).

The goal is to get meaningfully close to the following numbers over the next few PRs:

-----------------------------------------------------------------------------------------------------------------
Benchmark                                                                       Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------
ShapeInferenceBenchmark_FusedGraph0_1Segment                                 16.5 us         16.5 us        42539
ShapeInferenceBenchmark_FusedGraph0_1Segment_NoShapeInferenceBaseline        9.56 us         9.56 us        69840
ShapeInferenceBenchmark_FusedGraph1_2Segments                                36.2 us         36.2 us        19080
ShapeInferenceBenchmark_FusedGraph1_2Segments_NoShapeInferenceBaseline       21.7 us         21.7 us        31567
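
For context on structure: these benchmarks follow the usual Google Benchmark pattern over an nvfuser `FusionExecutorCache`. Below is a minimal sketch of that shape, not the actual BERT-derived DAGs added here; the toy fusion, the `makeSymbolicTensor` helper, and the input shapes are illustrative assumptions.

```cpp
// Minimal sketch of a shape-inference-style benchmark, assuming the nvfuser
// Fusion / FusionExecutorCache APIs and Google Benchmark. The fusion below is
// a toy pointwise + reduction chain, not the BERT-derived DAGs from this PR.
#include <benchmark/benchmark.h>

#include <torch/csrc/jit/codegen/cuda/arith.h>
#include <torch/csrc/jit/codegen/cuda/fusion.h>
#include <torch/csrc/jit/codegen/cuda/kernel_cache.h>

using namespace torch::jit::fuser::cuda;

static void ShapeInferenceBenchmark_Sketch(benchmark::State& benchmark_state) {
  auto fusion_ptr = std::make_unique<Fusion>();
  FusionGuard fg(fusion_ptr.get());

  // Symbolic inputs, so extents have to be resolved on every run.
  auto tv0 = makeSymbolicTensor(2); // assumed benchmark/test helper
  auto tv1 = makeSymbolicTensor(2);
  fusion_ptr->addInput(tv0);
  fusion_ptr->addInput(tv1);

  auto tv2 = add(tv0, tv1);
  auto tv3 = sum(tv2, {1});
  fusion_ptr->addOutput(tv3);

  FusionExecutorCache executor_cache(std::move(fusion_ptr));

  auto options = at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0);
  at::Tensor t0 = at::randn({benchmark_state.range(0), 1024}, options);
  at::Tensor t1 = at::randn({benchmark_state.range(0), 1024}, options);

  for (auto _ : benchmark_state) {
    // Each call goes through input binding / shape inference before launch,
    // which is the overhead this benchmark is meant to expose.
    auto outputs = executor_cache.runFusionWithInputs({t0, t1});
    benchmark::DoNotOptimize(outputs);
  }
}

BENCHMARK(ShapeInferenceBenchmark_Sketch)
    ->Arg(128)
    ->Unit(benchmark::kMicrosecond);
```

Keeping the inputs symbolic is what forces extent binding and shape inference on every iteration, which is the cost the `_NoShapeInferenceBaseline` variants are meant to subtract out.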

@shmsong shmsong changed the base branch from 20_12_3_devel to combine_reduction_fix September 9, 2021 22:51
@csarofeen csarofeen (Owner) left a comment

Seems fine to me. Are these DAGs based on something specific? Could we use something from the operators interface we have instead of doing something quite so manual?

Maybe one of the backward tests from #1073 could be used as a baseline for this test instead of specifying a long chain of operations.

Approving either way as there's nothing fundamentally wrong with the PR.

@shmsong shmsong (Author) commented Sep 13, 2021

> Seems fine to me. Are these DAGs based on something specific? Could we use something from the operators interface we have instead of doing something quite so manual?
>
> Maybe one of the backward tests from #1073 could be used as a baseline for this test instead of specifying a long chain of operations.
>
> Approving either way as there's nothing fundamentally wrong with the PR.

These DAGs are copied from fused graphs captured during BERT training; the graphs that are slowest under dynamic shapes are similar to these two.

There should be a way to construct these DAGs using the layer norm operator. I could try to simplify the definition and maybe also use the case from #1073.
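
For reference, this is roughly what the "manual" style looks like when a layer-norm-style block is spelled out from the primitive arith ops; a composite layer norm helper from the operators interface would collapse most of it into a single call. A rough sketch only, with illustrative shapes, epsilon, and scalar-construction style:

```cpp
// Rough sketch: a layer-norm-like normalization built from nvfuser primitive
// ops (arith.h). Shapes, epsilon, and the scalar-construction style are
// illustrative assumptions, not the actual DAG definition used in this PR.
Fusion fusion;
FusionGuard fg(&fusion);

auto x = makeSymbolicTensor(2); // [N, H], assumed helper
fusion.addInput(x);

auto h = x->axis(1)->extent(); // symbolic hidden-dim extent

// mean over H, broadcast back to [N, H]
auto x_mean = broadcast(div(sum(x, {1}), h), {false, true});

// centered input and variance
auto x_centered = sub(x, x_mean);
auto var = broadcast(div(sum(mul(x_centered, x_centered), {1}), h), {false, true});

// normalize: (x - mean) * rsqrt(var + eps)
auto inv_std = rsqrt(add(var, new Double(1e-5)));
auto y = mul(x_centered, inv_std);
fusion.addOutput(y);
```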

@csarofeen csarofeen (Owner)

I think it's fine for now, merging. We can come back and try to clean it up later or find another use case.

@csarofeen csarofeen merged this pull request into combine_reduction_fix Sep 16, 2021
@csarofeen csarofeen deleted the shapeinference_benchmark branch January 22, 2022 16:26