[tutorials] Add tutorial on JIT compile/execute performance #7838

Merged
5 commits merged into main from dg/jit_perf_tutorial on Sep 15, 2023

Conversation

derek-gerstmann (Contributor)

Compares the performance of realize(), compile_jit(), compile_to_callable(), and compile_to_module(), and shows the benefits of the JIT cache.
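
[Editor's note] For readers of this thread, here is a minimal illustrative sketch of the effect the tutorial measures: the first realize() call pays the JIT compilation cost, and later calls reuse the cached compiled code. This is a stand-alone snippet, not an excerpt from the tutorial, and the timings it prints are machine-dependent.

```cpp
#include "Halide.h"
#include <chrono>
#include <cstdio>

using namespace Halide;

int main() {
    Var x("x"), y("y");
    Func f("f");
    f(x, y) = x + y;

    Buffer<int> out(1024, 1024);

    auto time_once_ms = [&]() {
        auto t0 = std::chrono::steady_clock::now();
        f.realize(out);
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    };

    // The first call JIT-compiles the pipeline and then runs it.
    printf("first realize():  %.3f ms\n", time_once_ms());
    // Subsequent calls reuse the cached compiled code, so only execution is timed.
    printf("second realize(): %.3f ms\n", time_once_ms());
    return 0;
}
```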

Review threads on tutorial/lesson_22_jit_performance.cpp (resolved).
abadams (Member) commented Sep 7, 2023

I think the cases to test here are the ones that match real usage patterns:

  1. Defining and compiling the whole pipeline every time you want to run it (i.e. in the benchmarking loop)
  2. Defining the pipeline outside the benchmarking loop, and realizing it repeatedly.
  3. (optional) Same as 2), but calling compile_jit() outside the loop, saying what it does, and saying why the time isn't actually different to case 2 (benchmark() runs multiple times and takes a min, and realize only compiles on the first run)
  4. Compiling to a callable outside the benchmarking loop and showing that it has lower overhead than case 3 (if indeed it does. If not we may need to change the example so that it does, e.g. by adding a real input buffer.)

Another subtlety we could consider is the difference between allocating the output buffer outside the benchmarking loop vs using the form of realize that allocates an output for you, e.g. realize({1024, 1024});
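
[Editor's note] To make these cases concrete, here is a hedged sketch of cases 1 and 2 plus the output-buffer subtlety. It assumes the Halide::Tools::benchmark(samples, iterations, fn) helper from tools/halide_benchmark.h and a deliberately trivial pipeline, so it illustrates the measurement structure rather than reproducing what the merged tutorial actually does.

```cpp
#include "Halide.h"
#include "halide_benchmark.h"
#include <cstdio>

using namespace Halide;
using Halide::Tools::benchmark;

int main() {
    Var x("x"), y("y");

    // Case 1: define *and* compile the whole pipeline inside the benchmarking
    // loop, so every iteration pays the JIT compilation cost.
    double t1 = benchmark(3, 10, [&]() {
        Func f("case1");
        f(x, y) = x + y;
        f.realize({1024, 1024});
    });

    // Case 2: define the pipeline once, outside the loop, and realize it
    // repeatedly. Compilation happens on the first realize() and is cached.
    Func g("case2");
    g(x, y) = x + y;
    double t2 = benchmark(3, 10, [&]() {
        g.realize({1024, 1024});
    });

    // The subtlety above: realize({1024, 1024}) allocates a fresh output every
    // call; preallocating the output buffer outside the loop removes that cost.
    Buffer<int> out(1024, 1024);
    double t2b = benchmark(3, 10, [&]() {
        g.realize(out);
    });

    printf("case 1 (compile every iteration): %.3f ms\n", t1 * 1e3);
    printf("case 2 (realize, alloc output):   %.3f ms\n", t2 * 1e3);
    printf("case 2 (realize, reuse output):   %.3f ms\n", t2b * 1e3);
    return 0;
}
```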

Add timing estimates as comments.
Add std::function example.
Enable advanced scheduling directives.
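
[Editor's note] The "std::function example" item above presumably refers to wrapping a Callable via Halide's Callable::make_std_function. A minimal sketch under that assumption follows; the pipeline, names, and sizes are illustrative, not taken from the tutorial.

```cpp
#include "Halide.h"
#include <functional>

using namespace Halide;

int main() {
    // A pipeline with a real input, so the compiled function has an argument list.
    ImageParam input(Float(32), 2, "input");
    Var x("x"), y("y");
    Func brighter("brighter");
    brighter(x, y) = input(x, y) * 2.0f;

    // Compile once to a Callable, then wrap it in a typed std::function.
    Callable callable = brighter.compile_to_callable({input});
    auto fn = callable.make_std_function<Buffer<float>, Buffer<float>>();
    // fn behaves like a std::function taking (input, output) buffers and
    // returning an error code, so callers need no Halide-specific call syntax.

    Buffer<float> in(256, 256), out(256, 256);
    in.fill(1.0f);
    fn(in, out);
    return 0;
}
```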
derek-gerstmann (Contributor, Author)

Ah, yes ... great suggestions! I'll swap out the contrived examples and add the above use cases you listed.

Added cases that match real usage patterns:

1. Defining and compiling the whole pipeline every time you want to run it (i.e. in the benchmarking loop)
2. Defining the pipeline outside the benchmarking loop, and realizing it repeatedly.
3. (optional) Same as 2), but calling compile_jit() outside the loop, saying what it does, and saying why the time isn't actually different to case 2 (benchmark() runs multiple times and takes a min, and realize only compiles on the first run)
4. Compiling to a callable outside the benchmarking loop and showing that it has lower overhead than case 3 (if indeed it does. If not we may need to change the example so that it does, e.g. by adding a real input buffer.)
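
[Editor's note] A matching hedged sketch of cases 3 and 4 restated above, under the same assumptions as the earlier sketch (Halide::Tools::benchmark helper, trivial pipeline, illustrative names):

```cpp
#include "Halide.h"
#include "halide_benchmark.h"
#include <cstdio>

using namespace Halide;
using Halide::Tools::benchmark;

int main() {
    ImageParam input(Float(32), 2, "input");
    Var x("x"), y("y");
    Func f("f");
    f(x, y) = input(x, y) * 2.0f;

    Buffer<float> in(1024, 1024), out(1024, 1024);
    in.fill(1.0f);
    input.set(in);

    // Case 3: compile_jit() outside the loop. The time usually matches case 2
    // because benchmark() takes a min over several runs, and realize() would
    // have compiled (and cached) the code on its first run anyway.
    f.compile_jit();
    double t3 = benchmark(3, 10, [&]() {
        f.realize(out);
    });

    // Case 4: compile to a Callable outside the loop and invoke it directly,
    // which the thread suggests may have lower per-call overhead than realize().
    Callable c = f.compile_to_callable({input});
    double t4 = benchmark(3, 10, [&]() {
        c(in, out);
    });

    printf("case 3 (compile_jit + realize): %.3f ms\n", t3 * 1e3);
    printf("case 4 (Callable):              %.3f ms\n", t4 * 1e3);
    return 0;
}
```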
derek-gerstmann (Contributor, Author)

Updated with new cases as suggested.

steven-johnson (Contributor) left a comment

More minor nits but otherwise this is great, LGTM

Excerpt under review in tutorial/lesson_22_jit_performance.cpp:

// calling convention.
auto arguments = pipeline.infer_arguments();

// The Callable object acts as a convienient way of invoking the compiled code like
Typo: "convienient" → "convenient".

(side note: it's 2023, why don't we have smart spellcheckers that are useful for typos in code comments?)
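
[Editor's note] For context on the quoted excerpt: infer_arguments() produces the pipeline's parameter list so a Callable can be built without naming the params by hand. A hedged sketch follows, assuming Pipeline::compile_to_callable and an illustrative pipeline; it is not taken from the tutorial file.

```cpp
#include "Halide.h"
#include <vector>

using namespace Halide;

int main() {
    ImageParam input(Float(32), 2, "input");
    Var x("x"), y("y");
    Func f("f");
    f(x, y) = input(x, y) + 1.0f;

    Pipeline pipeline(f);
    // Ask Halide which parameters the pipeline needs, in calling-convention
    // order, instead of listing them by hand.
    std::vector<Argument> arguments = pipeline.infer_arguments();
    Callable c = pipeline.compile_to_callable(arguments);

    Buffer<float> in(64, 64), out(64, 64);
    in.fill(0.0f);
    c(in, out);
    return 0;
}
```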

steven-johnson merged commit d7760f5 into main on Sep 15, 2023
19 checks passed
steven-johnson deleted the dg/jit_perf_tutorial branch on September 15, 2023 at 01:05
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024
3 participants