Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: mark a Func as no_profiling, to prevent injection of profiling. (2nd implementation) #8143

Merged
merged 5 commits into from
Mar 12, 2024

Conversation

mcourteaux
Copy link
Contributor

@mcourteaux mcourteaux commented Mar 7, 2024

Replaces #8136 with a cleaner implementation.

@abadams I took your suggestion, and this implementation is much better. Very small changes, and the issue I reported in the previous one is gone. The trick with the "environment" was a good lesson!

Additionally, I managed to also clean up the report by having the stack-allocated Allocate/Free nodes also have their allocation contribute to the enclosing Func. Doing that changed my profiling report from (w_0, w_1, w_2, max_channel, lum, lum_pow3, r, g, b all marked with no_profiling(), but still producing lines in the report):

neonraw_highlight_ratio_extractor
 total time: 1338.695068 ms  samples: 1262  runs: 48  time/run: 27.889481 ms
 average threads used: 2.657686
 heap allocations: 0  peak heap usage: 0 bytes
    halide_malloc:                0.000ms ( 0.0%)   threads: 0.000 
    halide_free:                  0.000ms ( 0.0%)   threads: 0.000 
    ratio_map:                   21.282ms (76.3%)   threads: 2.663 
    repeat_edge:                  2.764ms ( 9.9%)   threads: 3.248  stack: 3960
    max_field_hpass:              0.683ms ( 2.4%)   threads: 2.967  stack: 3840
    max_field:                    0.352ms ( 1.2%)   threads: 2.125  stack: 3072
    min_field_hpass:              0.507ms ( 1.8%)   threads: 2.826  stack: 1280
    max_channel:                  0.000ms ( 0.0%)   threads: 0.000  stack: 132
    min_field:                    0.088ms ( 0.3%)   threads: 1.250  stack: 1024
    b:                            0.000ms ( 0.0%)   threads: 0.000  stack: 64
    g:                            0.000ms ( 0.0%)   threads: 0.000  stack: 64
    r:                            0.000ms ( 0.0%)   threads: 0.000  stack: 64
    lum:                          0.000ms ( 0.0%)   threads: 0.000  stack: 64
    lum_pow3:                     0.000ms ( 0.0%)   threads: 0.000  stack: 64
    lum4:                         2.210ms ( 7.9%)   threads: 1.870  stack: 64
    w_0:                          0.000ms ( 0.0%)   threads: 0.000  stack: 64
    w_1:                          0.000ms ( 0.0%)   threads: 0.000  stack: 64
    w_2:                          0.000ms ( 0.0%)   threads: 0.000  stack: 64

to this (note how the stack size contributions (132 + 8 * 64 = 644) moved up into the enclosing Funcs:

  • min_field_hpass: 1412 - 1280 = 132 bytes extra,
  • ratio_map 384 - 0 = 384 bytes extra

The total difference is less than 644 bytes, because the lifetimes of some of the variables were actually fully-disjunct, causing them to not actually have a peak stack usage of that much.):

neonraw_highlight_ratio_extractor
 total time: 1524.809326 ms  samples: 1416  runs: 48  time/run: 31.766861 ms
 average threads used: 4.375706
 heap allocations: 0  peak heap usage: 0 bytes
    halide_malloc:                0.000ms ( 0.0%)   threads: 0.000 
    halide_free:                  0.000ms ( 0.0%)   threads: 0.000 
    ratio_map:                   24.556ms (77.3%)   threads: 4.067  stack: 384
    repeat_edge:                  3.719ms (11.7%)   threads: 5.716  stack: 3960
    max_field_hpass:              0.420ms ( 1.3%)   threads: 3.000  stack: 3840
    max_field:                    0.336ms ( 1.0%)   threads: 5.666  stack: 3072
    min_field_hpass:              0.575ms ( 1.8%)   threads: 3.846  stack: 1412
    min_field:                    0.088ms ( 0.2%)   threads: 8.500  stack: 1024
    lum4:                         2.069ms ( 6.5%)   threads: 5.813  stack: 64

@mcourteaux mcourteaux changed the title Small feature to allow you to specify that a (typically small inner loop) Func should not be profiled. Feature: mark a Func as no_profiling, to prevent injection of profiling. Mar 7, 2024
@mcourteaux mcourteaux changed the title Feature: mark a Func as no_profiling, to prevent injection of profiling. Feature: mark a Func as no_profiling, to prevent injection of profiling. (2nd implementation) Mar 7, 2024
}
}

const Function *lookup_function(const string &name) const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function is already a pointer type, so this should probably just return a Function by value. Alternatively you could return a const Function &, but I'm not sure how to satisfy the compiler in the case where you need to return something valid after the internal_error. Maybe just return env.begin()->second?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I first had a reference, but I in case the lookup fails, I need to satisfy the compiler and give it a nullptr. Idk if we can easily create a nullptr-Function& instead of a Function* being nullptr?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest satisfying the compiler by returning the first Function in the environment: env.begin()->second.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a default constructor I see. So for this unreachable code, a simple return {}; looks a bit nicer I think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that returning a reference to a local? Given that it's unreachable, whatever makes the compiler shut up is fine.

Copy link
Contributor Author

@mcourteaux mcourteaux Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that returning a reference to a local?

AFAICT, no. It's invoking the default constructor (the one literally defined with = default;), which just initialized the FunctionPtr inside to be default-constructed, meaning that there are just some pointers put to nullptr I assume.

@@ -713,6 +713,7 @@ table Func {
trace_stores: bool = false;
trace_realizations: bool = false;
trace_tags: [string];
no_profiling: bool = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably need to bump the serialization minor version number

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually clueless where to find that number.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused too. @TH3CHARLie ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it gets refactored earlier to here:

enum SerializationVersionMajor: int {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TH3CHARLie looking at those inline comments, suggest that I shouldn't touch anything, as we are still at "unstable"? Do I bump the patch version instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I believe so

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abadams @TH3CHARLie It all looks suspicious, as the Patch version was at 0 (zero). Serialization version 18.0.0 seems suspicious, but maybe is what we are after? I just bumped it to 18.0.1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was at zero since we recently had a release and introduced this version numbering system and hasn't added anything new to the format, I don't understand what's suspicious here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! Thanks!

@mcourteaux
Copy link
Contributor Author

AFAICT the failing build is unrelated, right? This might conflict with abadams/per_instance_profiling, in which case I'm happy to adapt this PR after we merge in Andrew's work.

@abadams
Copy link
Member

abadams commented Mar 12, 2024

Yes, the failure is unrelated (fixed by #8148)

@abadams abadams merged commit 4988ab5 into halide:main Mar 12, 2024
18 of 19 checks passed
@abadams abadams added the release_notes For changes that may warrant a note in README for official releases. label Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release_notes For changes that may warrant a note in README for official releases.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants