Conversation

@bringlein
Contributor

For our application, we wanted to investigate when, how often, and for how long the Triton autotuner is triggered (to assess its impact on the application's latency).
Therefore, we added a simple report flag to the autotuner decorator:

@triton.autotune(
    configs=[...],  
    key=[...],
    report=True,  # new, optional
)
@triton.jit
def fused_add_rmsnorm_triton(...

If this flag is set to True, the autotuner prints the following statement every time a new autotune run is triggered:

Autotuner for function JITFunction(__main__:fused_add_rmsnorm_triton) finished after 4.15s; best config selected: BLOCK_N_SIZE: 512, num_warps: 8, num_ctas: 1, num_stages: 3;

There are no prints if a cached configuration is used.
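
As a rough illustration of that behavior, here is a self-contained sketch that does not use Triton at all. `ToyAutotuner`, its `bench` callable, and the cost model are hypothetical names made up for this example; they are not the code in this PR. The sketch only shows the idea of printing after a real tuning run and staying silent on a cache hit:

```python
import time


class ToyAutotuner:
    """Hypothetical stand-in for an autotuner; not Triton's Autotuner class."""

    def __init__(self, configs, bench, report=False):
        self.configs = configs  # candidate configurations to try
        self.bench = bench      # callable: (config, key) -> measured cost
        self.report = report    # print one line after each real tuning run
        self.cache = {}         # key -> best config found so far

    def select(self, key):
        used_cached_result = key in self.cache
        if not used_cached_result:
            start = time.time()
            # Benchmark every candidate and keep the cheapest one.
            self.cache[key] = min(self.configs, key=lambda cfg: self.bench(cfg, key))
            elapsed = time.time() - start
            if self.report:
                print(f"Autotuner finished after {elapsed:.2f}s; "
                      f"best config selected: {self.cache[key]};")
        return self.cache[key]


tuner = ToyAutotuner(
    configs=[{"BLOCK_N_SIZE": 256}, {"BLOCK_N_SIZE": 512}],
    bench=lambda cfg, key: abs(cfg["BLOCK_N_SIZE"] - key),  # made-up cost model
    report=True,
)
first = tuner.select(512)   # cache miss: tunes and prints one report line
second = tuner.select(512)  # cache hit: returns the cached config silently
```

The second call reuses the cached entry, so only one report line appears for the key.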

(We thought this could also be a helpful feature for others, so we created this PR directly; we can discuss and make any requested changes here. However, if you think this should be discussed in an issue instead, please let us know, and sorry for skipping that step.)

@bringlein bringlein requested a review from ptillet as a code owner March 19, 2024 13:42
@Jokeren
Contributor

Jokeren commented Mar 19, 2024

I'm OK with the change. Should it be a log or just a print? Triton doesn't have a logging system, though.

cc @jlebar @ThomasRaoux

Contributor

@jlebar jlebar left a comment

I think this is a good idea!

:type warmup: int
:param rep: Repetition time (in ms) to pass to benchmarking, defaults to 100.
:type rep: int
:param report: Flag to enable printing the selected configuration
Contributor

Rename the flag to print? "report" can be a noun or a verb, and that ambiguity is confusing here. (Do I pass the report in as a parameter? What is the report? Or is the autotuning report passed as an outparameter somehow?)

Contributor Author

OK, I changed it to print_autotune_stats.

:type warmup: int
:param rep: Repetition time (in ms) to pass to benchmarking, defaults to 100.
:type rep: int
:param report: Flag to enable printing the selected configuration
Contributor

Nit: Reword to something like what you wrote in the commit message, which is more helpful. For example: "If print is true, Triton will print a log message each time it autotunes a function."

Contributor Author

Fair point, I updated it.

@ThomasRaoux
Collaborator

I wonder why this can't be done by using a profiler?

@Jokeren
Contributor

Jokeren commented Mar 19, 2024

> I wonder why this can't be done by using a profiler?

Indeed, this is a case where I have used the profiler a lot, since Proton can rename kernels based on constants. The problem is probably that Proton is not available yet.

@jlebar
Contributor

jlebar commented Mar 19, 2024

Personally, I think something lightweight like this is nice to have even if we have something heavier-weight like Proton. It adds basically zero complexity and can be really useful for quick-and-dirty debugging.

@Jokeren
Contributor

Jokeren commented Mar 19, 2024

> Personally, I think something lightweight like this is nice to have even if we have something heavier-weight like Proton. It adds basically zero complexity and can be really useful for quick-and-dirty debugging.

I think it also depends on whether you want to check tuning time on CPU + GPU or just GPU time. This PR seems to measure end-to-end tuning time.

if self.report and not used_cached_result:
    autotune_stop = time.time()
    print(
        f"Autotuner for function {self.fn} finished after {autotune_stop-autotune_start:.2f}s; best config selected: {self.best_config};"
Contributor

Can we use self.bench_time?

Contributor Author

Good point, the hook won't make much of a difference. I changed it.

@bringlein bringlein force-pushed the ngl_pr_autotuner_report branch 2 times, most recently from 4731aad to 7d8b4b6 Compare March 20, 2024 09:20
@bringlein
Contributor Author

Thanks for your helpful comments!

> I think it also depends on whether you want to check tuning time on CPU + GPU or just GPU time. This PR seems to measure end-to-end tuning time.

Exactly, we were (and are) interested in the end-to-end time of the autotuner, so that we could easily tell whether a variation in our application's latency was caused by the Triton autotuner or by something else.

@bringlein
Contributor Author

> Should it be a log or just a print? Triton doesn't have a logging system, though.

I was also wondering whether, besides True/False, a stream object could be passed to this argument to control where the log message is printed. But I guess this would assume more about an existing or upcoming logging infrastructure than a simple print does.
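
To make the stream idea concrete, here is a minimal sketch under stated assumptions. `emit_autotune_message` is a hypothetical helper invented for this example (nothing like it exists in the PR); it relies only on the standard `file=` parameter of Python's built-in `print`:

```python
import io
import sys


def emit_autotune_message(message, report):
    """Hypothetical helper: `report` may be False/None, True, or a writable stream."""
    if report is False or report is None:
        return  # reporting disabled
    # True means "use stdout"; anything else is treated as a file-like object.
    stream = sys.stdout if report is True else report
    print(message, file=stream)


buffer = io.StringIO()
emit_autotune_message("autotune finished", buffer)  # routed into the buffer
emit_autotune_message("never printed", False)       # suppressed
captured = buffer.getvalue()
```

Passing `sys.stderr` or an open log file instead of `buffer` would route the message there without any further changes.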

@Jokeren
Contributor

Jokeren commented Mar 20, 2024

> Thanks for your helpful comments!
>
> > I think it also depends on whether you want to check tuning time on CPU + GPU or just GPU time. This PR seems to measure end-to-end tuning time.
>
> Exactly, we were (and are) interested in the end-to-end time of the autotuner, so that we could easily tell whether a variation in our application's latency was caused by the Triton autotuner or by something else.

Hi @ThomasRaoux, since they are interested in end-to-end statistics, it might be fine to print some debugging information? What are your thoughts?

@ptillet
Collaborator

ptillet commented Mar 24, 2024

Yeah I think this can be helpful to some people and really doesn't add much complexity since best_config is already here

@ThomasRaoux
Collaborator

Having debug logs makes sense. Do we want this to be a front-end option or an env variable? In general, debug features are controlled by env variables.

@jlebar
Contributor

jlebar commented Mar 25, 2024

I'm fine with either an env var or the in-code flag. It sounds like we'd have consensus if we went with the env var? I propose TRITON_PRINT_AUTOTUNING as a strawperson.

@bringlein
Contributor Author

I agree, controlling the logging via an environment variable serves the purpose better.
I changed the implementation to check for TRITON_PRINT_AUTOTUNING=1.
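
For readers who want a picture of what such a check could look like, here is a small sketch. The helper names `autotune_printing_enabled` and `maybe_print` are hypothetical, and the exact code merged in this PR may differ; only the variable name TRITON_PRINT_AUTOTUNING and the value 1 come from the discussion above:

```python
import os


def autotune_printing_enabled(environ=None):
    """Return True only when TRITON_PRINT_AUTOTUNING is set to exactly "1"."""
    env = os.environ if environ is None else environ
    return env.get("TRITON_PRINT_AUTOTUNING", "0") == "1"


def maybe_print(message, environ=None):
    """Hypothetical helper: print the message only if the env var enables it."""
    if autotune_printing_enabled(environ):
        print(message)
        return True
    return False
```

With a check like this, the message would be enabled from the shell without touching the code, for example by running `TRITON_PRINT_AUTOTUNING=1 python my_app.py` (where `my_app.py` is a placeholder script name).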

@bringlein bringlein force-pushed the ngl_pr_autotuner_report branch from 2918810 to 927f09f Compare March 28, 2024 09:42
Contributor

@jlebar jlebar left a comment

Thanks!

@jlebar
Contributor

jlebar commented Apr 1, 2024

I can merge this once we make the final few remaining changes!

@jlebar jlebar force-pushed the ngl_pr_autotuner_report branch from 927f09f to 23308be Compare April 2, 2024 21:31
@jlebar jlebar enabled auto-merge (squash) April 2, 2024 21:33
@jlebar
Contributor

jlebar commented Apr 2, 2024

Rebased and am trying to merge this.

@jlebar jlebar merged commit feb13ca into triton-lang:main Apr 2, 2024
@bringlein
Contributor Author

Thanks @jlebar! I was on a trip and didn't have the time to fix or react to your comments.

5 participants