[GPU][NFC] Improve error messages. #18948

sergachev · 2024-11-01T12:04:08Z

No description provided.

bchetioui · 2024-11-05T09:05:53Z

xla/service/gpu/autotuning/gemm_fusion_autotuner.cc

@@ -1039,7 +1039,7 @@ GemmFusionAutotunerImpl::CompileAll(AutotunerCompileUtil& compile_util,
          absl::StatusOr<bool> has_executable =
              compile(fusion, config, gemm_config_set.size() > 1);
          TF_CHECK_OK(has_executable.status())
-              << "Failure occured when compiling fusion " << fusion->name()
+              << " - Failure occured when compiling fusion " << fusion->name()


Curious why this is better?

sergachev · 2024-11-05

Currently it prints without a space, something like "Unknown instructionFailure occured when ...".
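
To illustrate the reply above, here is a minimal sketch of why the separator matters. It assumes the check macro writes the failed status text into a stream and then appends whatever the caller streams after it. `CheckMessage`, `MessageBefore`, and `MessageAfter` are hypothetical names made up for this example; this is not the actual `TF_CHECK_OK` implementation. The message spelling is kept as in the diff.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for a check macro's message builder (an assumption
// for illustration): the status text goes in first, and anything streamed
// afterwards is appended directly, with no separator added.
class CheckMessage {
 public:
  explicit CheckMessage(const std::string& status_text) {
    stream_ << status_text;
  }

  template <typename T>
  CheckMessage& operator<<(const T& value) {
    stream_ << value;
    return *this;
  }

  std::string str() const { return stream_.str(); }

 private:
  std::ostringstream stream_;
};

// Message as streamed before the change: nothing separates the status text
// from the extra context.
std::string MessageBefore(const std::string& status_text,
                          const std::string& fusion_name) {
  CheckMessage message(status_text);
  message << "Failure occured when compiling fusion " << fusion_name;
  return message.str();
}

// Message as streamed after the change: the leading " - " separates the
// status text from the extra context.
std::string MessageAfter(const std::string& status_text,
                         const std::string& fusion_name) {
  CheckMessage message(status_text);
  message << " - Failure occured when compiling fusion " << fusion_name;
  return message.str();
}
```

With a status text of `Unknown instruction`, `MessageBefore` runs the two parts together as `Unknown instructionFailure occured ...`, while `MessageAfter` yields `Unknown instruction - Failure occured ...`.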

Imported from GitHub PR #18948
Copybara import of the project:
-- 80e717c by Ilia Sergachev <isergachev@nvidia.com>:
[GPU][NFC] Improve error messages.
Merging this change closes #18948
FUTURE_COPYBARA_INTEGRATE_REVIEW=#18948 from openxla:improve_error_messages 80e717c
PiperOrigin-RevId: 693279083

Imported from GitHub PR openxla/xla#18948
Copybara import of the project:
-- 80e717c39e8a120cca974dca9f473d817d3a3457 by Ilia Sergachev <isergachev@nvidia.com>:
[GPU][NFC] Improve error messages.
Merging this change closes #18948
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#18948 from openxla:improve_error_messages 80e717c39e8a120cca974dca9f473d817d3a3457
PiperOrigin-RevId: 693279083

…eadable. This CL moves the code that measures the performance of a candidate into a separate function. This makes the code more readable and easier to follow. FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#18948 from openxla:improve_error_messages 80e717c39e8a120cca974dca9f473d817d3a3457 PiperOrigin-RevId: 692981497

…s much as possible. This is particularly useful in FSDP/HSDP, where gradient propagation can be done fully in the (i+1)th iteration. It is the user's responsibility to set `xla_gpu_all_reduce_combine_threshold_bytes`. FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#18948 from openxla:improve_error_messages 80e717c39e8a120cca974dca9f473d817d3a3457 PiperOrigin-RevId: 689310865

Imported from GitHub PR openxla/xla#18948
Copybara import of the project:
-- 80e717c39e8a120cca974dca9f473d817d3a3457 by Ilia Sergachev <isergachev@nvidia.com>:
[GPU][NFC] Improve error messages.
Merging this change closes #18948
PiperOrigin-RevId: 693291127

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#18948 from openxla:improve_error_messages 80e717c39e8a120cca974dca9f473d817d3a3457 PiperOrigin-RevId: 693242323

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#18948 from openxla:improve_error_messages 80e717c39e8a120cca974dca9f473d817d3a3457 PiperOrigin-RevId: 693295239

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#18948 from openxla:improve_error_messages 80e717c39e8a120cca974dca9f473d817d3a3457 PiperOrigin-RevId: 692583711

sergachev requested a review from loislo November 1, 2024 12:04

[GPU][NFC] Improve error messages.

80e717c

sergachev requested a review from bchetioui November 4, 2024 10:28

bchetioui reviewed Nov 5, 2024

bchetioui approved these changes Nov 5, 2024

copybara-service bot mentioned this pull request Nov 5, 2024

PR #18948: [GPU][NFC] Improve error messages. #19053

Merged

copybara-service bot mentioned this pull request Nov 5, 2024

PR #18948: [GPU][NFC] Improve error messages. tensorflow/tensorflow#79423

Merged

copybara-service bot closed this in cab79f8 Nov 5, 2024

copybara-service bot mentioned this pull request Nov 5, 2024

[XLA:GPU] Refactor GemmFusionAutotunerImpl::Profile to make it more readable. tensorflow/tensorflow#79346

Merged

copybara-service bot mentioned this pull request Nov 5, 2024

[XLA:GPU] Extend AllReduceCombiner to combine pipelined collectives as much as possible. tensorflow/tensorflow#79204

Merged

copybara-service bot mentioned this pull request Nov 5, 2024

Automated Code Change tensorflow/tensorflow#79406

Merged

copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Nov 5, 2024

Reverts 9b0c336

9791455

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#18948 from openxla:improve_error_messages 80e717c39e8a120cca974dca9f473d817d3a3457 PiperOrigin-RevId: 693295239

copybara-service bot mentioned this pull request Nov 5, 2024

Reverts 9b0c336f7bd6fca7c3368879dbeb4524b68d65f6 tensorflow/tensorflow#79427

Merged

copybara-service bot mentioned this pull request Nov 5, 2024

Automated Code Change tensorflow/tensorflow#79419

Merged

sergachev deleted the improve_error_messages branch November 6, 2024 00:04
