A couple of models only run on the default precisions, and we would like to enable bf16 precision on them.
These models are failing because their inputs either can't be directly cast to bf16, or aren't being completely cast to bf16.
Per @drisspg:
Using this script:
```python
from pathlib import Path
import json
import logging

import torch
from tqdm import tqdm
from transformer_nuggets.utils.shape_trace import ShapeLog

logging.basicConfig(level=logging.INFO)


def main():
    models = []
    success_count = 0
    failure_count = 0
    model_failures = {}
    for file in Path("torchbenchmark/models/").iterdir():
        if file.is_dir():
            models.append(file.name)
    for model_name in tqdm(models, desc="Logging models", unit="model"):
        try:
            module = __import__(f"torchbenchmark.models.{model_name}", fromlist=[model_name])
            model, example_inputs = module.Model(
                test="train", device="cuda", extra_args=["--precision=bf16"]
            ).get_module()
            model(*example_inputs)
            success_count += 1
        except Exception as e:
            tqdm.write(f"Failed to log {model_name}: {e}")
            failure_count += 1
            model_failures[model_name] = str(e)
    tqdm.write(f"Successfully logged {success_count} models")
    tqdm.write(f"Failed to log {failure_count} models")
    with open("model_failures_bf16.txt", "w") as f:
        json.dump(model_failures, f)


if __name__ == "__main__":
    main()
```
returns the following model failures:
internalfb.com/intern/paste/P1197341526
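For models whose inputs can't be blanket-cast, one common workaround (a sketch, not the torchbench implementation; `cast_to_bf16` is a hypothetical helper name) is to walk the input structure and cast only floating-point tensors, leaving integer token ids and boolean masks untouched:

```python
import torch


def cast_to_bf16(obj):
    """Recursively cast floating-point tensors to bfloat16.

    Integer/bool tensors (e.g. token ids, attention masks) are left
    as-is, since casting them would break embedding lookups and masking.
    """
    if isinstance(obj, torch.Tensor):
        return obj.to(torch.bfloat16) if obj.is_floating_point() else obj
    if isinstance(obj, (list, tuple)):
        return type(obj)(cast_to_bf16(x) for x in obj)
    if isinstance(obj, dict):
        return {k: cast_to_bf16(v) for k, v in obj.items()}
    return obj


# Mixed example inputs: a float tensor and an integer id tensor
example_inputs = (torch.randn(2, 3), torch.tensor([1, 2]))
casted = cast_to_bf16(example_inputs)
```

This preserves the container structure (tuples stay tuples, dicts stay dicts), so the casted inputs can be splatted into the model call unchanged.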
We also want to validate amp_bf16 on CPU.
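For CPU amp_bf16, the standard mechanism is `torch.autocast` with `dtype=torch.bfloat16`: eligible ops run in bf16 while the parameters stay in fp32. A minimal sketch (the model and shapes here are illustrative, not from torchbench):

```python
import torch

model = torch.nn.Linear(8, 4)  # weights remain float32
x = torch.randn(2, 8)

# Autocast region: matmul-like ops are executed in bfloat16 on CPU
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(x)
```

Under autocast the output of the linear layer is bf16 even though the weights and inputs are fp32, which is why amp_bf16 avoids the input-casting failures described above.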
We are migrating to the pt2 benchmark runner, so we do not plan to support bf16 in the legacy runner.