A couple of models only run on the default precisions, and we would like to enable bf16 precision on them.
These models are failing because their inputs either can't be directly cast to bf16, or aren't being completely cast to bf16.
Per @drisspg:
Using this script:
```python
from pathlib import Path
import json
import logging

import torch
from tqdm import tqdm
from transformer_nuggets.utils.shape_trace import ShapeLog

logging.basicConfig(level=logging.INFO)


def main():
    models = []
    success_count = 0
    failure_count = 0
    model_failures = {}
    for file in Path("torchbenchmark/models/").iterdir():
        if file.is_dir():
            models.append(file.name)
    for model_name in tqdm(models, desc="Logging models", unit="model"):
        try:
            module = __import__(f"torchbenchmark.models.{model_name}", fromlist=[model_name])
            model, example_inputs = module.Model(
                test="train", device="cuda", extra_args=["--precision=bf16"]
            ).get_module()
            model(*example_inputs)
            success_count += 1
        except Exception as e:
            tqdm.write(f"Failed to log {model_name}: {e}")
            failure_count += 1
            model_failures[model_name] = str(e)
    tqdm.write(f"Successfully logged {success_count} models")
    tqdm.write(f"Failed to log {failure_count} models")
    with open("model_failures_bf16.txt", "w") as f:
        json.dump(model_failures, f)


if __name__ == "__main__":
    main()
```
returns the following model failures:
internalfb.com/intern/paste/P1197341526
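For models whose inputs can't be blanket-cast, one common workaround (a sketch, not the torchbench implementation; `cast_to_bf16` is a hypothetical helper name) is to walk the input structure and cast only floating-point tensors, leaving integer token ids and boolean masks untouched:

```python
import torch


def cast_to_bf16(obj):
    """Recursively cast floating-point tensors to bfloat16.

    Integer/bool tensors (e.g. token ids, attention masks) are left
    as-is, since casting them would break embedding lookups and masking.
    """
    if isinstance(obj, torch.Tensor):
        return obj.to(torch.bfloat16) if obj.is_floating_point() else obj
    if isinstance(obj, (list, tuple)):
        return type(obj)(cast_to_bf16(x) for x in obj)
    if isinstance(obj, dict):
        return {k: cast_to_bf16(v) for k, v in obj.items()}
    return obj


# Mixed example inputs: a float tensor and an integer id tensor
example_inputs = (torch.randn(2, 3), torch.tensor([1, 2]))
casted = cast_to_bf16(example_inputs)
```

This preserves the container structure (tuples stay tuples, dicts stay dicts), so the casted inputs can be splatted into the model call unchanged.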
We also want to validate amp_bf16 on CPU.
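For CPU amp_bf16, the standard mechanism is `torch.autocast` with `dtype=torch.bfloat16`: eligible ops run in bf16 while the parameters stay in fp32. A minimal sketch (the model and shapes here are illustrative, not from torchbench):

```python
import torch

model = torch.nn.Linear(8, 4)  # weights remain float32
x = torch.randn(2, 8)

# Autocast region: matmul-like ops are executed in bfloat16 on CPU
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(x)
```

Under autocast the output of the linear layer is bf16 even though the weights and inputs are fp32, which is why amp_bf16 avoids the input-casting failures described above.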
We are migrating to the pt2 benchmark runner, so we do not plan to support bf16 in the legacy runner.