Models and Model Blocks to Test the PyTorch Fusion Frontends

Basic Usage

python [model file] [engine: --jit_script|--ltc|--aot_autograd] 

If you pass no engine option after the model file, only eager mode is timed.

Profiling Usage

There are two profiling scripts:

  • profile_all.sh : profiles everything that runs
  • profile_api_start.sh : profiles only after the CUDA profiler start API is called, so profiling begins after warmup

./scripts/profile_api_start.sh python [model file] [engine: --jit_script|--ltc|--aot_autograd] [--profile_with_nvtx]
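
For example, a concrete invocation of the template above that profiles simple_model.py under the JIT Script engine with NVTX ranges enabled:

./scripts/profile_api_start.sh python simple_model.py --jit_script --profile_with_nvtx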

Example Output

$ python simple_model.py --jit_script
>>> Eager-Time(us): 411.493 JIT_Script-Time(us): 368.355 JIT_Script-Speedup: 1.12

Mixed Precision Usage

By default, both the model parameters and the input data are FP32.

AMP Usage

Model parameters remain in FP32 while the input data is in FP16.

python [model file] [engine: --jit_script|--ltc|--aot_autograd] --amp
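
For reference, the --amp path presumably corresponds to the standard PyTorch autocast-plus-GradScaler pattern. This is a minimal sketch under that assumption, not the harness's actual code; the model, optimizer, and sizes are placeholders:

import torch

model = torch.nn.Linear(1024, 1024).cuda()                # placeholder FP32 model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # placeholder optimizer
scaler = torch.cuda.amp.GradScaler()

inputs = torch.randn(32, 1024, device="cuda")
optimizer.zero_grad()
with torch.cuda.amp.autocast():        # eligible ops run in FP16; weights stay FP32
    loss = model(inputs).float().sum()
scaler.scale(loss).backward()          # scale the loss to avoid FP16 gradient underflow
scaler.step(optimizer)                 # unscales the FP32 grads, then steps
scaler.update()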

Mixed Precision with Model in FP16 and GradScaler (Advanced Performance Usage)

Both the model parameters and the input data are in FP16.

python [model file] [engine: --jit_script|--ltc|--aot_autograd] --max_fp16_perf

or

python [model file] [engine: --jit_script|--ltc|--aot_autograd] --grad_scaler --input_dtype=torch.float16 --model_dtype=torch.float16

This set of options does not work with optimizers that rely on GradScaler to do the unscaling (native PyTorch optimizers), because GradScaler asserts on FP16 weights. For test purposes, simply omit the --grad_scaler flag.
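
Roughly, that test-purposes path (FP16 model and inputs, no GradScaler) looks like the sketch below; the model, optimizer, and sizes are placeholders, not the harness's actual code:

import torch

model = torch.nn.Linear(1024, 1024).half().cuda()         # FP16 parameters
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # placeholder optimizer
inputs = torch.randn(32, 1024, device="cuda").half()      # FP16 input data

optimizer.zero_grad()
loss = model(inputs).float().sum()   # upcast the loss for a stable reduction
loss.backward()                      # gradients are FP16, matching the weights
optimizer.step()                     # no GradScaler, so FP16 weights step directly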

Options

Engines

You have a choice of four different frontend engines. The Eager Engine is always run as the baseline for measuring speedup; if you don't specify an engine, only the Eager Engine is run. The other three engines expose fusion opportunities to the NVFuser backend for GPUs (see the sketch after the list below).

Engines:

  • Eager: default; no switch needed.
  • JIT Script: --jit_script
  • Lazy Tensor Core: --ltc
  • AOT Autograd: --aot_autograd
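
As a rough illustration, selecting --jit_script presumably amounts to running the model through torch.jit.script; this is a minimal sketch under that assumption, not the harness's actual wrapping code:

import torch

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU()).cuda()
scripted = torch.jit.script(model)   # TorchScript graph that NVFuser can fuse
out = scripted(torch.randn(32, 1024, device="cuda"))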

Single GPU Models

Simple

  • Simple linear layer + ReLU and SGD optimizer: simple_model.py
  • Simple conv layer + batch norm + ReLU and SGD optimizer: simple_conv_model.py

Transformer Model Components

  • Multihead Attention Block with no optimizer: xformer_multihead_attn.py (see the sketch after this list)
  • Feed Forward Block with no optimizer: xformer_feed_fwd.py
  • One Encoder Layer with no optimizer: xformer_1_layer.py
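
As a hypothetical sketch of a multihead attention block of bert-large proportions (embed dim 1024 and 16 heads follow bert-large sizing; torch.nn.MultiheadAttention stands in for the file's actual block):

import torch

attn = torch.nn.MultiheadAttention(embed_dim=1024, num_heads=16).cuda()
x = torch.randn(128, 8, 1024, device="cuda")   # (seq_len, batch, embed_dim)
out, _ = attn(x, x, x)                         # self-attention forward, no optimizer step
out.sum().backward()                           # exercise the backward pass as well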

Bert Models

Fixed Batch Size Models

  • Full Bert Model (bert-large) with APEX Lamb Optimizer: bert_model.py
  • Full Bert Model (bert-large) with Native AdamW Optimizer: bert_model_adam_opt.py
  • Bert Model with 1 Layer (bert-large sized) with no optimizer: bert_model_1_layer_no_opt.py

Dynamic Batch Size Models (sequence length per batch varies; see the sketch after the list)

  • Full Bert Model (bert-large) with APEX Lamb Optimizer: dynamic_bert_model.py
  • Full Bert Model (bert-large) with Native AdamW Optimizer: dynamic_bert_model_adam_opt.py
  • Bert Model with 1 Layer (bert-large sized) with no optimizer: dynamic_bert_model_1_layer_no_opt.py
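
To illustrate what varying the sequence length per batch means for a fuser, here is a hedged sketch (the sequence lengths are arbitrary; 30522 and 1024 are the standard BERT vocab and bert-large hidden sizes). Every iteration hands the backend a new input shape, which is what these models stress:

import torch

embed = torch.nn.Embedding(30522, 1024).cuda()   # BERT vocab, bert-large hidden size
for seq_len in (64, 128, 384, 512):              # sequence length changes every batch
    ids = torch.randint(0, 30522, (8, seq_len), device="cuda")
    out = embed(ids)                             # shape (8, seq_len, 1024) varies per step
    out.sum().backward()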
