Microsoft Windows [Version 10.0.22631.3155]
(c) Microsoft Corporation. All rights reserved.

C:\Users\Training\Desktop\TestSDPA>py TestSDPA.py
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:08<00:00, 4.00s/it]
PhiForCausalLM(
  (model): PhiModel(
    (embed_tokens): Embedding(51200, 2560)
    (embed_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-31): 32 x PhiDecoderLayer(
        (self_attn): PhiSdpaAttention(
          (q_proj): Linear(in_features=2560, out_features=2560, bias=True)
          (k_proj): Linear(in_features=2560, out_features=2560, bias=True)
          (v_proj): Linear(in_features=2560, out_features=2560, bias=True)
          (dense): Linear(in_features=2560, out_features=2560, bias=True)
          (rotary_emb): PhiRotaryEmbedding()
        )
        (mlp): PhiMLP(
          (activation_fn): NewGELUActivation()
          (fc1): Linear(in_features=2560, out_features=10240, bias=True)
          (fc2): Linear(in_features=10240, out_features=2560, bias=True)
        )
        (input_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (resid_dropout): Dropout(p=0.1, inplace=False)
      )
    )
    (final_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=2560, out_features=51200, bias=True)
)

C:\Users\Training\Desktop\TestSDPA>