Previously, we passed `compile_fx` the entire default Inductor config with the patches applied to it.
`compile_fx` only needs to be passed the patches (reference:
https://github.com/pytorch/pytorch/blob/29317f8585ecb232412df3f39734490f0f6d8230/torch/_inductor/compile_fx.py#L1873-L1880).
This PR changes vLLM to pass only the patches. This makes debugging
easier (I can stare at just the delta and see what vLLM changed).
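As a minimal sketch of the idea (hypothetical names, not the actual vLLM/Inductor API): instead of materializing the full default config with overrides applied, compute and pass only the delta, so the changed keys are immediately visible.

```python
# Hypothetical illustration: pass only the config delta, not the full config.
DEFAULT_CONFIG = {"level": 0, "compile_sizes": set(), "debug": False}

def config_patches(full_config, defaults=DEFAULT_CONFIG):
    """Return only the keys whose values differ from the defaults."""
    return {k: v for k, v in full_config.items() if defaults.get(k) != v}

# Before: the full config (defaults + overrides) would be passed along.
full = dict(DEFAULT_CONFIG, level=3, compile_sizes={1, 2})

# After: only the delta is passed, which is much easier to inspect.
patches = config_patches(full)
print(patches)  # only the keys vLLM actually changed
```

The delta here contains just `level` and `compile_sizes`, since `debug` still matches its default.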
Test Plan:
I ran the following command and verified that performance didn't change.
```
VLLM_USE_V1=1 python benchmark_latency.py --model meta-llama/Meta-Llama-3-8B --batch-size 1 -O '{"level": 3, "compile_sizes": {1, 2}}'
```
Signed-off-by: rzou <zou3519@gmail.com>