- Add support for specifying the device in the `torch.compile` path, avoiding the need for `CUDA_VISIBLE_DEVICES` or similar workarounds
- Add support for auto-detecting the device from the model's parameters
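
As a rough illustration of device auto-detection, the sketch below inspects the `.device` attribute of each parameter returned by a `torch.nn.Module`-style `.parameters()` iterator. The helper name `infer_device`, its `default` argument, and the choice to raise an error when parameters span several devices are assumptions made for this example; they are not necessarily how this change implements the feature.

```python
from collections import Counter


def infer_device(model, default="cpu"):
    """Hypothetical helper: guess a model's device from its parameters.

    Accepts any object exposing a torch.nn.Module-style .parameters()
    iterator whose items carry a .device attribute.
    """
    # Tally the device of every parameter, normalised to a string
    # such as "cpu" or "cuda:1".
    counts = Counter(str(p.device) for p in model.parameters())

    if not counts:
        # A parameter-free model gives no hint, so use the fallback.
        return default

    if len(counts) > 1:
        # Assumption for this sketch: refuse to guess for sharded models.
        raise ValueError(f"parameters span multiple devices: {sorted(counts)}")

    # Exactly one device was seen; return it.
    return next(iter(counts))
```

With a real PyTorch module, `infer_device(model)` would return a string such as `"cuda:1"`, which could then be passed along instead of restricting visibility through `CUDA_VISIBLE_DEVICES`.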