…evices. (#17)
This uses much of the plumbing of the custom ops workflow but is geared
toward larger-scale integrations. Whereas the custom op system is
optimized for "kernel"-sized work, focusing on specialization and JIT
compilation of variants, this workflow is geared toward integrating
entire programs (either from a VMFB or JIT compiled on the fly for the
device in use) as a Torch callable.
Usage for bring-your-own-VMFB:
```
launch = Launchable.from_vm_module(
    lambda device: VmModule.mmap(device.vm_instance, "foo.vmfb")
)
result = launch(tensor1, tensor2)
```
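For orientation, here is a slightly fuller sketch of the same flow. The import paths and the torch tensor setup are assumptions for illustration, not part of the snippet above:
```
# Sketch only: the shark_turbine.runtime import path for Launchable is an
# assumption; VmModule comes from the IREE runtime bindings.
import torch
from iree.runtime import VmModule
from shark_turbine.runtime import Launchable

# The factory receives the resolved device so it can return a VmModule
# that is valid for that device (see the note on responsibility below).
launch = Launchable.from_vm_module(
    lambda device: VmModule.mmap(device.vm_instance, "foo.vmfb")
)

tensor1 = torch.randn(4, device="cuda:0")
tensor2 = torch.randn(4, device="cuda:0")
result = launch(tensor1, tensor2)
```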
Usage for JIT compiling:
```
launch = Launchable.jit_compile(MLIR_ASM)
result = launch(tensor1, tensor2)
```
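Filling that in end to end, a hypothetical example; the MLIR program below is made up for illustration, and the import path is an assumption:
```
# Sketch only: a trivial elementwise-add program with a `main` entry point.
import torch
from shark_turbine.runtime import Launchable  # assumed import path

MLIR_ASM = r"""
module @example {
  func.func @main(%arg0: tensor<4xf32>, %arg1: tensor<4xf32>) -> tensor<4xf32> {
    %0 = arith.addf %arg0, %arg1 : tensor<4xf32>
    return %0 : tensor<4xf32>
  }
}
"""

launch = Launchable.jit_compile(MLIR_ASM)

tensor1 = torch.randn(4, device="cuda:0")
tensor2 = torch.randn(4, device="cuda:0")
result = launch(tensor1, tensor2)  # compiled for cuda:0's target on first use
```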
In the first case, it is the caller's responsibility to produce a VMFB
that is valid for the given device. In the JIT case, appropriate
compiler flags and targeting information are set based on the type of
device the input tensors are located on (e.g., for ROCm/CUDA, this will
also properly differentiate between heterogeneous devices on the system
and compile a binary for each distinct target).
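For example (hypothetical sketch, reusing `launch` from the JIT snippet above), invoking the same launchable with tensors on two different devices compiles one binary per distinct target:
```
# Hypothetical: two GPUs that may be different architectures.
a = torch.randn(4, device="cuda:0")
b = torch.randn(4, device="cuda:0")
r0 = launch(a, b)  # compiles (and caches) a binary for cuda:0's target

a1, b1 = a.to("cuda:1"), b.to("cuda:1")
r1 = launch(a1, b1)  # compiles separately if cuda:1 is a distinct target
```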
Limitations:
* The underlying mechanism currently uses the default stream for
synchronization. Plumbing through more explicit stream support is TBI.
* As a consequence of the above, we also sync the device after each
launch.
* We are waiting for upstream PyTorch patches to land in order to get
UUIDs from torch devices. Without them, enumeration order has to match,
which is not guaranteed.
Includes workarounds for:
* iree-org/iree#17402
* iree-org/iree#17403
---------
Signed-off-by: Stella Laurenzo <stellaraccident@gmail.com>