VmModule.wrap_buffer is unreliable when sourced from compiler Output.map_memory() #17403

Open
stellaraccident opened this issue May 15, 2024 · 1 comment
@stellaraccident (Collaborator)
Tracking bug to tie workarounds together for future work.
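The failure mode behind this bug can be illustrated generically: a zero-copy view into mapped memory (the `VmModule.wrap_buffer` path) is only valid for the lifetime of the mapping, whereas taking a defensive copy of the bytes (the `VmModule.copy_buffer` workaround) survives it. The sketch below uses only the Python standard library to show the lifetime distinction; it is an analogy for the IREE behavior, not IREE code.

```python
import mmap
import tempfile

# Write some fake "compiler output" bytes to a temp file and map it,
# standing in for iree.compiler Output.map_memory().
with tempfile.TemporaryFile() as f:
    f.write(b"VMFB-bytes")
    f.flush()
    m = mmap.mmap(f.fileno(), 0)

    wrapped = memoryview(m)  # zero-copy view, analogous to wrap_buffer
    copied = bytes(m)        # defensive copy, analogous to copy_buffer

    # The zero-copy view must be released before the mapping is closed;
    # it is worthless once the backing mapping goes away.
    wrapped.release()
    m.close()

# The copy remains valid after the mapping is gone.
assert copied == b"VMFB-bytes"
```

This is why the workarounds referenced below fall back to copying the compiled artifact rather than wrapping the compiler's mapped memory directly.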

@stellaraccident stellaraccident self-assigned this May 15, 2024
stellaraccident added a commit to iree-org/iree-turbine that referenced this issue May 15, 2024
…evices. (#17)

This reuses much of the plumbing of the custom ops workflow but is geared
toward larger-scale integrations. Whereas the custom op system is optimized
for kernel-sized work, focusing on specialization and JIT compilation of
variants, this workflow integrates entire programs (either from a VMFB or
JIT-compiled on the fly for the device in use) as a Torch callable.

Usage for bring-your-own-VMFB:

```
launch = Launchable.from_vm_module(lambda device: VmModule.mmap(device.vm_instance, "foo.vmfb"))
result = launch(tensor1, tensor2)
```

Usage for JIT compiling:

```
launch = Launchable.jit_compile(MLIR_ASM)
result = launch(tensor1, tensor2)
```

In the first case, it is the caller's responsibility to produce a VMFB
that is valid for the given device. In the JIT case, appropriate
compiler flags and targeting information are set based on the type of
device the input tensors are located on (e.g. for ROCm/CUDA, this also
differentiates between heterogeneous devices on the system and compiles
a binary for each distinct target).
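The per-device targeting described above can be sketched as a small dispatch from a torch device type (plus GPU architecture, when known) to a set of compile flags. The function name and the exact flag spellings below are illustrative assumptions, not the actual iree-turbine implementation.

```python
from typing import List, Optional


def flags_for_torch_device(device_type: str,
                           arch: Optional[str] = None) -> List[str]:
    """Map a torch device type to a hypothetical set of compile flags.

    Flag names are illustrative stand-ins for real IREE compiler options.
    """
    if device_type == "cpu":
        return ["--iree-hal-target-backends=llvm-cpu"]
    if device_type == "cuda":
        flags = ["--iree-hal-target-backends=cuda"]
        if arch:
            # Heterogeneous systems get one compiled binary per distinct
            # architecture, keyed by the arch string.
            flags.append(f"--cuda-target-arch={arch}")
        return flags
    raise ValueError(f"unsupported device type: {device_type}")
```

In practice the JIT path would cache one compiled artifact per distinct `(device_type, arch)` key so heterogeneous devices each get a matching binary.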

Limitations:

* The underlying mechanism currently uses the default stream for
synchronization. Plumbing through more explicit stream support is still
to be implemented.
* As a consequence of the above, we also are syncing the device after
launch.
* We are waiting for upstream PyTorch patches to land to get UUIDs from
torch devices. Without this, enumeration order has to match, which is
not guaranteed.

Includes workarounds for:

* iree-org/iree#17402
* iree-org/iree#17403

---------

Signed-off-by: Stella Laurenzo <stellaraccident@gmail.com>
@ScottTodd (Member)

Oh, this is similar to #17635. Could de-dup.
