Multiple Entrypoints and Shared State implementation; Request for comments/feedback. #8030
Comments
@JacobSzwejbka @kimishpatel Thoughts?
Ah yeah, this is the recommended way to do this for now. I've brought it up to export before (cc @angelayi), because it has an obvious failure mode: what happens if one of the functions you are monkey-patching internally calls forward?
Yeah, I was thinking about this from your comment earlier. For at least the use case of exporting Hugging Face models, there needs to be a wrapper anyway, in order to make the static-kv-cache implementation part of the exported model state. If you go with the assumption that we will be writing a lightweight wrapper for the model class, and make sure that wrapper does not have a forward method, then I am pretty sure it should be safe. Alternatively, one could auto-generate a wrapper class, placing whatever model you want into self.model, and then have methods call into it.
My initial impression is that the individual PRs all seem good. The C++ pattern looks good (SharedMemoryManager). I'll take a deeper look at the python exporter tomorrow. I'm a little hesitant to introduce a new wrapper on top of things in python; I'd like to see if there's a way we can take the solutions in your exporter and integrate them directly into EdgeProgramManager.
This is very cool, and thank you so much for writing up your process and the hurdles you had to work through with export -- it was a great read!
I agree that export does not have a good model for multiple entrypoints, but for now, having a wrapper class generally works for exporting methods.
🚀 The feature, motivation and pitch
Sorry, this is kind of long...
Overview
Hi, I am not really sure what the process for this is, but I made some changes/a wrapper around executorch to enable shared state between multiple exported methods of a torch.nn.Module.
For now I have implemented this in my own repo: cptspacemanspiff/execu-tools
My initial motivation is that when exporting encoder/decoder models one would really like to use a precalculated cross-attention cache, but this is hard to do without conditional branching (via torch.cond), rewriting, or extraneous copies/calculations.
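To make the precalculated cross-attention cache concrete, here is a small sketch (a toy layer I made up for illustration, not the issue's code): the encoder output's K/V projections are computed once after encoding, then reused on every decode step instead of being recomputed or guarded by a branch.

```python
import torch

class CrossAttention(torch.nn.Module):
    """Toy cross-attention whose K/V over the encoder output can be
    precomputed once and reused for every decoder step."""
    def __init__(self, dim=8):
        super().__init__()
        self.q = torch.nn.Linear(dim, dim)
        self.k = torch.nn.Linear(dim, dim)
        self.v = torch.nn.Linear(dim, dim)

    def precompute(self, enc_out):
        # Run once, right after the encoder: cache K/V projections.
        return self.k(enc_out), self.v(enc_out)

    def forward(self, dec_x, kv):
        # Run every decode step, reusing the cached K/V.
        k, v = kv
        scores = self.q(dec_x) @ k.transpose(-1, -2) / (k.shape[-1] ** 0.5)
        return torch.softmax(scores, dim=-1) @ v

layer = CrossAttention()
enc_out = torch.randn(1, 5, 8)              # encoder output, seq len 5
kv = layer.precompute(enc_out)              # cached cross-attention K/V
step_out = layer(torch.randn(1, 1, 8), kv)  # one decode step
```

In eager python this split is trivial; the difficulty the issue describes is keeping the cached K/V alive as exported-program state shared across separately exported methods.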
My high-level thought process is that, for a given model class (sequence-to-sequence, decoder-only, etc.), the locations of forced graph breaks in the torch export process (due to data-dependent branching and the like) are mostly the same. If you export the individual parts in python and then glue them together with minimal code on the C++ side, the C++ runtime implementation should generalize to other models of the same class. From what I can tell, this is already the case for decoder-only models, but for other architectures that still have state it is less well defined.
Currently, I have 2 examples exported and working, including greedy search. There were also changes to the model's attention to support non-fixed batch sizes and encoder sequence lengths; these are not yet merged into transformers.

What I am asking for:
I have been working on this on my own, but it seems like something that would be more generally useful. I only started really digging into the torch.export process / executorch in the past month, so I may have some fundamental misunderstandings.
Aside from the changes required in executorch (of which there are surprisingly few), most of my changes were implemented in my own repository, mainly because it was easier to set up and I could avoid futzing around with the executorch build system. That said, if this functionality aligns with executorch's goals, I would like to get it merged in -- at least the exporter class that manages everything, alongside the C++ memory manager that handles things on the runtime side -- though I would need advice on that.
Also, some specific things I have done to get this working are somewhere on the scale between somewhat janky and very janky. I would appreciate feedback/thoughts on alternatives/advice.
This issue also serves as a pointer/context for the executorch changes/pull requests (see bottom).
I am making a few pull requests that are not fully fleshed out with regards to testing/linting/etc, but illustrate what changes I needed to make to the executorch repository.
Alternatives
I looked at using torch.cond to explicitly implement the branching logic required. The issue I ran across was that each branch must be fully functional, with the same graph signature for each submodule on each branch. This forces the overall method signature to be the same in all cases, which leads to extra copy-backs even when data is not modified in a branch.
The other alternative is writing a lot of C++ code, which adds complexity and bugs when it inevitably fails to exactly match the python implementation.
RFC (Optional)
More details are here:
README: https://github.com/cptspacemanspiff/execu-tools
And a more in-depth rationale/build log here: https://github.com/cptspacemanspiff/execu-tools/blob/main/DesignReasoning.md
What this looks like:
For the python export:
C++
High level:
Currently most of the wrapping logic sits in my own custom repo (linked above), somewhat based on comments from #7458. While there were some changes to executorch itself, it is mostly a wrapper with a custom memory pass that places the shared-state buffers at a common memory address.
Specifically, the main steps in the process are:
Related issues/pull requests (I am still in the process of uploading these, will add issue numbers as I get them.):
#7515 -> #7810 -> Fix: copy-back buffers are not the same
Export of function with no user input fails: #8031
Allow for multi-method exports to use the ETRecord and the inspector-cli: #8336
Add ability to pass hierarchical allocator to runtime Module method: #8032
Allow to override default event tracing names: #8033
cc @JacobSzwejbka @angelayi