AOTI/DSO model does not run in Linux #996
Comments
Thanks for testing out the repo @lhl! Looks like we're hitting a … Can you check exporting/generating with the stories15M model to verify that the behavior itself is working?
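For reference, a minimal sanity check along those lines might look like the sketch below; the stories15M alias and the --output-dso-path/--dso-path flags are taken from the README of this period and may have changed since.

```bash
# Export the small stories15M model ahead of time with AOT Inductor (AOTI) to a shared object,
# then generate against the exported artifact to confirm the DSO path works at all.
python3 torchchat.py export stories15M --output-dso-path exportedModels/stories15M.so
python3 torchchat.py generate stories15M --dso-path exportedModels/stories15M.so --prompt "Once upon a time"
```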
Looks like … (the 4090 has the full 24GB of VRAM, so it should have no problem fitting an 8B model). It just occurred to me that the issue might be because of Llama 3.1: since it's compiled, it might want the full 128K context, so limiting the tokens to … BTW, speaking of …
I have work/deadlines/travel, so I won't be able to follow up much further. I'm assuming anyone doing basic testing is probably going to run into similar issues; my config (a clean mamba env on a 4090) seems as vanilla a setup as possible.
I had the same C++ runner issue when building the runner for ET/PTE models in #985.
🐛 Describe the bug
I am running an Arch Linux system with a 4090/3090 and an up-to-date CUDA 12.5 (`Build cuda_12.5.r12.5/compiler.34385749_0`). I have created a new mamba env for torchchat and run the install. Regular inferencing (e.g. with `generate`) works fine.

I compile an AOTI model per the README:
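Roughly the following, per the README's AOTI flow; the model alias and output path here are illustrative rather than my literal invocation.

```bash
# Ahead-of-time compile the model with AOT Inductor into a DSO (.so)
python3 torchchat.py export llama3.1 --output-dso-path exportedModels/llama3.1.so
```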
When I try to run with the exported DSO model, it gives an error:
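The invocation is approximately the README's DSO generate flow (the path and prompt here are placeholders):

```bash
# Run generation against the previously exported DSO instead of eager mode
python3 torchchat.py generate llama3.1 --dso-path exportedModels/llama3.1.so --prompt "Hello, my name is"
```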
I tried the C++ runner as well, but it fails to build:
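The build follows the README's native runner step, roughly as sketched below; the script path and the `aoti` target name are assumptions from that version of the docs and may have moved since.

```bash
# Build the native C++ runner for AOTI/DSO models (drives a CMake build under the hood)
scripts/build_native.sh aoti
```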
Versions