[BUG] AssertionError: causal mask is only for self attention #13
Comments
The error message does not appear to be related to EE-LLM, but rather seems to be caused by the environment. My inference server can generate content normally without any errors. The startup log of my server is as follows:
The specific prompt request and the corresponding response log are as follows:
The error might be related to flash-attention or PyTorch, because the PyTorch version you are using is quite new, whereas EE-LLM was developed on a relatively older one. I recommend trying out the docker image suggested in the README (nvcr.io/nvidia/pytorch:22.12-py3) to see if it resolves the issue.
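The environment hypothesis is easy to check before pulling the container (`docker pull nvcr.io/nvidia/pytorch:22.12-py3`). A minimal sketch of such a check, assuming the mismatch is between the locally installed PyTorch/flash-attn and what the NGC 22.12 image reportedly ships (a 1.14.0a0 pre-release PyTorch build; that version expectation is an assumption from NVIDIA's release notes, not something pinned by EE-LLM):

```python
# Print the versions the comment above points at, so they can be compared
# against the recommended container. Nothing here is EE-LLM-specific.
import torch

print("torch:", torch.__version__)

try:
    import flash_attn
    print("flash-attn:", getattr(flash_attn, "__version__", "unknown"))
except ImportError:
    print("flash-attn: not installed")
```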
Marking as stale. No activity in 60 days.
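For background on what the assertion itself guards, here is a minimal sketch; the real check sits somewhere in the model's flash-attention path, and this simplified version is an assumption about its shape, not EE-LLM's actual code:

```python
# Toy attention with the same guard as the reported error: a triangular
# causal mask only lines up when queries and keys cover the same positions.
import torch

def attention(q, k, v, causal=False):
    # q: [batch, q_len, dim]; k, v: [batch, k_len, dim]
    if causal:
        assert q.shape[1] == k.shape[1], "causal mask is only for self attention"
    scores = torch.einsum("bqd,bkd->bqk", q, k) / q.shape[-1] ** 0.5
    if causal:
        mask = torch.triu(torch.ones(q.shape[1], k.shape[1], dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# During incremental decoding the query is one new token while the keys span
# the cached context, so q_len != k_len; a causal=True call on that path
# raises the assertion, which would explain why only some prompts hit it.
q = torch.randn(1, 1, 64)        # one new token
k = v = torch.randn(1, 8, 64)    # cached context of 8 tokens
# attention(q, k, v, causal=True)  # raises AssertionError
```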
Describe the bug
I tried to run a translation task on the checkpoint (converted 7B model), but a bug occurs intermittently (not always; for some prompts the server works fine). A stand-in request of the same shape is sketched below.
One such prompt:
Args sent to the server:
Error message in the server logs:
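The specific prompt, request args, and log referenced above are not included here; as a stand-in for reproduction, here is a hypothetical request. It assumes EE-LLM keeps Megatron-LM's text-generation server API (a PUT to /api with a JSON payload); the host, port, prompt, and parameters are placeholders, not the reporter's actual values:

```python
# Hypothetical reproduction request; URL, prompt, and parameters are
# placeholders. Assumes the Megatron-LM-style server API: PUT /api with JSON.
import requests

resp = requests.put(
    "http://localhost:5000/api",  # placeholder host/port
    json={
        "prompts": ["Translate English to German: The report is due tomorrow."],
        "tokens_to_generate": 64,
    },
)
print(resp.status_code, resp.json())
```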