[Bug] TGI doesn't start due to permission denied #861
Comments
Looks like TGI is trying to access more locations after TDX is enabled.
That is a good workaround. The TGI service downloaded all required files and started successfully.
At the 1.0 release, we enabled securityContext by default (#258). This runs the pod as a non-root user with a read-only root file system. When the TGI pod starts, it downloads the required model to an emptyDir volume mounted at /data. An easy fix would be to disable the securityContext, but whether that is a good way to go needs discussion. Until there is an official fix, you can use the above workaround when enabling TDX/kata.
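To narrow down which locations are affected by the non-root user and read-only root file system, a small probe run inside the container can help. This is only a rough sketch, not part of TGI or the manifest: `probe_writable` is a hypothetical helper, and apart from /data the candidate paths listed are guesses, since the thread does not say which other locations TGI touches.

```python
import os
import tempfile


def probe_writable(paths):
    """Return a dict mapping each path to True if a temp file can be created there."""
    results = {}
    for path in paths:
        try:
            # Create and immediately delete a real file, which exercises the
            # same kind of write that fails with "permission denied" in the pod.
            with tempfile.NamedTemporaryFile(dir=path):
                pass
            results[path] = True
        except OSError:
            # Covers permission errors, read-only file systems,
            # and directories that do not exist.
            results[path] = False
    return results


if __name__ == "__main__":
    # Assumed candidate list: only /data is named in this thread,
    # the other entries are illustrative guesses.
    candidates = ["/data", "/tmp", os.path.expanduser("~")]
    for path, ok in probe_writable(candidates).items():
        print(f"{path}: {'writable' if ok else 'NOT writable'}")
```

Any path reported as not writable would need its own writable volume mount (or a relaxed securityContext) before the process can write there.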
Thanks a lot. The workaround works for me.
@ksandowi Could you track down which exact part of the (I would assume it to be either
Priority
P1-Stopper
OS type
Ubuntu
Hardware type
Xeon-other (Please let us know in description)
Installation method
Deploy method
Running nodes
Single Node
What's the version?
tag v1.0
Description
The issue affects Xeon on both SPR and EMR.
After modifying chatqna.yaml to run all services in a TD (protected by TDX), all services run successfully except the TGI service, which fails while downloading the model. It worked fine in previous versions (v0.9 and v0.8).
Reproduce steps
On a platform with TDX enabled, modify ~/GenAIExamples/ChatQnA/kubernetes/intel/cpu/xeon/manifest/chatqna.yaml so that it runs all services in a TD:
Raw log