Llama2 Chatbot on Mac #2618
Codecov Report
@@ Coverage Diff @@
## master #2618 +/- ##
=======================================
Coverage 71.34% 71.34%
=======================================
Files 85 85
Lines 3905 3905
Branches 58 58
=======================================
Hits 2786 2786
Misses 1115 1115
Partials 4 4
… into examples/llama2_app
Cool! A bunch of minor feedback, but otherwise this is looking good.
def start_server():
    os.system("torchserve --start --model-store model_store --ncs")
For snappier starts, you can disable compression in the archiver.
Hmm, I'm not sure I follow. When the archiver only passes the path to the weights via the YAML file, and not the actual weights, this wouldn't come into the picture, would it?
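For reference, a minimal sketch of what the suggestion above might look like. torch-model-archiver accepts `--archive-format no-archive`, which writes the model artifacts as a plain directory under the export path instead of a compressed archive. The model name, handler filename, and config filename below are hypothetical placeholders, not taken from this PR, and whether this speeds up startup here depends on how much the archive actually contains.

```python
import shlex


def build_archiver_command(model_name, handler, config_file, export_path="model_store"):
    """Build a torch-model-archiver command that skips compression."""
    return [
        "torch-model-archiver",
        "--model-name", model_name,
        "--version", "1.0",
        "--handler", handler,
        "--config-file", config_file,
        "--export-path", export_path,
        # no-archive writes a plain directory instead of a compressed archive
        "--archive-format", "no-archive",
    ]


# Hypothetical names, for illustration only.
cmd = build_archiver_command("llamacpp", "llamacpp_handler.py", "model-config.yaml")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` to run the archiver; building it as a list avoids shell quoting issues.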
server_state_container = st.container()
server_state_container.subheader("Server status:")

if st.session_state.started:
Can the server fail after this point?
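One way to guard against that, sketched under assumptions: instead of trusting the `started` flag alone, the app could poll TorchServe's ping endpoint on the inference port. This assumes the default inference address `http://localhost:8080` and the usual `{"status": "Healthy"}` reply; the function name `is_server_healthy` is hypothetical and not part of this PR. It uses `urllib` only to keep the sketch dependency-free.

```python
import http.client
import json
import urllib.request


def is_server_healthy(ping_url="http://localhost:8080/ping", timeout=2.0):
    """Return True only if the ping endpoint answers with status "Healthy"."""
    try:
        with urllib.request.urlopen(ping_url, timeout=timeout) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a non-JSON body.
        return False
    return isinstance(payload, dict) and payload.get("status") == "Healthy"
```

In the Streamlit script, this check could run on each rerun, resetting `st.session_state.started` when it returns `False` so the UI does not keep reporting a server that has died.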
url = "http://localhost:8081/models/" + MODEL_NAME
res = requests.get(url)
if res.status_code != 200:
    model_state_container.error("Error getting model status", icon="🚫")
So hopefully the actual error logs are still available somewhere? We don't want to swallow the actual error message for someone trying to debug this.
Yes, the terminal where you start the server shows the actual logs. But let me add a comment in the README.
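As a complement to the README note, a small sketch of how the UI message itself could carry the server's error text. The helper name `describe_status_error` is hypothetical, and the assumption that error bodies are JSON objects with a `message` field should be checked against what the management API actually returns; the helper falls back to the raw body otherwise.

```python
import json


def describe_status_error(status_code, body):
    """Build a UI message that keeps the server's own error text."""
    detail = body.strip()
    try:
        parsed = json.loads(body)
        # Assumption: error bodies are JSON objects with a "message" field.
        if isinstance(parsed, dict) and parsed.get("message"):
            detail = str(parsed["message"])
    except ValueError:
        pass  # Not JSON; fall back to the raw body text.
    return (
        f"Error getting model status (HTTP {status_code}): "
        f"{detail or 'no details returned'}"
    )
```

In the snippet under review, this would replace the fixed string, for example `model_state_container.error(describe_status_error(res.status_code, res.text), icon="🚫")`.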
Description
This is an example showing how to deploy a llama2 chat app using TorchServe on your laptop!
We use Streamlit to create the app.
We use llama-cpp-python in this example.
This example doesn't include streaming responses.
Although I was able to get the server to send the streamed response, the client was receiving junk.