Livepool llm rebased #210
Conversation
@@ -54,6 +54,9 @@ def load_pipeline(pipeline: str, model_id: str) -> any:
            from app.pipelines.segment_anything_2 import SegmentAnything2Pipeline

            return SegmentAnything2Pipeline(model_id)
        case "llm-generate":
@kyriediculous can we rename this to chat-completion? See discussion here.
logger = logging.getLogger(__name__)


def get_max_memory():
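(For reference: a helper like this typically builds a per-device max_memory map used when sharding a large model across GPUs and CPU RAM. A minimal sketch, assuming torch is installed; the memory budgets and exact logic are placeholders, not the PR's implementation.)

    import torch

    def get_max_memory() -> dict:
        # Placeholder budgets: a fixed slice of each visible GPU plus CPU RAM for offload.
        max_memory = {i: "24GiB" for i in range(torch.cuda.device_count())}
        max_memory["cpu"] = "48GiB"
        return max_memory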
    response_model=LlmResponse,
    responses=RESPONSES,
    description="Generate text responses from input prompts using a large language model.",
    operation_id="genLlm",
@victorges can you check the SDK parameters are as expected 🙏🏻.
Done! Sent inline comment above
runner/requirements.txt (outdated)
@@ -16,4 +16,3 @@ scipy==1.13.0
 numpy==1.26.4
 av==12.1.0
 sentencepiece== 0.2.0
-protobuf==5.27.2
Why was this requirement removed?
It was dropped by mistake when moving the requirements to a new Docker image build. Added back.
logger = logging.getLogger(__name__)

RESPONSES = {
    status.HTTP_200_OK: {"model": LlmResponse},
I believe the only thing missing is adding the response name override for the SDK like this
"x-speakeasy-name-override": "data", |
Reviewing only the API/SDK shape
@router.post(
    "/llm-generate",
    response_model=LlmResponse,
    responses=RESPONSES,
    description="Generate text responses from input prompts using a large language model.",
    operation_id="genLlm",
    summary="LLM Generate",
    tags=["generate"],
    openapi_extra={"x-speakeasy-name-override": "llm"},
)
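(For readers skimming the thread: the decorator above wraps the route handler itself. A rough, hypothetical sketch of that handler's shape follows; the parameter names, dependency wiring, and LlmResponse field are assumptions, not taken from the diff.)

    from fastapi import Depends, Form  # assumed FastAPI helpers

    async def llm_generate(
        prompt: str = Form(...),          # user prompt (name assumed)
        model_id: str = Form(""),         # optional model override (name assumed)
        pipeline=Depends(get_pipeline),   # pipeline resolved by the app (assumed dependency)
    ) -> LlmResponse:
        generated = pipeline(prompt)
        return LlmResponse(response=generated)  # response field name assumed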
I would remove "generate" from the endpoint as it gets redundant, since we are already calling all the AI APIs "generate" for now. My current preferred option would be:
-@router.post(
-    "/llm-generate",
-    response_model=LlmResponse,
-    responses=RESPONSES,
-    description="Generate text responses from input prompts using a large language model.",
-    operation_id="genLlm",
-    summary="LLM Generate",
-    tags=["generate"],
-    openapi_extra={"x-speakeasy-name-override": "llm"},
-)
+@router.post(
+    "/chat-completion",
+    response_model=LlmResponse,
+    responses=RESPONSES,
+    description="Generate text responses from input prompts using a language model.",
+    operation_id="genChatCompletion",
+    summary="Chat Completion",
+    tags=["generate"],
+    openapi_extra={"x-speakeasy-name-override": "chatCompletion"},
+)
In the SDK, this would look like: client.generate.chat_completion(...)
WDYT?
Update, @kyriediculous made a good point about not using "chat completion" to avoid confusion since we don't really implement OpenAIs interface. My current preferred options, highest to lowest, are now:
1. /chat
2. /llm
3. /text
With the corresponding operation_id/summary/speakeasy name.
-@router.post(
-    "/llm-generate",
-    response_model=LlmResponse,
-    responses=RESPONSES,
-    description="Generate text responses from input prompts using a large language model.",
-    operation_id="genLlm",
-    summary="LLM Generate",
-    tags=["generate"],
-    openapi_extra={"x-speakeasy-name-override": "llm"},
-)
+@router.post(
+    "/chat",
+    response_model=LlmResponse,
+    responses=RESPONSES,
+    description="Generate text responses from input prompts using a language model.",
+    operation_id="genChat",
+    summary="Chat",
+    tags=["generate"],
+    openapi_extra={"x-speakeasy-name-override": "chat"},
+)
and in the SDK it looks like client.generate.chat.
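(A hypothetical usage sketch of that SDK shape; the client class, constructor, and response field below are assumptions, not taken from a published SDK.)

    # Hypothetical Python SDK call for the proposed /chat route.
    client = Livepeer(api_key="...")   # client class name assumed
    result = client.generate.chat(
        prompt="Say hello in one sentence.",
        model_id="<model-id>",
    )
    print(result.data)                 # "data" per the speakeasy response name override discussed above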
Thanks for your suggestions. My preference is Chat, as GenerateChat also makes sense 👍.
Closing, the LLM PR by Livepool is updated.
Rebase of the Livepool LLM PR, showing the updates needed to rebase onto the current ai-worker plus a couple of fixes:
- containerHosts port updated to match the route for managed containers.
- check_torch_cuda.py helper file moved to the dev folder.