Feature/add streaming #18
base: main
Conversation
Walkthrough

The changes in this pull request involve modifications to the …

Changes
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (7)
service/chat.py (4)
92-92: Use capitalized environment variable for SYSTEM_FINGERPRINT.

Environment variable names are conventionally capitalized. Consider renaming the variable to enhance readability and maintain consistency.

Apply this diff:

```diff
-system_fingerprint = os.getenv('system_fingerprint')
+system_fingerprint = os.getenv('SYSTEM_FINGERPRINT')
```

🧰 Tools
🪛 Ruff (0.8.0)
92-92: Use capitalized environment variable SYSTEM_FINGERPRINT instead of system_fingerprint (SIM112)
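As a quick illustration, the capitalized name can also carry an explicit default so a missing variable fails soft; the fallback value below is an assumption for the sketch, not something defined in the PR:

```python
import os

# Hypothetical fallback value; the PR does not define a default.
system_fingerprint = os.getenv('SYSTEM_FINGERPRINT', 'fp_unset')
print(system_fingerprint)
```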
100-101: Remove unused variables fast_gen and draft_model.

The variables fast_gen and draft_model are assigned but never used, which may lead to confusion. Removing unused variables helps keep the code clean and maintainable.

Apply this diff:

```diff
-    fast_gen = model_dict.use_fast_generation
-    draft_model = model_dict.draft_model_name
```

🧰 Tools
🪛 Ruff (0.8.0)
100-100: Local variable fast_gen is assigned to but never used; remove assignment to unused variable fast_gen (F841)
101-101: Local variable draft_model is assigned to but never used; remove assignment to unused variable draft_model (F841)
89-150: Consider adding error handling in the streaming loop.

In the completions_stream function, adding error handling within the streaming loop can help manage potential exceptions that may occur during streaming, such as network errors or unexpected interruptions.

Apply this diff to include error handling:

```diff
     thread.start()
-    for response in stream:
+    try:
+        for response in stream:
             filter_results = ContentFilterResults(
                 hate=FilterCategory(filtered=False, severity="safe"),
                 self_harm=FilterCategory(filtered=False, severity="safe"),
                 sexual=FilterCategory(filtered=False, severity="safe"),
                 violence=FilterCategory(filtered=False, severity="safe")
             )
             prompt_filter_results = PromptFilterResults(
                 prompt_index=0,
                 content_filter_results=filter_results
             )
             choice = Choice(
                 content_filter_results=filter_results,
                 finish_reason=None,
                 index=0,
                 logprobs=None,
                 delta=Message(role="assistant", content=response)
             )
             chat_response = ChatResponse(
                 choices=[choice],
                 created=int(time.time()),
                 id=await generate_cmpl_id(),
                 model=model_dict.model_name,
                 object="chat.completion.chunk",
                 prompt_filter_results=[prompt_filter_results],
                 system_fingerprint=system_fingerprint,
                 usage=Usage(completion_tokens=1, prompt_tokens=1, total_tokens=1)
             )
             chat_response_json = json.dumps(chat_response, default=lambda o: o.__dict__)
             yield f"data: {chat_response_json}\n\n"
+    except Exception as e:
+        # Handle exceptions appropriately
+        error_response = {"error": str(e)}
+        yield f"data: {json.dumps(error_response)}\n\n"
+        thread.join()
+        return
```

🧰 Tools
🪛 Ruff (0.8.0)
92-92: Use capitalized environment variable SYSTEM_FINGERPRINT instead of system_fingerprint (SIM112)
100-100: Local variable fast_gen is assigned to but never used; remove assignment to unused variable fast_gen (F841)
101-101: Local variable draft_model is assigned to but never used; remove assignment to unused variable draft_model (F841)
89-150: Ensure proper cleanup of the thread after streaming completes.

In the completions_stream function, after the streaming is done, it is good practice to ensure that the started thread is properly joined to prevent resource leaks.

Apply this diff:

```diff
     yield "data: [DONE]\n\n"
+    thread.join()
```

🧰 Tools
🪛 Ruff (0.8.0)
92-92: Use capitalized environment variable SYSTEM_FINGERPRINT instead of system_fingerprint (SIM112)
100-100: Local variable fast_gen is assigned to but never used; remove assignment to unused variable fast_gen (F841)
101-101: Local variable draft_model is assigned to but never used; remove assignment to unused variable draft_model (F841)
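To guarantee the join even when the consumer stops early or raises, try/finally is the usual pattern. A minimal, self-contained sketch follows; the queue-based worker is illustrative only, not the PR's actual generation code:

```python
import threading
import queue

def stream_with_cleanup(items):
    """Illustrative generator: a worker thread feeds a queue; join() runs
    in `finally`, so the thread is reclaimed even if the consumer errors."""
    q = queue.Queue()
    sentinel = object()

    def worker():
        for item in items:
            q.put(item)
        q.put(sentinel)  # signal end of stream

    thread = threading.Thread(target=worker)
    thread.start()
    try:
        while (item := q.get()) is not sentinel:
            yield f"data: {item}\n\n"
        yield "data: [DONE]\n\n"
    finally:
        thread.join()  # guaranteed even on early exit

for chunk in stream_with_cleanup(["a", "b"]):
    print(chunk, end="")
```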
chat_class.py (2)
21-26: Consider a more type-safe initialization approach.

While the current implementation provides flexibility, it bypasses the dataclass's built-in type checking. Consider using dataclasses.replace() or explicit field initialization to maintain type safety.

Here's a safer alternative:

```diff
 def __init__(self, **kwargs):
     valid_keys = {field.name for field in self.__dataclass_fields__.values()}
     filtered_kwargs = {k: v for k, v in kwargs.items() if k in valid_keys}
-    for key, value in filtered_kwargs.items():
-        setattr(self, key, value)
+    # Initialize with defaults first
+    super().__init__()
+    # Then use replace to maintain type checking
+    if filtered_kwargs:
+        self = dataclasses.replace(self, **filtered_kwargs)
```
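Note that rebinding `self` inside `__init__` (as the diff above does) only rebinds the local name and does not change the instance. A classmethod constructor sidesteps that pitfall while keeping the dataclass-generated `__init__` and its defaults intact. A minimal sketch, with field names assumed for illustration:

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class Choice:
    # Illustrative fields only, not the PR's full definition.
    index: int = 0
    finish_reason: Optional[str] = None

    @classmethod
    def from_dict(cls, data: dict) -> "Choice":
        # Keep only keys that match declared fields, then use the
        # dataclass-generated __init__ so defaults and types apply.
        valid = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in valid})

c = Choice.from_dict({"index": 1, "unknown_key": "ignored"})
print(c)  # Choice(index=1, finish_reason=None)
```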
45-46: LGTM! Well-structured for streaming support.

The addition of delta and making message optional properly supports both streaming and non-streaming scenarios. This follows common patterns in streaming APIs where updates are delivered incrementally.

Consider documenting that:

- In streaming mode, delta contains incremental updates.
- In non-streaming mode, message contains the complete response.
- Both fields should not be populated simultaneously.
main.py (1)
71-72: Document the streaming capability in the OpenAPI specification.

The streaming functionality represents a significant API change that should be documented.

Add FastAPI documentation decorators (note that the responses mapping must be keyed by status code):

```python
@app.post(
    "/openai/deployments/{model}/chat/completions",
    dependencies=[Depends(get_api_key)],
    responses={
        200: {
            "description": "Successful response",
            "content": {
                "application/json": {"description": "Regular response"},
                "text/event-stream": {"description": "Streaming response"}
            }
        }
    }
)
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📒 Files selected for processing (3)

- chat_class.py (3 hunks)
- main.py (2 hunks)
- service/chat.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.0)
service/chat.py
82-82: Redefinition of unused generate_cmpl_id from line 16 (F811)
92-92: Use capitalized environment variable SYSTEM_FINGERPRINT instead of system_fingerprint (SIM112)
100-100: Local variable fast_gen is assigned to but never used; remove assignment to unused variable fast_gen (F841)
101-101: Local variable draft_model is assigned to but never used; remove assignment to unused variable draft_model (F841)
🔇 Additional comments (4)
service/chat.py (1)
89-150: Verify that the delta attribute in Choice is correctly used.

Ensure that the Choice class supports the delta attribute and that it aligns with the expected schema. This is important for maintaining consistency in the response format.

Run the following script to check if delta is a valid attribute of Choice:
✅ Verification successful
The delta attribute in Choice is correctly used.

Based on the code inspection, the Choice class in chat_class.py explicitly defines delta as an optional Message attribute: delta: Optional[Message] = None. This matches the usage in the reviewed code, where delta is set with a Message instance. The implementation is consistent with the expected schema.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```bash
#!/bin/bash
# Description: Verify the `delta` attribute in the `Choice` class definition.
# Test: Search for the `Choice` class definition and check for the `delta` attribute.
ast-grep --pattern '
class Choice {
  $$$
  delta: $_
  $$$
}' --python
```

Length of output: 265

Script:

```bash
#!/bin/bash
# Let's try to find the Choice class definition using ripgrep
rg "class Choice" -A 10
```

Length of output: 443

Script:

```bash
#!/bin/bash
# Let's also check for any imports or usage of Choice class
rg "Choice" -A 2
```

Length of output: 995

Script:

```bash
#!/bin/bash
# Let's try ast-grep without the --python flag
ast-grep --pattern 'class Choice {
  $$$
}'
```

Length of output: 45
🧰 Tools
🪛 Ruff (0.8.0)
92-92: Use capitalized environment variable SYSTEM_FINGERPRINT instead of system_fingerprint (SIM112)
100-100: Local variable fast_gen is assigned to but never used; remove assignment to unused variable fast_gen (F841)
101-101: Local variable draft_model is assigned to but never used; remove assignment to unused variable draft_model (F841)
chat_class.py (1)
6-7: LGTM! Optional fields align with streaming requirements.

The change to make role and content optional is appropriate for streaming support, where messages might be constructed incrementally.

Let's verify that these optional fields are handled properly in the codebase:
✅ Verification successful

Optional Message fields are properly implemented and safely used.

Based on the verification results:

- The Message class is correctly defined with Optional fields.
- The class is used safely in service/chat.py with explicit values for both role and content.
- The Message type is properly handled as optional in parent classes (e.g., delta: Optional[Message] = None).
- No direct access to role/content without initialization was found, reducing null-reference risks.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```bash
#!/bin/bash
# Check for proper null handling of Message fields
rg -A 3 "Message\("
rg "message\.(role|content)"
```

Length of output: 908

Script:

```bash
#!/bin/bash
# Let's check for None/null handling patterns around Message class
rg -B 2 -A 2 "None.*Message|Message.*None"
# Let's also check the Message class implementation and its usage patterns
ast-grep --pattern 'class Message'
```

Length of output: 294

Script:

```bash
#!/bin/bash
# Let's examine the full Message class implementation and its usage context
rg -B 5 -A 10 "class Message"
# Let's also check for any validation or error handling around Message fields
rg -B 2 -A 2 "if.*message\.(role|content)"
```

Length of output: 618
main.py (2)
9-9: LGTM: Import statement is correctly placed.

The StreamingResponse import is appropriately added and necessary for the new streaming functionality.
71-72: Verify streaming implementation compatibility.

Please ensure that the streaming implementation is compatible with OpenAI's streaming format for client compatibility.

✅ Verification successful

Let me gather more information about the streaming implementation.

Let me check the complete response format and structure.
Streaming implementation follows OpenAI's format.

The implementation correctly follows OpenAI's streaming format, with:

- Server-sent events (SSE) with the text/event-stream media type
- JSON responses prefixed with data:
- A proper message structure with a delta field containing role and content
- A final [DONE] event
- The correct object type chat.completion.chunk for streaming
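For context, here is a minimal client-side sketch of consuming OpenAI-style SSE lines in this format; the helper is hypothetical and not part of the PR:

```python
import json

def parse_sse_lines(lines):
    """Yield content fragments from OpenAI-style 'data:' SSE lines."""
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        delta = json.loads(payload)["choices"][0].get("delta", {})
        if delta.get("content"):
            yield delta["content"]

sample = [
    'data: {"choices": [{"delta": {"role": "assistant", "content": "Hi"}}]}',
    "data: [DONE]",
]
print("".join(parse_sse_lines(sample)))  # Hi
```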
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```bash
#!/bin/bash
# Check if the streaming implementation follows OpenAI's format
rg -A 10 "completions_stream" service/chat.py
```

Length of output: 469

Script:

```bash
#!/bin/bash
# Search for streaming response format in the chat service
rg -A 20 "completions_stream" service/chat.py
# Look for yield statements in the streaming implementation
ast-grep --pattern 'yield $_'
# Check for any OpenAI compatibility related code
rg -i "openai|completion" service/chat.py
```

Length of output: 1747

Script:

```bash
#!/bin/bash
# Check the ChatResponse and related models
rg -A 10 "class ChatResponse" service/chat.py
# Look for the streaming response structure
rg -B 5 -A 10 "chat_response_json" service/chat.py
# Check for any delta implementation in the response
rg -B 5 -A 5 "delta" service/chat.py
```

Length of output: 1116
```python
async def generate_cmpl_id():
    prefix = 'cmpl-'
    id_length = 22
    id_chars = string.ascii_letters + string.digits
    random_id = ''.join(random.choice(id_chars) for _ in range(id_length))
    return prefix + random_id
```
Duplicate definition of the generate_cmpl_id function.

The function generate_cmpl_id is redefined at lines 82-87, which duplicates the earlier definition at line 16. This can cause confusion and potential errors. Consider removing the duplicate function or consolidating any differences into a single definition.
Apply this diff to remove the duplicate definition:

```diff
-async def generate_cmpl_id():
-    prefix = 'cmpl-'
-    id_length = 22
-    id_chars = string.ascii_letters + string.digits
-    random_id = ''.join(random.choice(id_chars) for _ in range(id_length))
-    return prefix + random_id
```

🧰 Tools
🪛 Ruff (0.8.0)
82-82: Redefinition of unused generate_cmpl_id from line 16 (F811)
main.py (Outdated)
```python
    if body.get("stream"):
        return StreamingResponse(ChatService.completions_stream(body, predacons_models[model], api_version), media_type="text/event-stream")
```
🛠️ Refactor suggestion
Consider adding request body validation
The request body is passed directly to the service without validation. Consider using a Pydantic model to validate the request structure.
Example implementation:
```python
from fastapi import Query
from pydantic import BaseModel, Field

class ChatCompletionsRequest(BaseModel):
    stream: bool = Field(default=False, description="Whether to stream the response")
    # Add other required fields based on your API contract

async def chat_completions(
    request: ChatCompletionsRequest,
    model: str,
    api_version: str = Query(default=None, alias="api-version")
):
    # Your implementation here
    ...
```
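A quick usage sketch of that model; the exact error-type string differs between Pydantic v1 and v2, so the printed detail is indicative only:

```python
from pydantic import BaseModel, Field, ValidationError

class ChatCompletionsRequest(BaseModel):
    stream: bool = Field(default=False, description="Whether to stream the response")

print(ChatCompletionsRequest(stream=True).stream)  # True
print(ChatCompletionsRequest().stream)             # False (default)

try:
    ChatCompletionsRequest(stream="not-a-bool")
except ValidationError as exc:
    # Non-boolean values are rejected before reaching the handler.
    print("rejected:", exc.errors()[0]["type"])
```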
Add validation and error handling for the stream parameter
The stream parameter is accessed directly without type checking or validation. This could lead to runtime errors if the parameter is present but not a boolean.
Consider adding validation like this:
```diff
-    if body.get("stream"):
+    stream = body.get("stream", False)
+    if not isinstance(stream, bool):
+        raise HTTPException(
+            status_code=status.HTTP_400_BAD_REQUEST,
+            detail="The 'stream' parameter must be a boolean"
+        )
+    if stream:
```
Committable suggestion skipped: line range outside the PR's diff.
Add error handling for streaming response
The streaming response should be wrapped in a try-catch block to handle potential errors during streaming.
Consider adding error handling:
```diff
     if body.get("stream"):
-        return StreamingResponse(ChatService.completions_stream(body, predacons_models[model], api_version), media_type="text/event-stream")
+        try:
+            stream_generator = ChatService.completions_stream(body, predacons_models[model], api_version)
+            return StreamingResponse(
+                stream_generator,
+                media_type="text/event-stream",
+                headers={"Cache-Control": "no-cache", "Connection": "keep-alive"}
+            )
+        except Exception as e:
+            raise HTTPException(
+                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+                detail=f"Streaming error: {str(e)}"
+            )
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    if body.get("stream"):
        try:
            stream_generator = ChatService.completions_stream(body, predacons_models[model], api_version)
            return StreamingResponse(
                stream_generator,
                media_type="text/event-stream",
                headers={"Cache-Control": "no-cache", "Connection": "keep-alive"}
            )
        except Exception as e:
            raise HTTPException(
                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                detail=f"Streaming error: {str(e)}"
            )
```
Summary by CodeRabbit

- New Features
- Bug Fixes
- Documentation
- Chores: predacons package in dependencies.