Changes to support input validation to match OpenAI behavior. #65
Conversation
Thank you for the PR, @jroesch! This looks good to me.
I've just finished re-organizing the compile-time information dump (e.g., no more hf config dependency under models/) so that it keeps only the minimum set of info that mlc_serve needs, and this engine config was my next step.
Do you mind if I take over? If that is okay, we can merge this quickly and iterate, or combine it into my PR.
Force-pushed from 8e50529 to 994accc.
Force-pushed from 83829ad to 04f76fd.
Okay, this is now cleaned up on top of Sung's previous changes.
@@ -102,7 +113,7 @@ def wait_for_request(self, timeout_seconds=None) -> bool:
         )

     def has_pending_requests(self) -> bool:
-        return self.queue or self.current_batch
+        return self.queue or self.current_batch or self.cancelled_requests
This might be why cancellation was taking a while: we won't actually step forward to cancel requests if we don't submit new work to the queue or the current batch.
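To illustrate the point, here is a hypothetical sketch (not the actual mlc_serve engine; the surrounding structure and names like MiniEngine are assumptions, only the attribute names mirror the diff above) of a serving loop gated on has_pending_requests(). If cancelled requests are not counted as pending, a cancellation that arrives while the queue and batch are empty is never processed until some new request wakes the loop.

from collections import deque


class MiniEngine:
    """Illustrative only; assumes a loop that only steps while work is pending."""

    def __init__(self):
        self.queue = deque()               # requests waiting to be scheduled
        self.current_batch = {}            # requests currently being decoded
        self.cancelled_requests = deque()  # requests awaiting cancellation

    def has_pending_requests(self) -> bool:
        # Without the cancelled_requests term, a lone pending cancellation
        # never counts as work, so the loop below would not step to handle it.
        return bool(self.queue or self.current_batch or self.cancelled_requests)

    def step(self) -> None:
        # Handle cancellations first, so they are acted on promptly.
        while self.cancelled_requests:
            req = self.cancelled_requests.popleft()
            self.current_batch.pop(req, None)  # drop it from the running batch
        # ... scheduling from self.queue and running decode is elided here ...

    def run(self) -> None:
        while self.has_pending_requests():
            self.step()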
Makes sense.
@@ -192,14 +194,18 @@ def step(self) -> InferenceStepResult:
     def _is_ready_to_serve(self) -> bool:
         return self.worker_process is not None and self.worker_process.is_alive()

-    def _get_new_request_state(self, request: Request) -> RequestState:
+    def _get_new_request_state(self, request: Request) -> Optional[RequestState]:
The return value is never Optional, is it?
Ah yes this is a piece of bit rot from my previous attempt. Let me change that.
This PR exposes a per-request token validation callback, enabling us to validate requests post-tokenization. We mark RequestStates as invalid and then effectively cancel them when we go to process them in the worker. This results in an error being returned to the caller of the generation API.
Internally, we use exception handling to trigger a validation error and return it to the user.
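As a rough sketch of the idea (not the actual mlc_serve API; ValidationError, TokenValidator, the RequestState fields, and make_context_window_validator are all assumed names for illustration), a per-request callback could run right after tokenization, with a failure marking the state invalid so the worker cancels it and surfaces the error to the caller:

from dataclasses import dataclass
from typing import Callable, List, Optional


class ValidationError(Exception):
    """Raised when a request fails post-tokenization validation."""


@dataclass
class RequestState:
    request_id: str
    token_ids: List[int]
    validation_err: Optional[ValidationError] = None  # set when validation fails
    is_cancelled: bool = False


# Per-request callback: receives the token ids and raises ValidationError,
# e.g. when the prompt exceeds the context window, mirroring how OpenAI
# rejects over-long prompts.
TokenValidator = Callable[[List[int]], None]


def make_context_window_validator(max_tokens: int) -> TokenValidator:
    def validate(token_ids: List[int]) -> None:
        if len(token_ids) > max_tokens:
            raise ValidationError(
                f"prompt has {len(token_ids)} tokens, exceeds limit of {max_tokens}"
            )
    return validate


def get_new_request_state(request_id: str, token_ids: List[int],
                          validate: TokenValidator) -> RequestState:
    state = RequestState(request_id=request_id, token_ids=token_ids)
    try:
        validate(token_ids)
    except ValidationError as err:
        # Mark the state invalid; the worker treats it as cancelled and
        # returns the error to the caller of the generation API.
        state.validation_err = err
        state.is_cancelled = True
    return state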
cc @masahi @sunggg