Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Text Generate REST API schema #18

Merged
merged 14 commits into from
Feb 6, 2024
Merged

Text Generate REST API schema #18

merged 14 commits into from
Feb 6, 2024

Conversation

gavrissh
Copy link
Contributor

@gavrissh gavrissh commented Dec 5, 2023

Propose generate rest api endpoints

/v2/models/{model_name}/versions/${MODEL_VERSION}/generate
/v2/models/{model_name}/versions/${MODEL_VERSION}/generate_stream


Screenshot 2024-01-16 at 6 43 56 PM
Screenshot 2024-01-16 at 6 44 46 PM
Screenshot 2024-01-16 at 6 46 46 PM

Reference - https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/protocol/extension_generate.html#generate-extension

Propose generate rest api endpoints

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
@gavrissh gavrissh changed the title Propose generate rest api endpoints schema Text Generate REST API schema Dec 6, 2023
@gavrissh gavrissh marked this pull request as ready for review December 6, 2023 12:27
Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
- model_version
- done
properties:
text_output:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is concatenated text output, we might still want to see the token generated for each iteration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Nvidia implementation, each response in returning cumulative set of tokens.

1st json
{
text_output: "Here is"
}
.
.
.
..
subsequent json response
{
text_output: "Here is the output for the prompt"
}

Should we add additional property to display token generated in current response set?

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
@gavrissh
Copy link
Contributor Author

@yuzisun Wanted to follow up, if the current state of changes are alright?

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
@gavrissh
Copy link
Contributor Author

gavrissh commented Jan 3, 2024

I have updated with all the recent discussed changes

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
@cmaddalozzo
Copy link

cmaddalozzo commented Jan 11, 2024

We should probably add the option to return log probabilities in the result. This seems to be fairly common among other APIs. This would comprise a boolean logprobs parameter in the request and a corresponding logprobs property in the response containing an array of objects with keys token and logprob.

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
@gavrissh
Copy link
Contributor Author

We should probably add the option to return log probabilities in the result. This seems to be fairly common among other APIs. This would comprise a boolean logprobs parameter in the request and a corresponding logprobs property in the response containing an array of objects with keys token and logprob.

I have updated the PR to support the above items

type: string
logprobs:
$ref: '#/components/schemas/Logprobs'
Logprobs:
Copy link
Member

@yuzisun yuzisun Jan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest change the naming to Token as it is not just logprob field, see https://github.com/huggingface/text-generation-inference/blob/main/docs/openapi.json#L844.

type: string
model_version:
type: string
logprobs:
Copy link
Member

@yuzisun yuzisun Jan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest change the name here, in TGI it is called details which includes the tokens, not sure if we should follow the same.
https://github.com/huggingface/text-generation-inference/blob/main/docs/openapi.json#L645

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@gavrissh gavrissh Jan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in OpenAI logprobs is property under choices.
Even I was not sure here. It is up for any suggestions

Current
Output -> {
text_output,
model_name,
model_version,
logprobs -> List[Token]
}

Token is followed as per TGI - https://huggingface.github.io/text-generation-inference/#/Text%20Generation%20Inference/generate_stream
Token -> {
id,
logprob,
special,
text
}

type: string
finish_reason:
type: string
logprobs:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For streaming case it is a single token.

parameters:
allOf:
- $ref: '#/components/schemas/GenerateParameters'
logprob:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be part of the GenerateParameters ?

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
type: string
description: Sequences where the API will stop generating further tokens.
logprob:
type: boolean
Copy link
Member

@yuzisun yuzisun Jan 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a description for this flag, also I think this should be the details flag as logprob is one of the fields on it.

type: string
details:
$ref: '#/components/schemas/StreamDetails'
Logprobs:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a description for this

Comment on lines 110 to 120
id:
type: integer
format: int32
minimum: 0
logprob:
type: number
format: float
special:
type: boolean
text:
type: string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make sure we have descriptions for these fields

type: object
additionalProperties: {}
properties:
finish_reason:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finish_reason should be an enum

properties:
finish_reason:
type: string
logprobs:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both finish_reason and logprobs should be required if details is requested.

Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
@yuzisun
Copy link
Member

yuzisun commented Jan 25, 2024

Thanks @gavrishp !! Great job on getting this going with the initial version.

/lgtm
/approve

Copy link

oss-prow-bot bot commented Jan 25, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gavrishp, yuzisun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yuzisun yuzisun merged commit 52528cf into kserve:main Feb 6, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants