Text Generate REST API schema #18
Merged: 14 commits, Feb 6, 2024

specification/protocol/generate_rest.yaml: 236 additions, 0 deletions
openapi: 3.1.0
info:
  title: Open Inference API for text generation
  description: Open Inference API for text generation
  version: 1.0.0
components:
  schemas:
    Details:
      type: object
      additionalProperties: {}
      properties:
        finish_reason:
Review comment (Member): finish_reason should be an enum.
          type: string
        logprobs:
Review comment (Member): Both finish_reason and logprobs should be required if details is requested.
          $ref: '#/components/schemas/Logprobs'
    GenerateErrorResponse:
      type: object
      required:
        - error
      properties:
        error:
          type: string
    GenerateParameters:
      type: object
      additionalProperties: {}
      properties:
        temperature:
          type: number
          format: float
          default: 1
          minimum: 0
          description: What sampling temperature to use. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic.
        top_p:
          type: number
          format: float
          maximum: 1
          minimum: 0
          description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
        max_tokens:
          type: integer
          format: int32
          default: 20
          minimum: 1
          description: The maximum number of tokens to generate in the completion.
        stop:
          type: array
          items:
            type: string
          description: Sequences where the API will stop generating further tokens.
        logprob:
          type: boolean
Review comment (@yuzisun, Member, Jan 21, 2024): Can you add a description for this flag? Also, I think this should be the details flag, as logprob is one of the fields on it.

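To make the request shape concrete, here is a minimal sketch of a GenerateRequest payload built in Python, using only the fields defined by the GenerateRequest and GenerateParameters schemas above; the prompt text and parameter values are illustrative, not part of the spec.

```python
import json

# A GenerateRequest payload: text_input is required, and everything
# under "parameters" comes from the GenerateParameters schema above.
request = {
    "text_input": "Explain beam search in one sentence.",
    "parameters": {
        "temperature": 0.2,   # lower value: more focused, deterministic output
        "top_p": 0.9,         # nucleus sampling over the top 90% probability mass
        "max_tokens": 64,     # defaults to 20 if omitted
        "stop": ["\n\n"],     # sequences where generation stops
    },
}

body = json.dumps(request)
print(body)
```

Because GenerateParameters sets `additionalProperties: {}`, a server may also accept implementation-specific keys under `parameters`.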
    GenerateRequest:
      type: object
      required:
        - text_input
      properties:
        text_input:
          type: string
        parameters:
          allOf:
            - $ref: '#/components/schemas/GenerateParameters'
    GenerateResponse:
      type: object
      required:
        - text_output
        - model_name
      properties:
        text_output:
          type: string
        model_name:
          type: string
        model_version:
          type: string
        details:
          $ref: '#/components/schemas/Details'
    GenerateStreamResponse:
      type: object
      required:
        - text_output
        - model_name
      properties:
        text_output:
Review comment (Member): This is concatenated text output; we might still want to see the token generated for each iteration.

Review comment (Contributor, author): In the Nvidia implementation, each response returns the cumulative set of tokens.

1st JSON response:
{"text_output": "Here is"}
...
subsequent JSON response:
{"text_output": "Here is the output for the prompt"}

Should we add an additional property to display the tokens generated in the current response set?

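As the discussion above notes, when a server streams cumulative text_output values, a client can recover the per-iteration delta itself. A minimal sketch, assuming the cumulative behavior described in the comment (the sample strings are hypothetical):

```python
def extract_deltas(responses):
    """Given cumulative GenerateStreamResponse text_output values,
    return only the newly generated text for each iteration."""
    deltas = []
    previous = ""
    for resp in responses:
        current = resp["text_output"]
        deltas.append(current[len(previous):])  # strip the already-seen prefix
        previous = current
    return deltas

# Hypothetical stream of cumulative responses, as in the comment above.
stream = [
    {"text_output": "Here is"},
    {"text_output": "Here is the output"},
    {"text_output": "Here is the output for the prompt"},
]
print(extract_deltas(stream))
# ['Here is', ' the output', ' for the prompt']
```

This only works if every response extends the previous one; a dedicated per-iteration token property, as proposed above, would remove that assumption.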
          type: string
        model_name:
          type: string
        model_version:
          type: string
        details:
          $ref: '#/components/schemas/StreamDetails'
    Logprobs:
Review comment (@yuzisun, Member, Jan 17, 2024): Suggest changing the naming to Token, as it is not just the logprob field; see https://github.com/huggingface/text-generation-inference/blob/main/docs/openapi.json#L844.

Review comment (Member): Add a description for this.
      type: array
      items:
        $ref: '#/components/schemas/Token'
    StreamDetails:
      type: object
      additionalProperties: {}
      properties:
        finish_reason:
          type: string
        token:
          $ref: '#/components/schemas/Token'
    Token:
      type: object
      required:
        - id
        - text
        - logprob
        - special
      properties:
        id:
          type: integer
          format: int32
          minimum: 0
        logprob:
          type: number
          format: float
        special:
          type: boolean
        text:
          type: string
Review comment (Member): Let's make sure we have descriptions for these fields.

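A client can sanity-check streamed token objects against the required fields of the Token schema above. This is a light illustrative sketch (the helper name and sample values are hypothetical), not a full JSON Schema validation:

```python
# Required fields of the Token schema defined above.
TOKEN_REQUIRED = {"id", "text", "logprob", "special"}

def is_valid_token(obj):
    """Check that a dict carries every required Token field with a
    plausible type: id is a non-negative int32, text is a string,
    logprob is numeric, special is a boolean."""
    if not TOKEN_REQUIRED <= obj.keys():
        return False
    return (isinstance(obj["id"], int) and obj["id"] >= 0
            and isinstance(obj["text"], str)
            and isinstance(obj["logprob"], (int, float))
            and isinstance(obj["special"], bool))

token = {"id": 42, "text": "Hello", "logprob": -0.25, "special": False}
print(is_valid_token(token))                         # True
print(is_valid_token({"id": 42, "text": "Hello"}))   # False: missing fields
```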
paths:
  /v2/models/${MODEL_NAME}/versions/${MODEL_VERSION}/generate:
    post:
      parameters:
        - name: MODEL_NAME
          required: true
          in: path
          schema:
            type: string
        - name: MODEL_VERSION
          required: true
          in: path
          schema:
            type: string
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/GenerateRequest'
      responses:
        '200':
(yuzisun marked this conversation as resolved.)
          description: generated text
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/GenerateResponse'
        '422':
          description: Input validation error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/GenerateErrorResponse'
              example:
                error: Input validation error
        '424':
          description: Generation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/GenerateErrorResponse'
              example:
                error: Request failed during generation
        '429':
          description: Model is overloaded
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/GenerateErrorResponse'
              example:
                error: Model is overloaded
        '500':
          description: Incomplete generation
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/GenerateErrorResponse'
              example:
                error: Incomplete generation
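Putting the path template and request body together, here is a sketch of how a client might construct a call to the generate endpoint using only the standard library. The base URL is an assumption (the spec only defines the path template), and the model name and version are hypothetical; the request is built but not sent.

```python
import json
import urllib.request

# Hypothetical server address; the spec defines only the path template.
BASE_URL = "http://localhost:8000"

def build_generate_request(model_name, model_version, text_input, **parameters):
    """Build (but do not send) a POST request for
    /v2/models/{MODEL_NAME}/versions/{MODEL_VERSION}/generate."""
    url = f"{BASE_URL}/v2/models/{model_name}/versions/{model_version}/generate"
    payload = {"text_input": text_input}  # the only required field
    if parameters:
        payload["parameters"] = parameters
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_generate_request("flan-t5", "1", "Hello", max_tokens=32)
print(req.full_url)
# http://localhost:8000/v2/models/flan-t5/versions/1/generate
```

Sending it with `urllib.request.urlopen(req)` would return a GenerateResponse body on 200, or a GenerateErrorResponse on the 4xx/5xx statuses listed above.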

  /v2/models/${MODEL_NAME}/versions/${MODEL_VERSION}/generate_stream:
    post:
      parameters:
        - name: MODEL_NAME
          required: true
          in: path
          schema:
            type: string
        - name: MODEL_VERSION
          required: true
          in: path
          schema:
            type: string
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/GenerateRequest'
      responses:
        '200':
          description: generated text stream
          content:
            text/event-stream:
              schema:
                $ref: '#/components/schemas/GenerateStreamResponse'
        '422':
          description: Input validation error
          content:
            text/event-stream:
              schema:
                $ref: '#/components/schemas/GenerateErrorResponse'
              example:
                error: Input validation error
        '424':
          description: Generation Error
          content:
            text/event-stream:
              schema:
                $ref: '#/components/schemas/GenerateErrorResponse'
              example:
                error: Request failed during generation
        '429':
          description: Model is overloaded
          content:
            text/event-stream:
              schema:
                $ref: '#/components/schemas/GenerateErrorResponse'
              example:
                error: Model is overloaded
        '500':
          description: Incomplete generation
          content:
            text/event-stream:
              schema:
                $ref: '#/components/schemas/GenerateErrorResponse'
              example:
                error: Incomplete generation
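The generate_stream endpoint responds with the text/event-stream content type, so each GenerateStreamResponse arrives as a Server-Sent Events `data:` line. A minimal parsing sketch, assuming each event carries one JSON-encoded response (the sample wire text below is hypothetical):

```python
import json

def parse_sse_events(raw_stream):
    """Parse Server-Sent Events text (the text/event-stream content type
    used by generate_stream) into GenerateStreamResponse dicts.
    Only 'data:' lines are handled; other SSE fields are ignored."""
    events = []
    for line in raw_stream.splitlines():
        line = line.strip()
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    return events

# Hypothetical wire format for two streamed chunks:
raw = (
    'data: {"text_output": "Here is", "model_name": "flan-t5"}\n\n'
    'data: {"text_output": "Here is the output", "model_name": "flan-t5"}\n\n'
)
print([e["text_output"] for e in parse_sse_events(raw)])
# ['Here is', 'Here is the output']
```

A real client would read the response incrementally rather than buffering the whole stream, but the per-event parsing is the same.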