## Idea
The idea is to have an API wrapper that:
- Receives the query (and the model version?)
- Calls the tokenizer to tokenize the query
- Calls the serving API for inference/prediction
- Returns the serving API's response (see the sketch below)
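
A minimal sketch of that flow, assuming FastAPI and httpx; the `/score` route, `TOKENIZER_URL`, `SERVING_URL`, and the request/response payload shapes are all assumptions for illustration, not a fixed contract:

```python
# Sketch of the wrapper's happy path: tokenize, then score.
# TOKENIZER_URL, SERVING_URL, and the payload shapes are assumptions.
import httpx
from fastapi import FastAPI
from pydantic import BaseModel

TOKENIZER_URL = "http://tokenizer:8000/tokenize"                  # assumption
SERVING_URL = "http://serving:8501/v1/models/injection:predict"   # assumption

app = FastAPI()

class Query(BaseModel):
    text: str
    model_version: str | None = None  # the open question from the list above

@app.post("/score")
async def score(query: Query) -> dict:
    async with httpx.AsyncClient() as client:
        # Tokenize the query via the Tokenizer API.
        tok = await client.post(TOKENIZER_URL, json={"text": query.text})
        tok.raise_for_status()
        tokens = tok.json()["tokens"]

        # Ask the Serving API for an injection score.
        pred = await client.post(SERVING_URL, json={"instances": [tokens]})
        pred.raise_for_status()
        injection_score = pred.json()["predictions"][0]

    # Return the Serving API's response to the caller.
    return {"score": injection_score, "model_version": query.model_version}
```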
## Features
- User input validation
- Rate limiting (see the sketch after this list)
- Authentication (API keys?)
- Caching
- Provisioning and deployment
- O11y (observability):
  - Metrics
  - Logging
  - Tracing?
- Usage metering
- Analytics
- Extensibility to other APIs, libraries (e.g. libinjection), and model versions
- Model warm-up (to load models after deployment)
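
To make the first few features concrete, here is a hand-rolled sketch of API-key authentication, per-key rate limiting, and a query/score cache as FastAPI dependencies. Everything here is an assumption for illustration (in-memory stores, a fixed `demo-key`, a 10-requests-per-minute window); a real deployment would more likely lean on Redis or an API gateway:

```python
# In-memory sketches of auth, rate limiting, and caching; assumptions only.
import time

from fastapi import Depends, Header, HTTPException

API_KEYS = {"demo-key"}                    # assumption: keys come from a secret store
RATE_LIMIT = 10                            # assumption: requests per minute, per key
_request_log: dict[str, list[float]] = {}  # api key -> recent request timestamps
_score_cache: dict[str, float] = {}        # query text -> cached score

def authenticate(x_api_key: str = Header(...)) -> str:
    # FastAPI maps the x_api_key parameter to the X-API-Key request header.
    if x_api_key not in API_KEYS:
        raise HTTPException(status_code=401, detail="invalid API key")
    return x_api_key

def check_rate_limit(api_key: str = Depends(authenticate)) -> str:
    # Sliding one-minute window over this key's recent requests.
    now = time.monotonic()
    recent = [t for t in _request_log.get(api_key, []) if now - t < 60]
    if len(recent) >= RATE_LIMIT:
        raise HTTPException(status_code=429, detail="rate limit exceeded")
    _request_log[api_key] = recent + [now]
    return api_key

def get_cached_score(text: str) -> float | None:
    # Exact-match cache keyed on the raw query text (assumption).
    return _score_cache.get(text)

def cache_score(text: str, score: float) -> None:
    _score_cache[text] = score
```

Wired in, the route from the earlier sketch would declare `api_key: str = Depends(check_rate_limit)`, consult `get_cached_score` before calling the tokenizer, and call `cache_score` after scoring, matching the cache step in the workflow below.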
## Workflow
```mermaid
sequenceDiagram
    participant Client
    participant Wrapper as API Wrapper
    participant Tokenizer as Tokenizer API
    participant Serving as Serving API
    Client->>Wrapper: send query
    Wrapper->>Wrapper: authenticate user
    Wrapper->>Wrapper: validate user input
    Wrapper->>Wrapper: check rate limit
    Wrapper->>Tokenizer: tokenize the query
    Tokenizer-->>Wrapper: return tokens
    Wrapper->>Serving: predict whether tokens are an injection or not
    Serving-->>Wrapper: return score
    Wrapper->>Wrapper: cache query and score
    Wrapper-->>Client: return score
```
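
For completeness, a hypothetical client-side call that exercises the whole sequence above; the address, key, and payload follow the assumptions from the earlier sketches:

```python
import httpx

resp = httpx.post(
    "http://localhost:8080/score",      # assumption: where the wrapper listens
    headers={"X-API-Key": "demo-key"},  # key from the auth sketch
    json={"text": "' OR 1=1; --"},      # a classic SQL injection probe
)
resp.raise_for_status()
print(resp.json())  # e.g. {"score": 0.97, "model_version": None}
```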