
API Wrapper #8

Closed
@mostafa

Description

Idea

The idea is to have an API wrapper that:

  1. Receives the query (and the model version?)
  2. Calls the tokenizer to tokenize the query
  3. Calls the serving API for inference/prediction
  4. Returns the response of the serving API to the client
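The four steps above could be sketched as a thin handler. The Tokenizer and Serving APIs are shown as injected callables because the issue does not pin down their URLs or payload shapes; all names here are hypothetical:

```python
# Sketch only: in a real deployment, `tokenize` and `predict` would wrap HTTP
# calls to the Tokenizer API and Serving API respectively.
from typing import Callable, List

def handle_query(
    query: str,
    tokenize: Callable[[str], List[int]],   # e.g. POST /tokenize on the Tokenizer API
    predict: Callable[[List[int]], float],  # e.g. POST /predict on the Serving API
    model_version: str = "latest",          # hypothetical "model version" parameter
) -> dict:
    """Tokenize the query, ask the serving API for a score, return the result."""
    tokens = tokenize(query)
    score = predict(tokens)
    return {"model_version": model_version, "score": score}
```

Injecting the two downstream calls keeps the wrapper testable and makes the "extensibility to other model versions" feature a matter of swapping callables.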

Features

  • User input validation
  • Rate limiting
  • Authentication (API keys?)
  • Caching
  • Provisioning and deployment
  • Observability (o11y):
    • Metrics
    • Logging
    • Tracing?
  • Usage metering
  • Analytics
  • Extensibility to include other APIs, libraries (e.g. libinjection), and model versions
  • Model warm-up (to load models after deployment)
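Of the features above, rate limiting has the most self-contained logic. A token bucket is one common choice; the issue does not prescribe an algorithm, so this is only an illustrative sketch:

```python
import time

class TokenBucket:
    """Per-client token-bucket rate limiter (illustrative; algorithm is an assumption)."""

    def __init__(self, capacity: int, refill_per_sec: float, now=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.now = now                 # injectable clock for testing
        self.tokens = float(capacity)  # bucket starts full
        self.last = now()

    def allow(self) -> bool:
        """Refill based on elapsed time, then try to spend one token."""
        t = self.now()
        self.tokens = min(self.capacity,
                          self.tokens + (t - self.last) * self.refill_per_sec)
        self.last = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The wrapper would keep one bucket per API key, which ties rate limiting to the authentication feature above.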

Workflow

```mermaid
sequenceDiagram
    participant Client
    participant API Wrapper
    participant Tokenizer API
    participant Serving API
    Client ->> API Wrapper: send query
    API Wrapper ->> API Wrapper: authenticate user
    API Wrapper ->> API Wrapper: validate user input
    API Wrapper ->> API Wrapper: check rate limit
    API Wrapper ->> Tokenizer API: tokenize the query
    Tokenizer API ->> API Wrapper: return tokens
    API Wrapper ->> Serving API: predict whether tokens are an injection or not
    Serving API ->> API Wrapper: return score
    API Wrapper ->> API Wrapper: cache query and score
    API Wrapper ->> Client: return score
```
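The sequence above could map onto a single handler roughly as follows. The authentication check, validation rule, rate-limit hook, and cache shape are all illustrative assumptions, not decisions made in this issue:

```python
from typing import Callable, Dict, List

def serve(
    api_key: str,
    query: str,
    *,
    valid_keys: set,                        # stand-in for real API-key auth
    allow_request: Callable[[str], bool],   # rate limiter, keyed by API key
    tokenize: Callable[[str], List[int]],   # Tokenizer API call
    predict: Callable[[List[int]], float],  # Serving API call
    cache: Dict[str, float],                # query -> score cache
) -> dict:
    # authenticate user
    if api_key not in valid_keys:
        return {"status": 401, "error": "invalid API key"}
    # validate user input (length bound is an arbitrary example)
    if not query or len(query) > 4096:
        return {"status": 400, "error": "empty or oversized query"}
    # check rate limit
    if not allow_request(api_key):
        return {"status": 429, "error": "rate limit exceeded"}
    # serve from cache when possible
    if query in cache:
        return {"status": 200, "score": cache[query], "cached": True}
    # tokenize, then score via the serving API
    tokens = tokenize(query)
    score = predict(tokens)
    # cache query and score, then return the score
    cache[query] = score
    return {"status": 200, "score": score, "cached": False}
```

Each early return corresponds to one of the self-loops in the diagram, so the diagram and the code can be kept in sync as the design evolves.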