[RFC] Batch inference using ez_deploy_config #19

Open
stikkireddy opened this issue Sep 5, 2024 · 2 comments
stikkireddy commented Sep 5, 2024

  • the input should be a delta table with a specific schema (for idempotency and for recomputing inference if you change the model, etc.); see the schema sketch after this list
  • the output will be a column called "predictions" (user-definable) plus additional metadata on the engine used
  • this should support incremental checkpointing of predictions, so that an issue with the model or deployment does not lose a large delta transaction
  • this should support local GPUs
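
A minimal sketch of the kind of schema this implies, assuming pyspark; the column names here are hypothetical placeholders, not a decided contract:

```python
from pyspark.sql.types import StructField, StructType, StringType

# Input table: one row per request, keyed for idempotency / recomputation.
input_schema = StructType([
    StructField("request_id", StringType(), nullable=False),  # idempotency key
    StructField("prompt", StringType(), nullable=False),      # text to score
])

# Output table: input columns plus the (user-definable) predictions column
# and metadata about which engine produced each prediction.
output_schema = StructType(input_schema.fields + [
    StructField("predictions", StringType(), nullable=True),
    StructField("engine", StringType(), nullable=True),  # e.g. "endpoint", "ray", "httpx"
])
```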

initially we can stick to:

  • python
  • focus on endpoints (the bottleneck will be IO, since LLM calls will take longer than any compute we do where the code is being executed)
  • making the workload incremental and transactional

ideal interface:

```python
def perform_batch(table, ez_deploy_config, batch_config) -> bool:
    ...
```
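
For concreteness, a hypothetical call under this interface; the table name is a placeholder, and both config objects are assumed to be constructed elsewhere:

```python
# Hypothetical usage; "main.default.scoring_input" is not a real table name.
success = perform_batch(
    table="main.default.scoring_input",
    ez_deploy_config=ez_deploy_config,  # existing deployment config
    batch_config=batch_config,          # the knobs described below
)
```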

The batch_config should expose very few knobs: execution target (endpoint or local), engine (ray vs spark vs httpx), parallelism, and checkpointing strategy. For spark we can use streaming, but partitions need to be tuned so that you transact to the delta table maybe every 1000 rows. GPUs MUST be saturated with requests (this may be easier to achieve with ray actors). A sketch of these knobs follows.
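A minimal sketch of those knobs as a dataclass; every field name here is a hypothetical suggestion, not an agreed API:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class BatchConfig:
    target: Literal["endpoint", "local"] = "endpoint"   # where the model runs
    engine: Literal["ray", "spark", "httpx"] = "httpx"  # how requests are issued
    parallelism: int = 8                                # concurrent in-flight requests
    checkpoint_every_n_rows: int = 1000                 # transact to delta roughly this often
    predictions_column: str = "predictions"             # user-definable output column
```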

The output must always be a delta table.
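
To make the incremental, transactional spark path concrete, a rough sketch assuming a Databricks-style ambient `spark` session and httpx; the endpoint URL, table names, and column names are all hypothetical:

```python
import httpx

# Hypothetical serving endpoint; auth headers omitted for brevity.
ENDPOINT_URL = "https://example.cloud.databricks.com/serving-endpoints/my-llm/invocations"

def score_micro_batch(df, batch_id):
    # Score one micro-batch and append it to the output delta table in a
    # single transaction, so a failed call loses at most one micro-batch.
    rows = df.collect()
    with httpx.Client(timeout=60.0) as client:
        preds = [client.post(ENDPOINT_URL, json={"prompt": r.prompt}).json() for r in rows]
    scored = spark.createDataFrame(
        [(r.request_id, r.prompt, str(p), "endpoint") for r, p in zip(rows, preds)],
        "request_id string, prompt string, predictions string, engine string",
    )
    scored.write.format("delta").mode("append").saveAsTable("main.default.scoring_output")

(spark.readStream.table("main.default.scoring_input")
    .writeStream
    .foreachBatch(score_micro_batch)
    .option("checkpointLocation", "/tmp/checkpoints/scoring")  # resume point after failures
    .trigger(availableNow=True)  # drain the input once, in tuned micro-batches
    .start())
```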

If you need ray, refer to this. It is very outdated but may be helpful: https://github.com/stikkireddy/llm-batch-inference/blob/main/01_batch_scoring_single_node.py
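
A rough sketch of the ray-actor pattern mentioned above, assuming ray is installed on the cluster; the Scorer actor and its generate logic are hypothetical stand-ins for a real engine:

```python
import ray

@ray.remote(num_gpus=1)
class Scorer:
    def __init__(self):
        # Hypothetical: load model weights onto the GPU assigned to this actor.
        self.model = None

    def score(self, prompts):
        # Hypothetical generate call; swap in the real engine (vllm, sglang, ...).
        return [f"prediction for {p}" for p in prompts]

ray.init(ignore_reinit_error=True)
num_gpus = max(1, int(ray.cluster_resources().get("GPU", 0)))
actors = [Scorer.remote() for _ in range(num_gpus)]

prompts = [f"prompt {i}" for i in range(4000)]
chunks = [prompts[i:i + 1000] for i in range(0, len(prompts), 1000)]  # ~1 delta commit per chunk

# Round-robin chunks across actors and keep many futures in flight so the
# GPUs stay saturated rather than waiting on one request at a time.
futures = [actors[i % len(actors)].score.remote(c) for i, c in enumerate(chunks)]
results = ray.get(futures)
```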

stikkireddy commented

using ray + GPU VMs works phenomenally

stikkireddy commented

Batch inference using sglang requires this: sgl-project/sglang#1127

stikkireddy self-assigned this Sep 9, 2024