[RFC] Batch inference using ez_deploy_config #19

Open
stikkireddy opened this issue Sep 5, 2024 · 2 comments
stikkireddy commented Sep 5, 2024

  • the input should be a delta table with a specific schema (for idempotency and for recomputing inference if you change the model, etc.); see the schema sketch after this list
  • the output will be a column called "predictions" (user-definable) plus additional metadata on the engine used
  • this should support incremental checkpointing of predictions, so that an issue with the model or deployment does not lose a large delta transaction
  • this should support local GPUs
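
A minimal sketch of the kind of schema this implies, assuming pyspark; the column names here are hypothetical placeholders, not a decided contract:

```python
from pyspark.sql.types import StructField, StructType, StringType

# Input table: one row per request, keyed for idempotency / recomputation.
input_schema = StructType([
    StructField("request_id", StringType(), nullable=False),  # idempotency key
    StructField("prompt", StringType(), nullable=False),      # text to score
])

# Output table: input columns plus the (user-definable) predictions column
# and metadata about which engine produced each prediction.
output_schema = StructType(input_schema.fields + [
    StructField("predictions", StringType(), nullable=True),
    StructField("engine", StringType(), nullable=True),  # e.g. "endpoint", "ray", "httpx"
])
```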

initially we can stick to:

  • python
  • focus on endpoints (the bottleneck will be IO, since LLM calls will take longer than any compute we do where the code is being executed)
  • making the workload incremental and transactional

ideal interface:

```python
def perform_batch(table, ez_deploy_config, batch_config) -> bool:
    ...
```
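
For concreteness, a hypothetical call under this interface; the table name is a placeholder, and both config objects are assumed to be constructed elsewhere:

```python
# Hypothetical usage; "main.default.scoring_input" is not a real table name.
success = perform_batch(
    table="main.default.scoring_input",
    ez_deploy_config=ez_deploy_config,  # existing deployment config
    batch_config=batch_config,          # the knobs described below
)
```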

The batch_config should expose very few knobs: execution target (endpoint or local), engine (ray vs spark vs httpx), parallelism, and checkpointing strategy. For spark we can use streaming, but partitions need to be tuned so that you transact to the delta table maybe every 1000 rows. GPUs MUST be saturated with requests (this may be easier to achieve with ray actors). A sketch of these knobs follows.
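A minimal sketch of those knobs as a dataclass; every field name here is a hypothetical suggestion, not an agreed API:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class BatchConfig:
    target: Literal["endpoint", "local"] = "endpoint"   # where the model runs
    engine: Literal["ray", "spark", "httpx"] = "httpx"  # how requests are issued
    parallelism: int = 8                                # concurrent in-flight requests
    checkpoint_every_n_rows: int = 1000                 # transact to delta roughly this often
    predictions_column: str = "predictions"             # user-definable output column
```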

The output must always be a delta table.
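
To make the incremental, transactional spark path concrete, a rough sketch assuming a Databricks-style ambient `spark` session and httpx; the endpoint URL, table names, and column names are all hypothetical:

```python
import httpx

# Hypothetical serving endpoint; auth headers omitted for brevity.
ENDPOINT_URL = "https://example.cloud.databricks.com/serving-endpoints/my-llm/invocations"

def score_micro_batch(df, batch_id):
    # Score one micro-batch and append it to the output delta table in a
    # single transaction, so a failed call loses at most one micro-batch.
    rows = df.collect()
    with httpx.Client(timeout=60.0) as client:
        preds = [client.post(ENDPOINT_URL, json={"prompt": r.prompt}).json() for r in rows]
    scored = spark.createDataFrame(
        [(r.request_id, r.prompt, str(p), "endpoint") for r, p in zip(rows, preds)],
        "request_id string, prompt string, predictions string, engine string",
    )
    scored.write.format("delta").mode("append").saveAsTable("main.default.scoring_output")

(spark.readStream.table("main.default.scoring_input")
    .writeStream
    .foreachBatch(score_micro_batch)
    .option("checkpointLocation", "/tmp/checkpoints/scoring")  # resume point after failures
    .trigger(availableNow=True)  # drain the input once, in tuned micro-batches
    .start())
```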

If you need ray, refer to this. It is very outdated but may be helpful: https://github.com/stikkireddy/llm-batch-inference/blob/main/01_batch_scoring_single_node.py
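
A rough sketch of the ray-actor pattern mentioned above, assuming ray is installed on the cluster; the Scorer actor and its generate logic are hypothetical stand-ins for a real engine:

```python
import ray

@ray.remote(num_gpus=1)
class Scorer:
    def __init__(self):
        # Hypothetical: load model weights onto the GPU assigned to this actor.
        self.model = None

    def score(self, prompts):
        # Hypothetical generate call; swap in the real engine (vllm, sglang, ...).
        return [f"prediction for {p}" for p in prompts]

ray.init(ignore_reinit_error=True)
num_gpus = max(1, int(ray.cluster_resources().get("GPU", 0)))
actors = [Scorer.remote() for _ in range(num_gpus)]

prompts = [f"prompt {i}" for i in range(4000)]
chunks = [prompts[i:i + 1000] for i in range(0, len(prompts), 1000)]  # ~1 delta commit per chunk

# Round-robin chunks across actors and keep many futures in flight so the
# GPUs stay saturated rather than waiting on one request at a time.
futures = [actors[i % len(actors)].score.remote(c) for i, c in enumerate(chunks)]
results = ray.get(futures)
```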

stikkireddy commented

using ray + GPU VMs works phenomenally

stikkireddy commented

Batch inference using sglang requires this: sgl-project/sglang#1127

stikkireddy self-assigned this Sep 9, 2024