Description
This ties into some of the other issues. We need to capture in one place what the CLI UX should look like and what commands and flags are needed.
Currently, it is a single command, roughly like this:
```
nekko.sh -r <runtime> -m <model> [-d <dataset>] [-i <image>] [-c <command>]
```
This lets you pick:
- what kind of runtime, which sets the default command and image
- model location (OCI registry or HF)
- dataset location (OCI registry or HF)
- runtime image to use
- command to execute
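For example, a typical invocation today might look roughly like this (the model/dataset reference syntax, registry, and image name are made up for illustration; `llama-server` is llama.cpp's server binary):

```sh
# Hypothetical invocation; reference schemes and names are illustrative only.
nekko.sh -r llama.cpp \
  -m hf://TheOrg/some-model \
  -d oci://registry.example.com/datasets/sample \
  -i ghcr.io/example/llama.cpp-runtime:latest \
  -c "llama-server -m /models/some-model.gguf"
```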
When you run it, it:
- Downloads the model and dataset, if not already cached
- Runs the runtime image with the command
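In shell terms, today's flow is roughly the following sketch (the cache location, the `fetch_model` helper, and the use of `docker` are all assumptions for illustration, not the actual implementation):

```sh
# Sketch of the current pull-if-missing-then-run flow.
CACHE="${NEKKO_CACHE:-$HOME/.cache/nekko}"

if [ ! -d "$CACHE/models/$MODEL" ]; then
    # download from an OCI registry or HF into the local cache
    fetch_model "$MODEL" "$CACHE/models/$MODEL"   # hypothetical helper
fi

# run the runtime image with the (default or overridden) command
docker run --rm -v "$CACHE:/cache" "$IMAGE" $COMMAND
```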
This does not give you a lot of flexibility.
Proposed new setup
First, there need to be multiple commands. The following are recommended:
- `nekko run` - equivalent of today: run a given model with a given image and command (defaults set by the runtime, overridable) after ensuring it is present. But this should be an inference run, i.e. minimal interactivity.
- `nekko pull` - pull a model, dataset or runtime image; may need to be split into multiple subcommands, since it is not always clear which is a runtime image and which is a model/dataset.
- `nekko push` - for the future; push a model or dataset (and maybe a runtime image).
- `nekko develop` - similar to today, but with commands and setup to enable interactive running.
- `nekko list` - list downloaded models and datasets.
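Hypothetical invocations of the proposed commands might look like this (flag names are carried over from today's CLI; the `model`/`runtime` split of `pull` is one possible way to resolve the ambiguity noted above):

```sh
nekko pull model hf://TheOrg/some-model       # cache a model
nekko pull runtime llama.cpp                  # cache a runtime image
nekko list                                    # show cached models and datasets
nekko run -r llama.cpp -m hf://TheOrg/some-model      # inference, minimal interactivity
nekko develop -r llama.cpp -m hf://TheOrg/some-model  # drop into an interactive shell
```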
Note the different experience between `nekko develop` and `nekko run`. With `develop`, people expect to get an interactive shell; with `run`, people expect inference to run, either interactively, as with an LLM, or one-shot and exit, as with a vision model and a dataset.
We need different runtimes; for now, at least: onnx-eis, onnx-runtime, and llama.cpp. There are two ways to select one: a flag or a subcommand.
- Flag: `nekko run -r onnx-eis` vs `nekko run -r llama.cpp`
- Subcommand: `nekko llama.cpp run` vs `nekko onnx-eis run`
There are pros and cons to both. The one advantage of a CLI flag is that we might be able to determine the runtime dynamically, by looking at the model once downloaded. Then again, that may be a bit "magic". When you run the HF CLI or libraries, do they automatically determine what the model type is and launch a runtime for it? Is that an expected behaviour?
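If we did go the dynamic-detection route, a first cut could be as simple as mapping model file types to runtimes. A sketch (the extension-to-runtime mapping is an assumption; GGUF is llama.cpp's native format, and `.onnx` files are plain ONNX graphs):

```sh
# Sketch: guess the runtime from the files in a downloaded model directory.
detect_runtime() {
    model_dir="$1"
    if ls "$model_dir"/*.gguf >/dev/null 2>&1; then
        echo "llama.cpp"      # GGUF -> llama.cpp
    elif ls "$model_dir"/*.onnx >/dev/null 2>&1; then
        echo "onnx-runtime"   # plain ONNX; could also map to onnx-eis
    else
        echo "unknown"        # fall back to requiring an explicit -r flag
    fi
}
```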