
Describe nekko UX #9

@deitch

This ties into some of the other issues. We need to capture in one place what the CLI UX should look like, and what commands and flags are needed.

Currently, it is a single command, roughly like this:

nekko.sh -r <runtime> -m <model> [-d <dataset>] [-i <image>] [-c <command>]

This lets you pick:

  • what kind of runtime, which sets the default command and image
  • model location (OCI registry or HF)
  • dataset location (OCI registry or HF)
  • runtime image to use
  • command to execute

When you run it, it:

  1. Downloads the model and dataset, if not already cached
  2. Runs the runtime image with the command

This does not give you a lot of flexibility.
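The cache-then-run flow in step 1 could be sketched roughly as below. This is a hypothetical illustration only: the cache location (`~/.nekko/cache`), the `ensure_cached` helper, and its layout are all assumptions, not nekko's actual implementation.

```shell
# Assumed cache root; overridable via environment.
NEKKO_CACHE="${NEKKO_CACHE:-$HOME/.nekko/cache}"

# ensure_cached <kind> <ref> — download <ref> (model or dataset) into
# the cache unless it is already there; prints the local path.
ensure_cached() {
  kind="$1"
  ref="$2"
  dest="$NEKKO_CACHE/$kind/$(echo "$ref" | tr / _)"
  if [ ! -d "$dest" ]; then
    mkdir -p "$dest"
    # placeholder for the real OCI-registry / HF download step
    echo "pulling $kind $ref" >&2
  fi
  echo "$dest"
}
```

A second call with the same reference would find the directory already present and skip the download, which is the caching behaviour described above.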

Proposed new setup

First, there need to be multiple commands. The following are recommended:

  • nekko run - equivalent of today: run a given model with a given image and command (default set by runtime, overridable) after ensuring it is present. But this should be an inference run, i.e. minimal interactivity
  • nekko pull - pull a model, dataset or runtime image; may need to be split into multiple subcommands, since it is not always clear which is a runtime image and which is model/dataset
  • nekko push - for future; push a model or dataset (and maybe runtime image)
  • nekko develop - similar to today, but with commands and setup to enable interactive running
  • nekko list - list downloaded models and datasets

Note the different experience between nekko develop and nekko run. With develop, people expect to get an interactive shell; with nekko run, people expect inference to run, either interactively like with an LLM, or one-shot and exit, like with a vision model and dataset.

We need to support multiple runtimes; for now, at least: onnx-eis, onnx-runtime, llama.cpp. There are two ways to select one: a flag or a subcommand.

  • Flag: nekko run -r onnx-eis vs nekko run -r llama.cpp
  • Subcommand: nekko llama.cpp run vs nekko onnx-eis run

There are pros and cons to both. The one advantage of a CLI flag is that we might be able to determine the runtime dynamically, by looking at the model once it is downloaded. Then again, that may be a bit "magic". When you run the HF CLI or libraries, do they automatically determine what the model type is and launch a runtime for it? Is that an expected behaviour?
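One way the dynamic-detection idea could work is a simple file-extension heuristic. The mapping below is an assumption for illustration (ONNX files for onnx-runtime, GGUF/GGML files for llama.cpp), not nekko's actual behaviour, and a real implementation would likely need to inspect file contents or repo metadata as well.

```shell
# detect_runtime <model-file> — guess a runtime from the file extension.
detect_runtime() {
  case "$1" in
    *.onnx)        echo "onnx-runtime" ;;
    *.gguf|*.ggml) echo "llama.cpp" ;;
    *)             echo "unknown"; return 1 ;;
  esac
}
```

Even with such a heuristic, an explicit `-r` flag would still be needed as an override, e.g. to pick onnx-eis over onnx-runtime for the same `.onnx` model.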
