add a basic bfcl command-line interface (ShishirPatil#621)
add a simple cli wrapping `openfunctions_evaluation.py` (`bfcl run`) and `eval_runner.py` (`bfcl evaluate`).

```
➜ bfcl
Usage: bfcl [OPTIONS] COMMAND [ARGS]...

╭─ Options ──────────────────────────────────────────────────────────────────────╮
│ --install-completion          Install completion for the current shell.        │
│ --show-completion             Show completion for the current shell, to copy   │
│                               it or customize the installation.               │
│ --help                -h      Show this message and exit.                      │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ─────────────────────────────────────────────────────────────────────╮
│ models            List available models.                                       │
│ test-categories   List available test categories.                              │
│ run               Run one or more models on a test-category (same as           │
│                   openfunctions_evaluation).                                   │
│ results           List the results available for evaluation.                   │
│ evaluate          Evaluate results from run of one or more models on a         │
│                   test-category (same as eval_runner).                         │
│ scores            Display the leaderboard.                                     │
╰────────────────────────────────────────────────────────────────────────────────╯

➜ bfcl run -h
Usage: bfcl run [OPTIONS]

Run one or more models on a test-category (same as openfunctions_evaluation).

╭─ Options ──────────────────────────────────────────────────────────────────────╮
│ --model                       TEXT     A list of model names to evaluate.      │
│                                        [default: gorilla-openfunctions-v2]     │
│ --test-category               TEXT     A list of test categories to run the    │
│                                        evaluation on. [default: all]           │
│ --api-sanity-check  -c                 Perform the REST API status sanity      │
│                                        check before running the evaluation.    │
│ --temperature                 FLOAT    The temperature parameter for the       │
│                                        model. [default: 0.001]                 │
│ --top-p                       FLOAT    The top-p parameter for the model.      │
│                                        [default: 1.0]                          │
│ --max-tokens                  INTEGER  The maximum number of tokens for the    │
│                                        model. [default: 1200]                  │
│ --num-gpus                    INTEGER  The number of GPUs to use. [default: 1] │
│ --timeout                     INTEGER  The timeout for the model in seconds.   │
│                                        [default: 60]                           │
│ --num-threads                 INTEGER  The number of threads to use.           │
│                                        [default: 1]                            │
│ --gpu-memory-utilization      FLOAT    The GPU memory utilization.             │
│                                        [default: 0.9]                          │
│ --help              -h                 Show this message and exit.             │
╰────────────────────────────────────────────────────────────────────────────────╯

➜ bfcl evaluate -h
Usage: bfcl evaluate [OPTIONS]

Evaluate results from run of one or more models on a test-category (same as eval_runner).

╭─ Options ──────────────────────────────────────────────────────────────────────╮
│ *  --model                TEXT  A list of model names to evaluate.             │
│                                 [default: None] [required]                     │
│ *  --test-category        TEXT  A list of test categories to run the           │
│                                 evaluation on. [default: None] [required]      │
│    --api-sanity-check  -c       Perform the REST API status sanity check       │
│                                 before running the evaluation.                  │
│    --help              -h       Show this message and exit.                    │
╰────────────────────────────────────────────────────────────────────────────────╯
```

---------

Co-authored-by: Huanzhi (Hans) Mao <huanzhimao@gmail.com>