add a basic bfcl command-line interface #621

Merged on Oct 17, 2024 (22 commits)

Conversation

Contributor

@mattf mattf commented Sep 4, 2024

add a simple cli wrapping openfunctions_evaluation.py (bfcl run) and eval_runner.py (bfcl evaluate).

➜ bfcl
                                                                                                                             
 Usage: bfcl [OPTIONS] COMMAND [ARGS]...                                                                                     
                                                                                                                             
╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --install-completion            Install completion for the current shell.                                                 │
│ --show-completion               Show completion for the current shell, to copy it or customize the installation.          │
│ --help                -h        Show this message and exit.                                                               │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ models            List available models.                                                                                  │
│ test-categories   List available test categories.                                                                         │
│ run               Run one or more models on a test-category (same as openfunctions_evaluation).                           │
│ results           List the results available for evaluation.                                                              │
│ evaluate          Evaluate results from run of one or more models on a test-category (same as eval_runner).               │
│ scores            Display the leaderboard.                                                                                │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

➜ bfcl run -h
                                                                                                                    
 Usage: bfcl run [OPTIONS]                                                                                          
                                                                                                                    
 Run one or more models on a test-category (same as openfunctions_evaluation).                                      
                                                                                                                    
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --model                           TEXT     A list of model names to evaluate.                                    │
│                                            [default: gorilla-openfunctions-v2]                                   │
│ --test-category                   TEXT     A list of test categories to run the evaluation on. [default: all]    │
│ --api-sanity-check        -c               Perform the REST API status sanity check before running the           │
│                                            evaluation.                                                           │
│ --temperature                     FLOAT    The temperature parameter for the model. [default: 0.001]             │
│ --top-p                           FLOAT    The top-p parameter for the model. [default: 1.0]                     │
│ --max-tokens                      INTEGER  The maximum number of tokens for the model. [default: 1200]           │
│ --num-gpus                        INTEGER  The number of GPUs to use. [default: 1]                               │
│ --timeout                         INTEGER  The timeout for the model in seconds. [default: 60]                   │
│ --num-threads                     INTEGER  The number of threads to use. [default: 1]                            │
│ --gpu-memory-utilization          FLOAT    The GPU memory utilization. [default: 0.9]                            │
│ --help                    -h               Show this message and exit.                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯


➜ bfcl evaluate -h
                                                                                                                    
 Usage: bfcl evaluate [OPTIONS]                                                                                     
                                                                                                                    
 Evaluate results from run of one or more models on a test-category (same as eval_runner).                          
                                                                                                                    
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *  --model                     TEXT  A list of model names to evaluate. [default: None] [required]               │
│ *  --test-category             TEXT  A list of test categories to run the evaluation on. [default: None]         │
│                                      [required]                                                                  │
│    --api-sanity-check  -c            Perform the REST API status sanity check before running the evaluation.     │
│    --help              -h            Show this message and exit.                                                 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
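
The help output above describes a command group whose `run` and `evaluate` subcommands take list-valued `--model` and `--test-category` options. A minimal sketch of that shape follows, written with the standard library's `argparse` so it stays self-contained. It is not the PR's implementation: the `--install-completion` options hint at a different CLI framework, and passing a list by repeating the flag is an assumption. `build_parser` and `parse` are hypothetical names, and the dispatch to `openfunctions_evaluation.py` and `eval_runner.py` is left out.

```python
import argparse
from typing import Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with `run` and `evaluate` subcommands (illustrative only)."""
    parser = argparse.ArgumentParser(prog="bfcl")
    sub = parser.add_subparsers(dest="command", required=True)

    # `bfcl run`: option names and numeric defaults copied from the help text.
    run = sub.add_parser("run", help="Run one or more models on a test-category.")
    run.add_argument("--model", action="append", default=None)
    run.add_argument("--test-category", action="append", default=None)
    run.add_argument("-c", "--api-sanity-check", action="store_true")
    run.add_argument("--temperature", type=float, default=0.001)
    run.add_argument("--top-p", type=float, default=1.0)
    run.add_argument("--max-tokens", type=int, default=1200)
    run.add_argument("--num-gpus", type=int, default=1)
    run.add_argument("--timeout", type=int, default=60)
    run.add_argument("--num-threads", type=int, default=1)
    run.add_argument("--gpu-memory-utilization", type=float, default=0.9)

    # `bfcl evaluate`: both list options are required, as in the help text.
    ev = sub.add_parser("evaluate", help="Evaluate results from a run.")
    ev.add_argument("--model", action="append", required=True)
    ev.add_argument("--test-category", action="append", required=True)
    ev.add_argument("-c", "--api-sanity-check", action="store_true")
    return parser


def parse(argv: Sequence[str]) -> argparse.Namespace:
    """Parse argv and fill in the list defaults for `run`."""
    args = build_parser().parse_args(argv)
    if args.command == "run":
        # Defaults are applied after parsing: with action="append", a list
        # default would be extended by user values instead of replaced.
        args.model = args.model or ["gorilla-openfunctions-v2"]
        args.test_category = args.test_category or ["all"]
    return args
```

Calling `parse(["run", "--model", "a", "--model", "b"])` in this sketch yields `model == ["a", "b"]` and leaves `test_category` at `["all"]`.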

@HuanzhiMao
Collaborator

Hi @mattf,

Thank you so much for your PR and welcome! I really appreciate your contribution – this feature has been on our TODO list for a while, and it’s great to see it implemented.

I noticed a few TODOs left in the code. I'll take care of finishing those up and handle any merge conflicts. After that, we’ll be ready to move forward!

@mattf
Contributor Author

mattf commented Sep 23, 2024

@HuanzhiMao i'm glad you like it. i've a few more commands i'll push up.

@HuanzhiMao
Collaborator

> @HuanzhiMao i'm glad you like it. i've a few more commands i'll push up.

Perfect.

@mattf
Contributor Author

mattf commented Sep 23, 2024

my plan was to put a simple cli around the runner / evaluator / model definition code and then propose refactoring changes to make the cli simpler.

i've found the cli helpful for my runs, which means it's only had one user.

@HuanzhiMao
Collaborator

I agree. CLI entry points will be easier than having to cd into different directories and run each script via python xxx.

HuanzhiMao added a commit that referenced this pull request Oct 9, 2024
…_credential_config.py (#675)

This PR addresses the issue of hard-coded relative file paths in BFCL,
which previously made it impossible to run the script from different
entry locations/directories. With this update, the script can now be
executed from any directory, unblocking #621.

Additionally, this PR automates the
`apply_function_credential_config.py` step, removing the need for users
to manually trigger the script to apply the credential files.


Part of the effort to merge #510.
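
The usual way to remove a dependency on the working directory, as the commit message above describes, is to anchor every path at the location of the source file rather than at the current directory. The sketch below illustrates that general technique only; it is not the code from #675. The helper names, the `result` directory, and the file-name pattern are all hypothetical.

```python
from pathlib import Path


def package_root(module_file: str) -> Path:
    """Return the absolute directory that contains module_file."""
    return Path(module_file).resolve().parent


def result_path(module_file: str, model_name: str, test_category: str) -> Path:
    """Build a result path anchored at the module's directory, not the CWD.

    The "result" directory and "<category>_result.json" pattern are
    hypothetical placeholders.
    """
    root = package_root(module_file)
    return root / "result" / model_name / f"{test_category}_result.json"
```

Inside a module, the caller would pass `__file__` as `module_file`, so the resulting path is the same no matter which directory the command is launched from.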

---------

Co-authored-by: Devansh Amin <devanshamin97@gmail.com>
@HuanzhiMao HuanzhiMao added the BFCL-General General BFCL Issue label Oct 9, 2024
@HuanzhiMao
Collaborator

@mattf, I have resolved all the TODOs in the code and polished it a bit. Anything else you would like to add before we merge this PR?

P.S. I renamed bfcl run to bfcl generate, which is a more intuitive name.

Collaborator

@CharlieJCJ CharlieJCJ left a comment


Tested, and suggested fixes in bfcl generate for a consistent .env path. Everything works in my local testing.

The CLI looks super clean! Love it @mattf, thanks for the pr, and thanks @HuanzhiMao for code changes.

Collaborator

@CharlieJCJ CharlieJCJ left a comment


Re-tested on the same commands. LGTM

@ShishirPatil ShishirPatil merged commit 0a33e97 into ShishirPatil:main Oct 17, 2024
VishnuSuresh27 pushed a commit to VishnuSuresh27/gorilla that referenced this pull request Nov 11, 2024
VishnuSuresh27 pushed a commit to VishnuSuresh27/gorilla that referenced this pull request Nov 11, 2024
add a simple cli wrapping openfunctions_evaluation.py (`bfcl run`) and
eval_runner.py (`bfcl evaluate`).

---------

Co-authored-by: Huanzhi (Hans) Mao <huanzhimao@gmail.com>