
[BFCL] Fix Dataset and Possible Answer Issue #557

Merged — 18 commits merged into ShishirPatil:main on Aug 5, 2024

Conversation

HuanzhiMao
Collaborator

@HuanzhiMao HuanzhiMao commented Jul 26, 2024

This PR fixes #550, fixes #541, and all the issues pointed out by the comments below.
We want to thank @lucenzhong and @XuHwang for pointing these out.

Here's a breakdown of the changes:

  • simple: 7 entries affected
    • Indices: 13, 14, 15, 16, 200, 285, 375
  • multiple function: 3 entries affected
    • Indices: 29, 33, 99
  • parallel function: 5 entries affected
    • Indices: 26, 71, 72, 73, 89
  • parallel multiple function: 6 entries affected
    • Indices: 4, 19, 80, 83, 132, 195
  • executable parallel function: 1 entry affected
    • Index: 11
  • javascript: 3 entries affected
    • Indices: 18, 29, 35

This will affect the leaderboard score. We will update it soon, in a different PR.

@HuanzhiMao HuanzhiMao marked this pull request as ready for review July 30, 2024 22:24
@XuHwang
Contributor

XuHwang commented Aug 1, 2024

Hi, thanks for your efforts in supporting such a wonderful benchmark.

Recently I found some possible errors in ground truth as follows:

  1. id 278 in simple.json: features is not required and the default value price is correct, yet the ground truth requires it
  2. id 285 in simple.json: location requires the format "City, State", but the ground truth contains "Chicago" and not "Chicago, Illinois"
  3. id 375 in simple.json: maybe the ground truth of items should include [pumpkin, dozen eggs]

@HuanzhiMao
Collaborator Author

HuanzhiMao commented Aug 1, 2024

> Hi, thanks for your efforts in supporting such a wonderful benchmark.
>
> Recently I found some possible errors in ground truth as follows:
>
>   1. id 278 in simple.json: features is not required and the default value price is correct, yet the ground truth requires it
>   2. id 285 in simple.json: location requires the format "City, State", but the ground truth contains "Chicago" and not "Chicago, Illinois"
>   3. id 375 in simple.json: maybe the ground truth of items should include [pumpkin, dozen eggs]

Thanks for pointing this out!

Regarding simple_278, the ground truth is correct.
The question asks "Find me the average price and ratings of piano from Yamaha." The default value for features only contains 'price', not 'rating', so the model output must set the parameter explicitly.

"features": {
    "type": "array",
    "items": {"type": "string", "enum": ["price", "rating"]},
    "description": "The features to retrieve about the instrument. Default is 'price'",
}
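For context, a minimal sketch of how such a possible-answer entry could be matched. The helper name and the convention that an empty string "" in a value list marks the parameter as omittable are illustrative assumptions, not BFCL's actual checker; the point is that simple_278 lists no "" for features, so omitting it fails.

```python
# Sketch of matching a model's arguments against a BFCL-style
# possible-answer entry. An empty string "" in a value list means the
# parameter may be omitted (its default applies). Hypothetical helper.
def matches_ground_truth(model_args: dict, possible_answers: dict) -> bool:
    for param, allowed in possible_answers.items():
        if param not in model_args:
            # Parameter omitted: acceptable only if "" is an allowed value.
            if "" not in allowed:
                return False
        elif model_args[param] not in allowed:
            return False
    return True

# simple_278: the default for "features" is only 'price', so the ground
# truth lists the full ['price', 'rating'] and omission is not allowed.
possible = {"instrument": ["piano"], "manufacturer": ["Yamaha"],
            "features": [["price", "rating"]]}

print(matches_ground_truth(
    {"instrument": "piano", "manufacturer": "Yamaha",
     "features": ["price", "rating"]}, possible))  # True
print(matches_ground_truth(
    {"instrument": "piano", "manufacturer": "Yamaha"}, possible))  # False
```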

@XuHwang
Contributor

XuHwang commented Aug 1, 2024

> Hi, thanks for your efforts in supporting such a wonderful benchmark.
> Recently I found some possible errors in ground truth as follows:
>
>   1. id 278 in simple.json: features is not required and the default value price is correct, yet the ground truth requires it
>   2. id 285 in simple.json: location requires the format "City, State", but the ground truth contains "Chicago" and not "Chicago, Illinois"
>   3. id 375 in simple.json: maybe the ground truth of items should include [pumpkin, dozen eggs]
>
> Thanks for pointing this out!
>
> Regarding simple_278, the ground truth is correct. The question asks "Find me the average price and ratings of piano from Yamaha." The default value for features only contains 'price', not 'rating', so the model output must set the parameter explicitly.
>
> "features": {
>     "type": "array",
>     "items": {"type": "string", "enum": ["price", "rating"]},
>     "description": "The features to retrieve about the instrument. Default is 'price'",
> }

I see. I agree~
Thanks a lot!

@XuHwang
Contributor

XuHwang commented Aug 2, 2024

Hey, I found some possible inconsistencies in the ground truth regarding lambda functions in Python.

In gorilla_openfunctions_v1_test_simple.json:

{"id": "simple_13", "ground_truth": {"calculate_area_under_curve": {"function": ["x^2", "x**2"], "interval": [[1.0, 3.0]], "method": ["", "trapezoidal"]}}}
{"id": "simple_14", "ground_truth": {"calculate_derivative": {"function": ["3x^2 + 2x - 1", "3*x**2+2*x-1"], "x_value": ["", 0.0]}}}
{"id": "simple_15", "ground_truth": {"integrate": {"function": ["x^3", "x**3"], "start_x": [-2], "end_x": [3], "method": ["simpson"]}}}
{"id": "simple_16", "ground_truth": {"calculus.derivative": {"function": ["2*x^2", "2x^2", "2**x^2"], "value": [1], "function_variable": ["x", ""]}}}

While in gorilla_openfunctions_v1_test_parallel_multiple_function.json:

{"id": "parallel_multiple_function_4", "ground_truth": {"integral": {"function": ["x^2", "lambda x : x**2"], "a": [1.0], "b": [5.0]}, "derivative": {"function": ["x^2", "lambda x : x**2"], "x": [3.0]}}}

I think maybe the lambda function should be included in the simple cases as well?
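For illustration, bringing the simple entries in line with parallel_multiple_function_4 just means appending the lambda form to each "function" possible-answer list. A hedged sketch of such a one-off fix (hypothetical script, not part of the repo; it only adds a lambda variant for the valid-Python-syntax forms):

```python
# Sketch of extending the "function" possible answers in a simple entry
# with the lambda form, mirroring parallel_multiple_function_4.
entry = {"id": "simple_15",
         "ground_truth": {"integrate": {"function": ["x^3", "x**3"],
                                        "start_x": [-2], "end_x": [3],
                                        "method": ["simpson"]}}}

for params in entry["ground_truth"].values():
    variants = params["function"]
    # Add "lambda x: <expr>" for each variant that is valid Python
    # syntax (i.e. skip the caret notation like "x^3").
    params["function"] = variants + [f"lambda x: {v}" for v in variants
                                     if "^" not in v]

print(entry["ground_truth"]["integrate"]["function"])
# ['x^3', 'x**3', 'lambda x: x**3']
```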

@HuanzhiMao
Collaborator Author

> Hey, I found some possible inconsistencies in the ground truth regarding lambda functions in Python.
>
> In gorilla_openfunctions_v1_test_simple.json:
>
> {"id": "simple_13", "ground_truth": {"calculate_area_under_curve": {"function": ["x^2", "x**2"], "interval": [[1.0, 3.0]], "method": ["", "trapezoidal"]}}}
> {"id": "simple_14", "ground_truth": {"calculate_derivative": {"function": ["3x^2 + 2x - 1", "3*x**2+2*x-1"], "x_value": ["", 0.0]}}}
> {"id": "simple_15", "ground_truth": {"integrate": {"function": ["x^3", "x**3"], "start_x": [-2], "end_x": [3], "method": ["simpson"]}}}
> {"id": "simple_16", "ground_truth": {"calculus.derivative": {"function": ["2*x^2", "2x^2", "2**x^2"], "value": [1], "function_variable": ["x", ""]}}}
>
> While in gorilla_openfunctions_v1_test_parallel_multiple_function.json:
>
> {"id": "parallel_multiple_function_4", "ground_truth": {"integral": {"function": ["x^2", "lambda x : x**2"], "a": [1.0], "b": [5.0]}, "derivative": {"function": ["x^2", "lambda x : x**2"], "x": [3.0]}}}
>
> I think maybe the lambda function should be included in the simple cases as well?

Fair point. Updated.

Collaborator

@CharlieJCJ CharlieJCJ left a comment


LGTM

@ShishirPatil ShishirPatil merged commit 0a49cfc into ShishirPatil:main Aug 5, 2024
@HuanzhiMao HuanzhiMao deleted the dataset-fix branch August 5, 2024 21:31
ShishirPatil pushed a commit that referenced this pull request Aug 15, 2024
 (Dataset Fix & New Model) (#572)

This PR updates the leaderboard to reflect the changes in score due to
the following PR merge:

- #557 
- #568 

and the addition of the following models:

- #569 
- #570 
- #573
aw632 pushed a commit to vinaybagade/gorilla that referenced this pull request Aug 22, 2024
Successfully merging this pull request may close these issues.

Possible answer for "parallel_function_29"
[BFCL] Incorrect ground truth
4 participants