
[BFCL] Fix Dataset and Possible Answer Issue #557

Merged — 18 commits merged into ShishirPatil:main on Aug 5, 2024

Conversation

HuanzhiMao
Collaborator

@HuanzhiMao HuanzhiMao commented Jul 26, 2024

This PR fixes #550, fixes #541, and all the issues pointed out by the comments below.
We want to thank @lucenzhong and @XuHwang for pointing these out.

Here's a breakdown of the changes:

  • simple: 7 entries affected
    • Indices: 13, 14, 15, 16, 200, 285, 375
  • multiple function: 3 entries affected
    • Indices: 29, 33, 99
  • parallel function: 5 entries affected
    • Indices: 26, 71, 72, 73, 89
  • parallel multiple function: 6 entries affected
    • Indices: 4, 19, 80, 83, 132, 195
  • executable parallel function: 1 entry affected
    • Index: 11
  • javascript: 3 entries affected
    • Indices: 18, 29, 35

This will affect the leaderboard score. We will update it soon, in a different PR.

@HuanzhiMao HuanzhiMao marked this pull request as ready for review July 30, 2024 22:24
@XuHwang
Contributor

XuHwang commented Aug 1, 2024

Hi, thanks for your efforts in supporting such a wonderful benchmark.

Recently I found some possible errors in ground truth as follows:

  1. id 278 in simple.json: features is not required and the default value price is correct, yet the ground truth requires it
  2. id 285 in simple.json: location requires the format "City, State", but the ground truth contains "Chicago" and not "Chicago, Illinois"
  3. id 375 in simple.json: maybe the ground truth of items should include [pumpkin, dozen eggs]

@HuanzhiMao
Collaborator Author

HuanzhiMao commented Aug 1, 2024

> Hi, thanks for your efforts in supporting such a wonderful benchmark.
>
> Recently I found some possible errors in ground truth as follows:
>
>   1. id 278 in simple.json: features is not required and the default value price is correct, yet the ground truth requires it
>   2. id 285 in simple.json: location requires the format "City, State", but the ground truth contains "Chicago" and not "Chicago, Illinois"
>   3. id 375 in simple.json: maybe the ground truth of items should include [pumpkin, dozen eggs]

Thanks for pointing this out!

Regarding simple_278, the ground truth is correct.
The question asks "Find me the average price and ratings of piano from Yamaha." The default value for features only contains 'price', not 'rating', so the model output must set the parameter explicitly.

"features": {
    "type": "array",
    "items": {"type": "string", "enum": ["price", "rating"]},
    "description": "The features to retrieve about the instrument. Default is 'price'",
}
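For context, a minimal sketch of how such a possible-answer entry could be matched. The helper name and the convention that an empty string "" in a value list marks the parameter as omittable are illustrative assumptions, not BFCL's actual checker; the point is that simple_278 lists no "" for features, so omitting it fails.

```python
# Sketch of matching a model's arguments against a BFCL-style
# possible-answer entry. An empty string "" in a value list means the
# parameter may be omitted (its default applies). Hypothetical helper.
def matches_ground_truth(model_args: dict, possible_answers: dict) -> bool:
    for param, allowed in possible_answers.items():
        if param not in model_args:
            # Parameter omitted: acceptable only if "" is an allowed value.
            if "" not in allowed:
                return False
        elif model_args[param] not in allowed:
            return False
    return True

# simple_278: the default for "features" is only 'price', so the ground
# truth lists the full ['price', 'rating'] and omission is not allowed.
possible = {"instrument": ["piano"], "manufacturer": ["Yamaha"],
            "features": [["price", "rating"]]}

print(matches_ground_truth(
    {"instrument": "piano", "manufacturer": "Yamaha",
     "features": ["price", "rating"]}, possible))  # True
print(matches_ground_truth(
    {"instrument": "piano", "manufacturer": "Yamaha"}, possible))  # False
```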

@XuHwang
Contributor

XuHwang commented Aug 1, 2024

> Hi, thanks for your efforts in supporting such a wonderful benchmark.
> Recently I found some possible errors in ground truth as follows:
>
>   1. id 278 in simple.json: features is not required and the default value price is correct, yet the ground truth requires it
>   2. id 285 in simple.json: location requires the format "City, State", but the ground truth contains "Chicago" and not "Chicago, Illinois"
>   3. id 375 in simple.json: maybe the ground truth of items should include [pumpkin, dozen eggs]
>
> Thanks for pointing this out!
>
> Regarding simple_278, the ground truth is correct. The question asks "Find me the average price and ratings of piano from Yamaha." The default value for features only contains 'price', not 'rating', so the model output must set the parameter explicitly.
>
> "features": {
>     "type": "array",
>     "items": {"type": "string", "enum": ["price", "rating"]},
>     "description": "The features to retrieve about the instrument. Default is 'price'",
> }

I see. I agree~
Thanks a lot!

@XuHwang
Contributor

XuHwang commented Aug 2, 2024

Hey, I found some possible inconsistencies in the ground truth regarding lambda functions in Python.

In gorilla_openfunctions_v1_test_simple.json:

{"id": "simple_13", "ground_truth": {"calculate_area_under_curve": {"function": ["x^2", "x**2"], "interval": [[1.0, 3.0]], "method": ["", "trapezoidal"]}}}
{"id": "simple_14", "ground_truth": {"calculate_derivative": {"function": ["3x^2 + 2x - 1", "3*x**2+2*x-1"], "x_value": ["", 0.0]}}}
{"id": "simple_15", "ground_truth": {"integrate": {"function": ["x^3", "x**3"], "start_x": [-2], "end_x": [3], "method": ["simpson"]}}}
{"id": "simple_16", "ground_truth": {"calculus.derivative": {"function": ["2*x^2", "2x^2", "2**x^2"], "value": [1], "function_variable": ["x", ""]}}}

While in gorilla_openfunctions_v1_test_parallel_multiple_function.json:

{"id": "parallel_multiple_function_4", "ground_truth": {"integral": {"function": ["x^2", "lambda x : x**2"], "a": [1.0], "b": [5.0]}, "derivative": {"function": ["x^2", "lambda x : x**2"], "x": [3.0]}}}

I think maybe the lambda function should be included in the simple cases as well?
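For illustration, bringing the simple entries in line with parallel_multiple_function_4 just means appending the lambda form to each "function" possible-answer list. A hedged sketch of such a one-off fix (hypothetical script, not part of the repo; it only adds a lambda variant for the valid-Python-syntax forms):

```python
# Sketch of extending the "function" possible answers in a simple entry
# with the lambda form, mirroring parallel_multiple_function_4.
entry = {"id": "simple_15",
         "ground_truth": {"integrate": {"function": ["x^3", "x**3"],
                                        "start_x": [-2], "end_x": [3],
                                        "method": ["simpson"]}}}

for params in entry["ground_truth"].values():
    variants = params["function"]
    # Add "lambda x: <expr>" for each variant that is valid Python
    # syntax (i.e. skip the caret notation like "x^3").
    params["function"] = variants + [f"lambda x: {v}" for v in variants
                                     if "^" not in v]

print(entry["ground_truth"]["integrate"]["function"])
# ['x^3', 'x**3', 'lambda x: x**3']
```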

@HuanzhiMao
Collaborator Author

> Hey, I found some possible inconsistencies in the ground truth regarding lambda functions in Python.
>
> In gorilla_openfunctions_v1_test_simple.json:
>
> {"id": "simple_13", "ground_truth": {"calculate_area_under_curve": {"function": ["x^2", "x**2"], "interval": [[1.0, 3.0]], "method": ["", "trapezoidal"]}}}
> {"id": "simple_14", "ground_truth": {"calculate_derivative": {"function": ["3x^2 + 2x - 1", "3*x**2+2*x-1"], "x_value": ["", 0.0]}}}
> {"id": "simple_15", "ground_truth": {"integrate": {"function": ["x^3", "x**3"], "start_x": [-2], "end_x": [3], "method": ["simpson"]}}}
> {"id": "simple_16", "ground_truth": {"calculus.derivative": {"function": ["2*x^2", "2x^2", "2**x^2"], "value": [1], "function_variable": ["x", ""]}}}
>
> While in gorilla_openfunctions_v1_test_parallel_multiple_function.json:
>
> {"id": "parallel_multiple_function_4", "ground_truth": {"integral": {"function": ["x^2", "lambda x : x**2"], "a": [1.0], "b": [5.0]}, "derivative": {"function": ["x^2", "lambda x : x**2"], "x": [3.0]}}}
>
> I think maybe the lambda function should be included in the simple cases as well?

Fair point. Updated.

Collaborator

@CharlieJCJ CharlieJCJ left a comment


LGTM

@ShishirPatil ShishirPatil merged commit 0a49cfc into ShishirPatil:main Aug 5, 2024
@HuanzhiMao HuanzhiMao deleted the dataset-fix branch August 5, 2024 21:31
ShishirPatil pushed a commit that referenced this pull request Aug 15, 2024
 (Dataset Fix & New Model) (#572)

This PR updates the leaderboard to reflect the changes in score due to
the following PR merge:

- #557 
- #568 

and the addition of the following models:

- #569 
- #570 
- #573
aw632 pushed a commit to vinaybagade/gorilla that referenced this pull request Aug 22, 2024
Successfully merging this pull request may close these issues.

Possible answer for "parallel_function_29"
[BFCL] Incorrect ground truth
4 participants