[Bug Fix] Fix Error in Parallel Function Possible Answer #252

Merged
ShishirPatil merged 5 commits into ShishirPatil:main from the new_possible_answer branch on Mar 16, 2024

Conversation

Fanjia-Yan (Collaborator) commented Mar 11, 2024

As #235 pointed out, there appear to be some ground-truth (GT) errors in the parallel_function answers: several entries contain only a single function call. The problematic indices are 14, 18, 19, 23, 87, 114, 139, and 168. After validation, the multiple generated possible answers were combined into a single entry.

In other words:
Error -> Correct
{"calculate_present_value": {"payment_per_year": [1000], "interest_rate": [0.05], "years": [10, 20, 30]}} -> {"calculate_present_value_1": {"payment_per_year": [1000], "interest_rate": [0.05], "years": [20]},"calculate_present_value_2": {"payment_per_year": [1000], "interest_rate": [0.05], "years": [30]}}
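One way to locate entries like the erroneous one above is to count the function calls in each possible answer. The following is only a minimal sketch: `find_single_call_entries` is a hypothetical helper, not part of the repository, and it assumes each possible answer is a dict keyed by function-call name, as in the example above, and that a parallel test case should hold at least two calls.

```python
import json

def find_single_call_entries(possible_answers):
    """Return the indices of entries holding fewer than two function calls.

    Assumption: every entry is a dict that maps a function-call name to its
    parameter dict, and a parallel test case needs at least two calls.
    """
    return [i for i, answer in enumerate(possible_answers) if len(answer) < 2]

# The erroneous and corrected forms quoted in the PR description.
erroneous = json.loads(
    '{"calculate_present_value": {"payment_per_year": [1000], '
    '"interest_rate": [0.05], "years": [10, 20, 30]}}'
)
corrected = json.loads(
    '{"calculate_present_value_1": {"payment_per_year": [1000], '
    '"interest_rate": [0.05], "years": [20]}, '
    '"calculate_present_value_2": {"payment_per_year": [1000], '
    '"interest_rate": [0.05], "years": [30]}}'
)

entries = [erroneous, corrected]
flagged = find_single_call_entries(entries)
print(flagged)
```

Only the first entry is flagged, because it packs several values into one call instead of listing separate calls.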

Building on this PR, this change addresses two issues in the parallel_function and parallel_multiple_function test categories:

  1. Some possible answers contain an empty function call, which makes the test case error out because we check the number of invoked function calls in advance.
  2. Fix errors in the possible answers for parallel functions.
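The first issue can be illustrated with a small sketch. It is not the checker's actual code: `drop_empty_calls` and `call_count_matches` are hypothetical names, and it assumes an empty function call appears as a call name mapped to an empty parameter dict.

```python
def drop_empty_calls(possible_answer):
    """Remove calls whose parameter dict is empty (assumed representation)."""
    return {name: params for name, params in possible_answer.items() if params}

def call_count_matches(model_calls, possible_answer):
    """Up-front check: the model must invoke as many calls as the answer lists."""
    return len(model_calls) == len(possible_answer)

# Hypothetical possible answer with one real call and one empty call.
possible_answer = {
    "get_weather_1": {"city": ["Berkeley"]},
    "get_weather_2": {},
}
# Hypothetical model output containing a single call.
model_calls = [{"get_weather": {"city": "Berkeley"}}]

before = call_count_matches(model_calls, possible_answer)
after = call_count_matches(model_calls, drop_empty_calls(possible_answer))
print(before, after)
```

With the empty call present, the count check fails even though the model's output is reasonable; once the empty call is dropped, the counts agree.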

This PR fixes the parallel function possible answers. A checker rerun is required before merging this PR.

The leaderboard PR should be merged at the same time.

Fanjia-Yan force-pushed the new_possible_answer branch from c2bc28c to c3fde17 on March 12, 2024 06:51
Fanjia-Yan marked this pull request as ready for review on March 15, 2024 23:10
ShishirPatil merged commit cdcf183 into ShishirPatil:main on Mar 16, 2024
ShishirPatil pushed a commit that referenced this pull request Mar 16, 2024
This PR removes any AST checker answers that contain the string literals
"True" and "False", thereby restricting the conditionals of the AST
checker to strictly boolean values.

Example:

`[true, "true"] -> [true]`

This PR will stack on top of #252

Will this change the results of the leaderboard? YES

#253 will update the leaderboard with this fix.
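The `[true, "true"] -> [true]` cleanup described in this commit message could be sketched as follows. This is an illustration under assumptions, not the commit's actual code: `strip_string_booleans` is a hypothetical helper, and it assumes the possible values for a parameter are stored as a plain list.

```python
def strip_string_booleans(values):
    """Drop string spellings of booleans, keeping real booleans and other values.

    Assumption: matching is case-insensitive, so "True", "true", "False",
    and "false" are all removed.
    """
    return [
        v for v in values
        if not (isinstance(v, str) and v.lower() in {"true", "false"})
    ]

cleaned = strip_string_booleans([True, "true"])
mixed = strip_string_booleans(["False", False, "celsius"])
print(cleaned, mixed)
```

Note that a list holding only string booleans would come back empty under this sketch, so such entries would need a real boolean added by hand.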
devanshamin pushed a commit to devanshamin/gorilla that referenced this pull request Jul 9, 2024
devanshamin pushed a commit to devanshamin/gorilla that referenced this pull request Jul 9, 2024
aw632 pushed a commit to vinaybagade/gorilla that referenced this pull request Aug 22, 2024
aw632 pushed a commit to vinaybagade/gorilla that referenced this pull request Aug 22, 2024