
when to leverage COT for answering #5

Closed
whitesockcat opened this issue Jan 18, 2024 · 8 comments

@whitesockcat

whitesockcat commented Jan 18, 2024

Thank you for the excellent work! After a thorough review of the paper, I have some inquiries:

It's clear that a CoT-style instruction is used for generating responses to numerical questions about charts. I noticed that CoT is applied to the MathQA dataset, as mentioned in your paper. However, was CoT employed consistently for the other datasets? How do you determine when to leverage CoT for answering and when to provide direct responses? Specifically, was CoT used in the evaluation of the ChartQA dataset?

I eagerly await your response.

@FanqingM
Collaborator

FanqingM commented Jan 18, 2024

The current evaluation of ChartQA in the paper does not use the MathQA instruction template. We have found that asking the mathematical questions in ChartQA with the MathQA instruction template further improves accuracy over the numbers in the current version of the paper (we will update the paper later).
When evaluating, you can refer to the evaluation code in the repo. It provides five instruction templates, and all evaluations use the instruction templates in the code; ChartQA is placed under the open QA template.

As for when to answer in CoT form, we think it should be up to the user (analogous to adding "think step by step" when prompting GPT). For a question such as "What is the difference of the x1 and the x2?", the user can of course ask with the normal QA template, but this is clearly a mathematical question, so users should be more inclined to ask with the mathematical template, which yields more accurate results than normal QA (this conclusion is also verified in the paper).
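To make the "let the user decide" idea concrete, here is a minimal sketch of routing a question to either a plain QA instruction or a math/CoT-style instruction. It is not the repo's actual code: the template strings (NORMAL_QA_TEMPLATE, MATH_COT_TEMPLATE), the keyword list, and build_prompt are hypothetical placeholders for illustration only.

```python
# Minimal sketch (not the repo's actual templates): pick a CoT-style instruction
# for math-like questions, otherwise use a plain QA instruction.

# Hypothetical instruction templates.
NORMAL_QA_TEMPLATE = "Answer the question based on the chart.\nQuestion: {question}\nAnswer:"
MATH_COT_TEMPLATE = (
    "Answer the question based on the chart. "
    "Think step by step and give the final numeric answer.\n"
    "Question: {question}\nAnswer:"
)

# Very rough heuristic for spotting mathematical questions; a real setup would
# rely on dataset annotations or a proper classifier instead.
MATH_KEYWORDS = ("difference", "sum", "average", "ratio", "how many more", "total")

def build_prompt(question: str) -> str:
    """Use the math/CoT template for math-like questions, the normal QA template otherwise."""
    if any(k in question.lower() for k in MATH_KEYWORDS):
        return MATH_COT_TEMPLATE.format(question=question)
    return NORMAL_QA_TEMPLATE.format(question=question)

if __name__ == "__main__":
    print(build_prompt("What is the difference of the x1 and the x2?"))  # math template
    print(build_prompt("What does the blue bar represent?"))             # normal QA template
```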

@whitesockcat
Author

Thank you for your response.

Regarding the application of the instruction template, could you please clarify whether it was utilized across all questions within the ChartQA dataset to enhance accuracy, or was it specifically employed only for mathematical questions?

@FanqingM
Collaborator

As I said in my previous answer, all ChartQA accuracy numbers in the current version of the paper are measured with the ordinary QA instruction, so the issue you mention does not arise.

Secondly, we later switched the mathematical questions in ChartQA to the mathematical template (most ChartQA questions are either element extraction or mathematical questions). This further improves ChartQA accuracy over the current version of the paper.

We will update the paper later and release the instructions.json used for testing ChartQA. For the current version of the paper it is not needed, because the normal QA template is used for all of ChartQA.
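Since instructions.json has not been released yet, the snippet below is a purely hypothetical sketch of what a per-question instruction file could look like; the file layout, image paths, and template names are assumptions for illustration, not the repo's actual format.

```python
import json

# Hypothetical illustration only: the real instructions.json is not public, so the
# fields below ("image", "question", "template") are assumed, not the repo's schema.
instructions = [
    {
        "image": "chartqa/test/png/example_1.png",   # assumed path layout
        "question": "What does the blue bar represent?",
        "template": "normal_qa",                     # plain QA instruction
    },
    {
        "image": "chartqa/test/png/example_2.png",
        "question": "What is the difference of the x1 and the x2?",
        "template": "math_cot",                      # math/CoT-style instruction
    },
]

with open("instructions.json", "w") as f:
    json.dump(instructions, f, indent=2)
```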

@whitesockcat
Author

Looking forward to your update!!!

@whitesockcat
Author

Could you please make the instructions.json file available at your earliest convenience?

FanqingM pinned this issue Jan 23, 2024
@FanqingM
Collaborator

FanqingM commented Jan 23, 2024

For this version, we just use the normal QA template, which is in accessory/single_turn_eval.py; you can refer to issue #6.
To generate in batches, you can use accessory/single_turn_eval_multitask.py and also refer to issue #6.

@zhangliang-04

Hi, I notice that in the updated version the performance on MathQA is unchanged, but there is a performance increase on ChartQA. It seems you just changed the evaluation process. Could you provide any details about the evaluation of ChartQA in the current version? Thanks a lot!

@FanqingM
Collaborator


We changed the test instruction for ChartQA: we now use the MathQA instruction template for the mathematical questions in ChartQA, whereas earlier we just used the normal QA template for all of ChartQA.
