
when to leverage COT for answering #5

Closed
whitesockcat opened this issue Jan 18, 2024 · 8 comments

@whitesockcat

whitesockcat commented Jan 18, 2024

Thank you for the excellent work! After a thorough review of the paper, I have some inquiries:

It's clear that a CoT-style instruction is used for generating responses to numerical questions about charts. I noticed that CoT is applied to the MathQA dataset, as mentioned in your paper. However, was CoT employed consistently for the other datasets? How do you determine when to leverage CoT for answering and when to provide direct responses? Specifically, was CoT used in the evaluation of the ChartQA dataset?

I eagerly await your response.

@FanqingM
Collaborator

FanqingM commented Jan 18, 2024

The current evaluation of ChartQA in the paper does not use the MathQA instruction template. We have found that asking the mathematical questions in ChartQA with the MathQA instruction template further improves accuracy over the numbers in the current version of the paper (we will update the paper later).
When evaluating, you can refer to the evaluation code in the repo. It provides five instruction templates, and all evaluations use the instruction templates in the code; ChartQA is placed under the open QA template.

As for when to answer in CoT form, we think it should be up to the user (analogous to adding "think step by step" when prompting GPT). For a question such as "What is the difference of the x1 and the x2?", the user can of course ask with the normal QA template, but this is clearly a mathematical question, so users should be more inclined to ask with the mathematical template, which yields more accurate results than normal QA (this conclusion is also verified in the paper).
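To make the "let the user decide" idea concrete, here is a minimal sketch of routing a question to either a plain QA instruction or a math/CoT-style instruction. It is not the repo's actual code: the template strings (NORMAL_QA_TEMPLATE, MATH_COT_TEMPLATE), the keyword list, and build_prompt are hypothetical placeholders for illustration only.

```python
# Minimal sketch (not the repo's actual templates): pick a CoT-style instruction
# for math-like questions, otherwise use a plain QA instruction.

# Hypothetical instruction templates.
NORMAL_QA_TEMPLATE = "Answer the question based on the chart.\nQuestion: {question}\nAnswer:"
MATH_COT_TEMPLATE = (
    "Answer the question based on the chart. "
    "Think step by step and give the final numeric answer.\n"
    "Question: {question}\nAnswer:"
)

# Very rough heuristic for spotting mathematical questions; a real setup would
# rely on dataset annotations or a proper classifier instead.
MATH_KEYWORDS = ("difference", "sum", "average", "ratio", "how many more", "total")

def build_prompt(question: str) -> str:
    """Use the math/CoT template for math-like questions, the normal QA template otherwise."""
    if any(k in question.lower() for k in MATH_KEYWORDS):
        return MATH_COT_TEMPLATE.format(question=question)
    return NORMAL_QA_TEMPLATE.format(question=question)

if __name__ == "__main__":
    print(build_prompt("What is the difference of the x1 and the x2?"))  # math template
    print(build_prompt("What does the blue bar represent?"))             # normal QA template
```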

@whitesockcat
Author

Thank you for your response.

Regarding the application of the instruction template, could you please clarify whether it was utilized across all questions within the ChartQA dataset to enhance accuracy, or was it specifically employed only for mathematical questions?

@FanqingM
Collaborator

As I said in my previous answer, all ChartQA accuracy numbers in the current version of the paper are measured with the ordinary QA instruction, so the issue you mention does not arise.

Secondly, we later switched the mathematical questions in ChartQA to the mathematical template (most ChartQA questions are either element extraction or mathematical questions). This further improves ChartQA accuracy over the current version of the paper.

We will update the paper later and release the instructions.json used for testing ChartQA. For the current version of the paper it is not needed, because the normal QA template is used for all of ChartQA.
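Since instructions.json has not been released yet, the snippet below is a purely hypothetical sketch of what a per-question instruction file could look like; the file layout, image paths, and template names are assumptions for illustration, not the repo's actual format.

```python
import json

# Hypothetical illustration only: the real instructions.json is not public, so the
# fields below ("image", "question", "template") are assumed, not the repo's schema.
instructions = [
    {
        "image": "chartqa/test/png/example_1.png",   # assumed path layout
        "question": "What does the blue bar represent?",
        "template": "normal_qa",                     # plain QA instruction
    },
    {
        "image": "chartqa/test/png/example_2.png",
        "question": "What is the difference of the x1 and the x2?",
        "template": "math_cot",                      # math/CoT-style instruction
    },
]

with open("instructions.json", "w") as f:
    json.dump(instructions, f, indent=2)
```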

@whitesockcat
Author

Looking forward to your update!!!

@whitesockcat
Author

Could you please make the instructions.json file available at your earliest convenience?

FanqingM pinned this issue Jan 23, 2024
@FanqingM
Collaborator

FanqingM commented Jan 23, 2024

For this version, we just use the normal QA template, which is in accessory/single_turn_eval.py; you can refer to issue #6.
To generate in batches, you can use accessory/single_turn_eval_multitask.py and also refer to issue #6.

@zhangliang-04

Hi, I notice that in the updated version the performance on MathQA is unchanged, but there is a performance increase on ChartQA. It seems you just changed the evaluation process. Could you provide any details about the evaluation of ChartQA in the current version? Thanks a lot!

@FanqingM
Collaborator


We changed the test instruction for ChartQA: we now use the MathQA instruction template for the mathematical questions in ChartQA, whereas earlier we just used the normal QA template for all of ChartQA.
