
question about training recipe #17

Open
King-king424 opened this issue Jun 18, 2024 · 3 comments
Comments

@King-king424

[image] Which configuration file can reproduce the 54.x result reported in the paper?
@farewellthree
Collaborator

*qa.yaml
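
(For clarity, "*qa.yaml" is a glob pattern: any of the question-answering training configs matching it should reproduce that number. Below is a minimal sketch of listing them, assuming the configs live in a config/ directory; that directory name is an assumption about the repo layout, not something stated in this thread.)

```python
# Minimal sketch: enumerate the configs that "*qa.yaml" refers to.
# Assumption: the training configs live under a "config/" directory.
from pathlib import Path

for cfg in sorted(Path("config").glob("*qa.yaml")):
    # Any of these QA configs would be the one to pass to the training script.
    print(cfg)
```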

@Backdrop9019

I find it a bit odd to use different training datasets depending on the benchmark. For example, with Videochat2, as far as I know, the model was trained on all of the instruction datasets together and then evaluated on the various benchmarks. For ST-LLM, however, a different instruction dataset is used for training depending on the benchmark, and each is evaluated separately. Doesn't this seem unfair? I'm curious about the rationale behind dividing the data this way.

@farewellthree
Collaborator

Hello, thank you for raising this point. The reason we did this is that instruction data in the form of multiple-choice questions from datasets like K400, SSV2, and CLEVRER, although beneficial for MVBench, severely hurts the model's dialogue performance and leads to significant hallucinations. Our approach actually uses less data while achieving better results.
