
question about training recipe #17

Open
King-king424 opened this issue Jun 18, 2024 · 3 comments
Comments

@King-king424

[image] Which configuration file can reproduce the 54.x result reported in the paper?
@farewellthree
Collaborator

*qa.yaml
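
(For clarity, "*qa.yaml" is a glob pattern: any of the question-answering training configs matching it should reproduce that number. Below is a minimal sketch of listing them, assuming the configs live in a config/ directory; that directory name is an assumption about the repo layout, not something stated in this thread.)

```python
# Minimal sketch: enumerate the configs that "*qa.yaml" refers to.
# Assumption: the training configs live under a "config/" directory.
from pathlib import Path

for cfg in sorted(Path("config").glob("*qa.yaml")):
    # Any of these QA configs would be the one to pass to the training script.
    print(cfg)
```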

@Backdrop9019

I find it a bit odd to use different training datasets depending on the benchmark. For example, with Videochat2, as far as I know, the model was trained on all of the instruction datasets together and then evaluated on the various benchmarks. For ST-LLM, however, a different instruction dataset is used for training depending on the benchmark, and each is evaluated separately. Doesn't this seem unfair? I'm curious about the rationale behind dividing the data this way.

@farewellthree
Collaborator

Hello, thank you for raising this point. The reason we did this is that instruction data in the form of multiple-choice questions from datasets like K400, SSV2, and CLEVRER, although beneficial for MVBench, severely hurts the model's dialogue performance and leads to significant hallucinations. Our approach actually uses less data while achieving better results.
