share sft-dataset #1

Open
yyht opened this issue Jun 28, 2024 · 4 comments
yyht commented Jun 28, 2024

Hello, nice work. Could you share the SFT dataset on Hugging Face?

X-Lai (Collaborator) commented Jun 28, 2024

Sure, it will be released soon. Please stay tuned.

yapdianang commented

Hi authors, I'm following up on this thread to stay updated on when the SFT datasets are released. Thanks, and nice work!

kleinzcy commented

Hi authors, this is nice work advancing the off-policy method for enhancing the reasoning ability of LLMs. I'm following up on this thread to stay updated on when the SFT datasets are released. Thanks!

yyht (Author) commented Aug 20, 2024

Hello everyone, the dataset is here: https://huggingface.co/datasets/yingyingzhang/metamath-qwen2-math
I used Qwen2-Math-Instruct and open-source datasets such as MetaMathQA and NuminaMath-CoT to construct a high-quality SFT dataset.
When fine-tuning the Qwen2 general base model or the Qwen2-Math base model on it, the SFT model can achieve results comparable to Qwen2-Instruct (7B/72B) and Qwen2-Math-7B-Instruct.
The whole dataset consists of metamath-qwen2-math plus the non-synthetic subsets of NuminaMath-CoT (https://huggingface.co/datasets/AI-MO/NuminaMath-CoT).
Please enjoy it.
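The comment above says the full dataset pairs metamath-qwen2-math with only the non-synthetic part of NuminaMath-CoT. Below is a minimal sketch of that selection step. It assumes that each NuminaMath-CoT record has a `source` field and that the synthetic subsets have names beginning with `synthetic` (for example, `synthetic_math`). The helper names `is_non_synthetic` and `filter_non_synthetic` are hypothetical and do not come from the author's release.

```python
from typing import Iterable, List, Mapping

# Assumption: NuminaMath-CoT records carry a "source" field, and the
# synthetic subsets are the ones whose source name starts with "synthetic".
SYNTHETIC_PREFIX = "synthetic"


def is_non_synthetic(record: Mapping[str, object]) -> bool:
    """Return True unless the record's source marks it as synthetic.

    Records with no "source" field are kept (treated as non-synthetic).
    """
    source = str(record.get("source", ""))
    return not source.startswith(SYNTHETIC_PREFIX)


def filter_non_synthetic(
    records: Iterable[Mapping[str, object]],
) -> List[Mapping[str, object]]:
    """Keep only the non-synthetic records, preserving their order."""
    return [r for r in records if is_non_synthetic(r)]


# Hypothetical usage with the Hugging Face `datasets` library (needs network):
#   from datasets import load_dataset
#   numina = load_dataset("AI-MO/NuminaMath-CoT", split="train")
#   numina_non_synthetic = numina.filter(is_non_synthetic)

if __name__ == "__main__":
    sample = [
        {"source": "olympiads", "problem": "p1"},
        {"source": "synthetic_math", "problem": "p2"},
        {"source": "cn_k12", "problem": "p3"},
        {"source": "synthetic_amc", "problem": "p4"},
    ]
    print([r["problem"] for r in filter_non_synthetic(sample)])  # ['p1', 'p3']
```

The predicate receives one record at a time. That means it can also be passed directly to `Dataset.filter` in the `datasets` library, as the commented lines show, without loading the whole split into a Python list.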
