[Feature Request] massive raw data → sft data parallel generation #1414

Open
1 of 2 tasks
zjrwtx opened this issue Jan 7, 2025 · 2 comments
Comments

Collaborator

zjrwtx commented Jan 7, 2025

Required prerequisites

Motivation

To make initial QA data generation more automatic when the user's source content (PDFs, web links, etc.) is very long, all data generation methods should accept input contexts of more than 1 million tokens.
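
One possible direction, given as a rough sketch rather than a proposed design: split the long source text into chunks and run generation over the chunks in parallel. Everything named here is hypothetical and not part of CAMEL's API: `split_into_chunks`, `generate_in_parallel`, and `placeholder_generate` are illustrative helpers, whitespace-separated words stand in for real tokens, and `placeholder_generate` stands in for an actual model call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List


def split_into_chunks(text: str, max_words: int, overlap: int = 0) -> List[str]:
    """Split text into chunks of at most `max_words` whitespace-separated words.

    Assumption: words approximate tokens; a real tokenizer would be needed
    to respect an actual model context limit.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def generate_in_parallel(
    text: str,
    generate_fn: Callable[[str], Any],
    max_words: int = 2000,
    overlap: int = 0,
    max_workers: int = 4,
) -> List[Any]:
    """Apply `generate_fn` to every chunk concurrently, keeping chunk order."""
    chunks = split_into_chunks(text, max_words, overlap)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(generate_fn, chunks))


def placeholder_generate(chunk: str) -> dict:
    """Hypothetical stand-in for a model call that would produce QA pairs."""
    return {"context": chunk, "num_words": len(chunk.split())}


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    for record in generate_in_parallel(sample, placeholder_generate, max_words=4):
        print(record)
```

Threads are used on the assumption that the real generation step is dominated by network-bound model calls; a CPU-bound step would call for processes instead.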

Solution

No response

Alternatives

No response

Additional context

No response

@zjrwtx zjrwtx added enhancement New feature or request Data Related to camel data processing call for contribution P0 Task with high level priority labels Jan 7, 2025
@Wendong-Fan Wendong-Fan added this to the Sprint 21 milestone Jan 9, 2025
Member

Wendong-Fan commented Jan 9, 2025

lead: @zjrwtx @harryeqs ; support & review: @AveryYay @koch3092

Collaborator Author

zjrwtx commented Jan 10, 2025

#1431
