Improve split Dataset into train / test / eval #606

Closed
dreadatour opened this issue Nov 17, 2024 · 2 comments · Fixed by #678
Labels: enhancement (New feature or request)

Comments

@dreadatour
Contributor

See original issue: #603

Some improvements we may want to implement:

Seed. A seed is usually implemented by XOR-ing: use sys.rand ^ seed if a seed is provided, or plain sys.rand (no XOR) if it is not.
If XOR is not supported yet and would take time to add, we can ship the split without a seed for now.

Shuffle. Optional, since using sys.rand already shuffles the rows by default. If we need shuffle=False at some point, we might consider sorting by sys.id instead. But I'm not sure shuffle=False is very practical in real applications.
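
To make the seed idea concrete, here is a rough sketch in plain Python. It does not use the project's API: `split_index`, `RAND_BITS`, and the assumption that each row carries a non-negative 63-bit random integer (standing in for sys.rand) are all hypothetical. One addition that is mine, not from the issue: a small seed such as 42 only flips low bits under XOR and would barely move rows between splits, so the sketch first expands the seed into a full-width mask.

```python
import random

# Assumption: the per-row random value (standing in for sys.rand)
# is a non-negative integer below 2**RAND_BITS.
RAND_BITS = 63


def split_index(rand, weights, seed=None):
    """Return the index of the split (0 = train, 1 = test, ...) for one row."""
    value = rand
    if seed is not None:
        # Expand the seed to a full-width mask so that XOR changes high bits too.
        mask = random.Random(seed).getrandbits(RAND_BITS)
        value = rand ^ mask

    # Map the value onto [0, 1] and find the weight interval it falls into.
    position = value / (1 << RAND_BITS)
    total = sum(weights)
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight / total
        if position < cumulative:
            return index
    # Guard against floating-point rounding at the upper edge.
    return len(weights) - 1


# Usage: assign simulated rows to a 70/30 train/test split.
rows = [random.Random(i).getrandbits(RAND_BITS) for i in range(10)]
assignments = [split_index(r, [0.7, 0.3], seed=42) for r in rows]
```

The same seed always gives the same assignment for a given row, and leaving the seed out falls back to the row's random value alone, which matches the "without any XOR if not provided" behavior described above.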

@dmpetrov
Member

Shuffle.

It's not needed. See #603 (comment)

@0x2b3bfa0
Member

Closed with #678

@0x2b3bfa0 0x2b3bfa0 self-assigned this Dec 12, 2024