Improve split Dataset into train / test / eval #606

dreadatour · 2024-11-17T02:49:03Z

See original issue: #603

Some improvements we may want to implement:

Seed. A seed is usually implemented by XOR-ing: use sys.rand ^ seed if a seed is provided, or plain sys.rand without any XOR if it is not.
If XOR is not implemented yet and would take time to implement, we can ship the split without seed for now.

Shuffle. Optional, since using sys.rand already makes the split shuffled by default. If we need shuffle=False at some point, we might consider using sorting and sys.id instead. But I'm not sure shuffle=False is very practical in real applications.
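
For reference, a rough sketch of what a shuffle=False split via sorting could look like, again in plain Python and only under assumptions: `ordered_split` is a hypothetical helper, and the `ids` list stands in for sys.id values. Rows are sorted by id and then cut at the cumulative weight boundaries.

```python
def ordered_split(ids: list[int], weights: list[float]) -> list[list[int]]:
    """Split ids into contiguous, ordered chunks proportional to weights."""
    ordered = sorted(ids)  # deterministic order instead of a random one
    total = sum(weights)
    splits: list[list[int]] = []
    start = 0
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight / total
        if index == len(weights) - 1:
            end = len(ordered)  # last split takes whatever remains
        else:
            end = round(cumulative * len(ordered))
        splits.append(ordered[start:end])
        start = end
    return splits


train, test = ordered_split([9, 3, 7, 1, 5, 8, 2, 6, 4, 10], [0.7, 0.3])
```

Unlike the random approach, this needs a full sort and knowledge of the row count, which is part of why it may not be worth supporting.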

dmpetrov · 2024-11-18T16:38:15Z

Shuffle.

It's not needed. See #603 (comment)

0x2b3bfa0 · 2024-12-12T13:55:52Z

Closed with #678

dreadatour added the enhancement label Nov 17, 2024

dreadatour mentioned this issue Nov 17, 2024

We need a function to split Dataset into train / test / eval #603

Closed

dreadatour self-assigned this Nov 26, 2024

dreadatour mentioned this issue Nov 28, 2024

Update base Func class and tests #641

Merged

dreadatour linked a pull request Dec 3, 2024 that will close this issue

Implement 'seed' for 'train_test_split' + simplify split logic #657

Closed

0x2b3bfa0 mentioned this issue Dec 12, 2024

Implement 'seed' for 'train_test_split' (take two) #678

Merged

0x2b3bfa0 closed this as completed Dec 12, 2024

0x2b3bfa0 self-assigned this Dec 12, 2024
