Implement automatic train test splits #71

Open
aaprasad opened this issue Jul 24, 2024 · 1 comment · May be fixed by #81

Comments

@aaprasad
Contributor

Right now we require users to specify the training and validation videos. It would be nice to just specify a pool of videos and have dreem-train automatically divide the chunks into training and validation sets.

@talmo
Contributor

talmo commented Aug 13, 2024

Using sleap-io (docs):

import sleap_io as sio

# Load source labels.
labels = sio.load_file("labels.v001.slp")

# Make splits and export with embedded images.
labels.make_training_splits(n_train=0.8, n_val=0.1, n_test=0.1, save_dir="split1", seed=42)

# Splits will be saved as self-contained SLP package files with images and labels.
labels_train = sio.load_file("split1/train.pkg.slp")
labels_val = sio.load_file("split1/val.pkg.slp")
labels_test = sio.load_file("split1/test.pkg.slp")

Caveats:

  • This automatically exports each split as a package (labeled frames get embedded images), which we probably don't want here since it means writing out a lot of image data.
  • This has no logic for handling contiguous chunks (see also Support multi-video SLP files #70).

One option would be a higher-order data loader that builds a set of contiguous sub-clips/segments (perhaps with a tolerance for short gaps?).

Basically, we want to loop over all labeled frames in Labels and find connected components of frames that are consecutive in time (optionally tolerating gaps of a few frames), belong to the same video, and have instances.
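A minimal sketch of that grouping step, in plain Python so it does not depend on any particular loader. It assumes each labeled frame has already been reduced to a `(video_id, frame_idx, n_instances)` tuple; with sleap-io such tuples could presumably be built from `lf.video.filename`, `lf.frame_idx` and `len(lf.instances)` for each `lf` in `labels.labeled_frames`, but that conversion is an assumption. The function name `find_contiguous_segments` and the `max_gap` parameter are hypothetical, not part of dreem or sleap-io.

```python
from typing import Iterable, List, Tuple

# Hypothetical segment representation: (video_id, first_frame, last_frame), inclusive.
Segment = Tuple[str, int, int]


def find_contiguous_segments(
    frames: Iterable[Tuple[str, int, int]],
    max_gap: int = 0,
) -> List[Segment]:
    """Group labeled frames into runs that are contiguous in time within one video.

    frames: (video_id, frame_idx, n_instances) tuples, in any order.
    max_gap: number of missing frames tolerated between two labeled frames
        before a new segment is started (0 means strictly consecutive).
    """
    # Drop frames without instances, then order by video and frame index.
    kept = sorted((vid, idx) for vid, idx, n_inst in frames if n_inst > 0)
    segments: List[list] = []
    for vid, idx in kept:
        current = segments[-1] if segments else None
        # Extend the open segment if it is the same video and the gap is small enough.
        if current is not None and current[0] == vid and idx - current[2] - 1 <= max_gap:
            current[2] = idx
        else:
            segments.append([vid, idx, idx])
    return [tuple(s) for s in segments]


if __name__ == "__main__":
    frames = [
        ("a", 0, 1), ("a", 1, 2), ("a", 2, 1), ("a", 4, 1),
        ("a", 5, 1), ("a", 6, 0), ("b", 0, 1), ("b", 1, 1),
    ]
    print(find_contiguous_segments(frames))             # [('a', 0, 2), ('a', 4, 5), ('b', 0, 1)]
    print(find_contiguous_segments(frames, max_gap=1))  # [('a', 0, 5), ('b', 0, 1)]
```

Frame 6 of video `a` has no instances, so it never extends a segment; raising `max_gap` to 1 bridges the single missing frame 3.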

The data loader could then break long clips into sub-samples, randomize across them, and natively handle both multi-video data (#70) and train/val/test splitting.
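A minimal sketch of the sub-sampling and splitting step, assuming segments are given as `(video_id, first_frame, last_frame)` tuples such as a contiguous-segment finder would produce. The names `make_windows`, `split_windows`, `clip_len`, `train_frac` and `val_frac` are hypothetical; the default fractions and seed simply mirror the `make_training_splits` call above.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical segment representation: (video_id, first_frame, last_frame), inclusive.
Segment = Tuple[str, int, int]


def make_windows(segments: List[Segment], clip_len: int) -> List[Segment]:
    """Cut each segment into non-overlapping windows of exactly clip_len frames.

    Leftover frames at the end of a segment that cannot fill a whole window
    are dropped, and segments shorter than clip_len yield no windows.
    """
    if clip_len < 1:
        raise ValueError("clip_len must be >= 1")
    windows: List[Segment] = []
    for vid, start, end in segments:
        # The last valid window start s satisfies s + clip_len - 1 <= end.
        for s in range(start, end - clip_len + 2, clip_len):
            windows.append((vid, s, s + clip_len - 1))
    return windows


def split_windows(
    windows: List[Segment],
    train_frac: float = 0.8,
    val_frac: float = 0.1,
    seed: int = 42,
) -> Dict[str, List[Segment]]:
    """Shuffle windows with a fixed seed and partition them into train/val/test.

    Anything not assigned to train or val goes to test.
    """
    if train_frac < 0 or val_frac < 0 or train_frac + val_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(windows)
    random.Random(seed).shuffle(shuffled)  # seeded, so splits are reproducible
    n = len(shuffled)
    n_train = min(round(train_frac * n), n)
    n_val = min(round(val_frac * n), n - n_train)
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


if __name__ == "__main__":
    segments = [("a", 0, 9), ("a", 20, 31), ("b", 0, 1)]
    windows = make_windows(segments, clip_len=4)
    print(windows)  # [('a', 0, 3), ('a', 4, 7), ('a', 20, 23), ('a', 24, 27), ('a', 28, 31)]
    splits = split_windows(windows, seed=0)
    print({name: len(items) for name, items in splits.items()})
```

Splitting at the window level lets adjacent, near-identical windows from the same segment land in different splits. Because `split_windows` accepts any list of segment tuples, passing whole segments to it (and windowing each split afterwards) would be one way to keep train and val temporally separated.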

aaprasad linked a pull request on Aug 30, 2024 that will close this issue