Restructure results/checkpoint + New features: (analysis tools, large dataset support, 3D viz and other more) #79
Merged
…e") throughout codebase
Is this the same issue @calebweinreb? I'm running your lab's 3D dataset, but didn't realize I'd need multiple GPUs :(
You shouldn't need multiple GPUs. Just use "mixed_map_iters" as described here: https://keypoint-moseq.readthedocs.io/en/latest/FAQs.html#troubleshooting
You're a wizard, thank you again! 😀 I'll give it a try
Summary
This PR introduces the following changes/features, which are explained in more detail below.
New logic for syllable indexing
Until now, the "extract_results" function of keypoint-MoSeq saved syllable sequences in their original indexing (as they were represented during modeling) along with a "reindexed" version in which syllables were relabeled by frequency (so syllable "0" was the most frequent, and so on). But this approach had a fatal flaw: when a fitted model was applied to new data, the syllable frequencies could differ, leading to a slightly different relabeling, so that, e.g., syllable "0" would refer to one state in one subset of recordings and a different state in another subset.
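The flaw described above can be reproduced in a few lines of plain Python. This is only an illustrative sketch: `relabel_by_frequency` and the toy sequences are hypothetical and are not part of keypoint-MoSeq. It shows that relabeling each dataset by its own frequencies assigns label "0" to different underlying states, while reusing a single fixed mapping keeps labels consistent.

```python
from collections import Counter


def relabel_by_frequency(labels):
    """Return a dict mapping each original label to its frequency rank (0 = most frequent)."""
    counts = Counter(labels)
    # Sort by descending count; break ties by original label so the result is deterministic.
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    return {old: new for new, old in enumerate(ordered)}


# Hypothetical syllable sequences in the model's original indexing.
training = [7, 7, 7, 3, 3, 5]
new_data = [3, 3, 3, 7, 7, 5]

# Relabeling each dataset separately: "0" means state 7 in one and state 3 in the other.
train_map = relabel_by_frequency(training)
new_map = relabel_by_frequency(new_data)

# Fixing the mapping once (here, from the training data) and reusing it
# keeps every label tied to the same underlying state.
consistent = [train_map[s] for s in new_data]
```

In this toy case the per-dataset mappings disagree on which state is syllable "0", which is exactly the inconsistency that a single, model-level reindexing avoids.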
To prevent this issue, we now reindex syllables directly inside the model object. That way, if the model is later used to generate syllables for new data, the resulting labels will always be consistent. See #72 for details. Concretely, this means that
New format for results and checkpoint files
This PR introduces a new format for the results.h5 and checkpoint.p files saved during modeling. This is a breaking change, meaning that results/checkpoints generated with a previous version of the code will no longer work. Below we explain the changes and provide code for converting to the new format.
How the formats have changed
From a user perspective, the main change is that the results.h5 no longer contains separate syllables and syllables_reindexed
For the results.h5 files, we have removed some fields and renamed others. Previously the format was
Now the format is
The checkpoint.p files have changed more substantively. They are now saved as hdf5 files (rather than joblib) and their internal organization has changed.
Converting to the new format
The following code converts results and checkpoint files to the new format. Given a project directory and model name, a new project directory is generated with the updated files. As part of the reformatting, syllables are reindexed inside the model (see previous section) and a list of the resulting syllable name-changes is printed.
Make sure you are using the most up-to-date version of keypoint_moseq before running.
New analysis tools
This PR introduces a new set of analysis widgets and a tutorial notebook (analysis.ipynb) for using them. These widgets ingest results in the updated format described above. So make sure to run the conversion code before applying the analysis pipeline to an existing project!
Support for large datasets
Currently it is not possible to model large datasets on a GPU without incurring out-of-memory (OOM) errors. To address this problem, we have created a framework for mixed serial/parallel computation and added multi-GPU support.
Partial serialization
By default, modeling is parallelized across the full dataset. Here we introduce a new option for mixed parallel/serial computation, in which the data is split into batches that are processed one at a time. To enable this option, run the following code before fitting the model (if you have already initiated model fitting, the kernel must be restarted)
This will split the data into 4 batches, which should reduce the memory requirements about 4-fold but also result in a 4-fold slow-down. The number of batches can be adjusted as needed.
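The memory/time trade-off can be illustrated with a small, self-contained sketch. This is not the jax_moseq implementation: `batched_map` is a hypothetical helper, and a plain list function stands in for a computation that is parallelized within each batch. It only shows that processing batches serially gives the same result while reducing the largest chunk handled at once.

```python
def batched_map(fn, items, n_iters):
    """Apply fn to items in n_iters sequential batches and concatenate the results.

    fn takes a list and returns a list of the same length; it stands in for a
    computation that runs in parallel over whatever batch it is given.
    Returns the combined output and the largest batch size processed at once.
    """
    if n_iters < 1:
        raise ValueError("n_iters must be at least 1")
    # Ceiling division, so that at most n_iters batches are needed.
    batch_size = max(1, -(-len(items) // n_iters))
    out, peak = [], 0
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        peak = max(peak, len(batch))  # proxy for peak memory use
        out.extend(fn(batch))         # batches are processed one after another
    return out, peak


def square_all(batch):
    return [x * x for x in batch]


items = list(range(10))
full, peak_full = batched_map(square_all, items, 1)    # one batch of 10 items
split, peak_split = batched_map(square_all, items, 4)  # batches of 3, 3, 3, 1
```

Both calls return the same values, but the split version never holds more than 3 items in a batch, at the cost of four sequential passes instead of one.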
Multi-GPU support
To use multiple GPUs, run the following code before fitting the model (if you have already initiated model fitting, the kernel must be restarted)
This will split the computation across two GPUs.
Additional info on implementation
Both of the above options (multi-GPU support and partial serialization) rely on a new utility called mixed_map that we added to the jax_moseq package. Below is a copy of its docstring:
3D plotting tools
In addition to 2D projections of 3D keypoints, plot_pcs and generate_trajectory_plots now produce interactive 3D visualizations. These are rendered in the notebook and can also be viewed offline in a browser using the saved .html files.
It is now possible to generate grid movies for 3D keypoints, although they will only show 2D projections of the keypoints and not the underlying video. To generate grid movies from 3D data, include the flag keypoints_only=True and set the desired projection plane with the use_dims argument, e.g.