MoSeq2 Analysis Visualization Notebook Instructions
The "MoSeq2 Analysis Visualization Notebook" contains interactive tools to analyze behavior via MoSeq, such as:
- labeling syllables interactively
- computing syllable statistics
- visualizing how frequently syllables transition to one another
You must ALWAYS run the Load Progress section before running interactive tools in the notebook.
If you installed MoSeq2 via Conda, please activate the MoSeq environment and start a jupyter notebook in your project folder. If you are using the Docker container, please make sure your MoSeq container is running and connected to your project folder. Make sure that the analysis notebook is copied into your project folder.
To run this notebook, you need the following files in your data directory:
- progress.yaml (the progress.yaml file that contains all the required MoSeq paths)
- model.p (trained AR-HMM to compute statistics from)
- moseq2-index.yaml (the generated moseq2-index.yaml file containing paths to extracted sessions that will be used to generate syllable crowd movies)
- config.yaml (configuration file that contains configured parameters throughout the MoSeq pipeline)
- _pca/ (PCA-related data generated from the PCA section)
- aggregate_results/ (aggregated session data)
At this stage, the base directory should contain the necessary files above, as shown below:
.  ** current working directory
└── <base_dir>/
    ├── progress.yaml
    ├── config.yaml
    ├── moseq2-index.yaml
    ├── model_session_path/
    │   └── model.p
    ...
    ├── _pca/
    └── aggregate_results/
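Before opening the notebook, it can help to confirm that the layout above is in place. The following is a minimal sketch using only the Python standard library; `missing_inputs` and `REQUIRED` are hypothetical names and are not part of MoSeq2. It assumes `model.p` sits exactly one folder below the base directory, as in the tree above.

```python
from pathlib import Path

# Entries listed in the instructions above; names ending in "/" are folders.
REQUIRED = [
    "progress.yaml",
    "config.yaml",
    "moseq2-index.yaml",
    "_pca/",
    "aggregate_results/",
]


def missing_inputs(base_dir):
    """Return the required entries that are absent from base_dir (hypothetical helper)."""
    base = Path(base_dir)
    missing = []
    for name in REQUIRED:
        path = base / name.rstrip("/")
        present = path.is_dir() if name.endswith("/") else path.is_file()
        if not present:
            missing.append(name)
    # Assumption: model.p lives in a model-specific subfolder one level down.
    if not any(base.glob("*/model.p")):
        missing.append("*/model.p")
    return missing


# An empty list means every expected file and folder was found.
print(missing_inputs("."))
```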
Note: this notebook uses progress.yaml to keep track of all the necessary paths. Please ensure you run the Load Progress cell before running any analysis modules. If the PCA and modeling steps were done using the Command Line Interface, set init = True and overwrite = True in progress_paths = restore_progress_vars(progress_file=progress_filepath, init = False, overwrite = False).
Get best model fit is used to determine whether the trained model has captured median syllable durations that match the principal components' changepoints. If there is more than one trained model in progress_paths['base_model_path'], the feature returns the model from that list that best matches the principal components' changepoints.
The command supports two comparison objectives: duration and jsd.
duration finds the model whose median syllable duration best matches that of the principal components' changepoints.
jsd finds the model whose distribution of syllable durations best matches that of the principal components' changepoints.
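To make the two objectives concrete, here is a minimal sketch of how such a selection could work. It is not the MoSeq2 implementation: the function names, the histogram binning, and the toy durations are assumptions made for illustration, and it assumes jsd stands for Jensen-Shannon divergence.

```python
import math
from statistics import median


def histogram(values, n_bins, lo, hi):
    """Return the fraction of values falling in each of n_bins equal-width bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp v == hi into the last bin
        counts[idx] += 1
    return [c / len(values) for c in counts]


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def pick_best_model(model_durations, changepoint_durations, objective="duration", n_bins=10):
    """Return the model name whose durations best match the changepoint durations."""
    hi = max(max(d) for d in [changepoint_durations, *model_durations.values()])
    target_hist = histogram(changepoint_durations, n_bins, 0.0, hi)
    target_median = median(changepoint_durations)

    def score(durations):
        if objective == "duration":
            return abs(median(durations) - target_median)
        if objective == "jsd":
            return jensen_shannon(histogram(durations, n_bins, 0.0, hi), target_hist)
        raise ValueError(f"unknown objective: {objective}")

    return min(model_durations, key=lambda name: score(model_durations[name]))


# Toy durations in seconds (made up for illustration).
changepoints = [0.2, 0.3, 0.3, 0.4, 0.5, 0.6]
models = {
    "model_a": [0.1, 0.1, 0.2, 0.2, 0.3, 0.9],
    "model_b": [0.2, 0.3, 0.4, 0.4, 0.5, 0.6],
}
print(pick_best_model(models, changepoints, objective="duration"))
print(pick_best_model(models, changepoints, objective="jsd"))
```

The duration objective compares a single summary number, so two models with the same median but very different spreads would tie; the jsd objective compares the whole distribution and would separate them.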
If there are multiple models in the input folder, the output figure will plot multiple dashed distribution curves representing the unselected models and two solid distribution curves that show the "Best"/chosen model and the principal components' changepoint durations.
This section produces two dataframes: moseq_df and stats_df (named mean_df before v1.2.0).
The two dataframes are used to generate behavioral summaries, which we call fingerprints, and are used generally for analysis.
moseq_df is a vertically stacked dataframe of scalar values measured during the extraction step, aligned with the model labels and timestamps.
Its shape is (sum_of_session_frames, 31). To view all the measured scalars, type print(moseq_df.columns).
This dataframe can be used to plot the scalar feature values for any session over time.
The cell output in the notebook shows a preview of the first 5 rows of the dataframe, produced with moseq_df.head().
Note: the rows in the labels columns contain -5 for the first 3 frames of each session's recording.
This is because we use the first 3 frames to initialize the AR-HMM, and thus cannot supply a syllable label to them. We generally remove these frames from the analysis.
This dataframe contains the following columns:
column name | description | unit |
---|---|---|
angle | the orientation of the mouse body | radians |
area_mm | the area of the mouse | mm^2 |
area_px | the area of the mouse | pixels |
centroid_x_mm | center of the mouse (x coordinate) | mm |
centroid_x_px | center of the mouse (x coordinate) | pixels |
centroid_y_mm | center of the mouse (y coordinate) | mm |
centroid_y_px | center of the mouse (y coordinate) | pixels |
height_ave_mm | average height across the entire visible mouse | mm |
length_mm | mouse length measured roughly across the spine | mm |
length_px | mouse length measured roughly across the spine | pixels |
velocity_2d_mm | mouse 2D velocity (x,y velocity) | mm/frame |
velocity_2d_px | mouse 2D velocity (x,y velocity) | pixels/frame |
velocity_3d_mm | mouse 3D velocity (x,y,z velocity) | mm/frame |
velocity_3d_px | mouse 3D velocity (x,y,z velocity) | pixels/frame |
velocity_theta | direction/angle of the velocity vector | radians |
width_mm | mouse width | mm |
width_px | mouse width | pixels |
dist_to_center_px | distance between mouse center and arena center | pixels |
group | the assigned experimental group | NA |
uuid | session uuid assigned during extraction | NA |
h5_path | extraction h5 file path | NA |
timestamps | frame timestamp | seconds |
frame index | index of the frame in the recording | NA |
SessionName | name of the session pulled from the metadata.json file | NA |
SubjectName | name of the subject/mouse pulled from the metadata.json file | NA |
StartTime | time of day the session recording started | NA |
labels (original) | original syllable labels used by the AR-HMM | NA |
labels (usage sort) | syllable label sorted by usage (low numbers = most used; high numbers = rarely used) | NA |
labels (frames sort) | syllable label sorted by frames (low numbers = most used; high numbers = rarely used) | NA |
onset | indicates the onset of a syllable (1/True = start of syllable) | NA |
syllable index | syllable index | NA |
Syllables are arbitrarily labeled 0-99 (assuming the max-states parameter is set to 100).
During the training process, the AR-HMM settles upon a random subset of these 100 labels to describe the data it sees.
After training and applying an AR-HMM model to mouse data, we generally re-label syllables to assign meaning to those 100 labels.
We apply two re-labeling schemes to the original syllable labels:
- Usage sort: we re-label syllables by how many times a mouse instantiates them, regardless of the syllable's duration.
  For example, if we have the set of original labels applied to 20 frames:
  [2, 2, 2, 2, 10, 10, 2, 2, 2, 5, 5, 5, 5, 5, 5, 10, 10, 2, 2, 2]
  then the usage sort will say syllable 2 was instantiated 3 times, syllable 5 once, and syllable 10 twice. The new mapping would be 2 -> 0, 10 -> 1, 5 -> 2, and the new sequence would be:
  [0, 0, 0, 0, 1, 1, 0, 0, 0, 2, 2, 2, 2, 2, 2, 1, 1, 0, 0, 0]
- Frames sort: we re-label syllables by how many frames are assigned each label. Generally, the two sortings result in similar mappings.
  If we use the same set of original labels as in the usage sort example, syllable 2 is assigned to 10 frames, syllable 5 to 6 frames, and syllable 10 to 4 frames.
  The new mapping is 2 -> 0, 5 -> 1, 10 -> 2, and the new sequence is:
  [0, 0, 0, 0, 2, 2, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 0, 0, 0]
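The two re-labeling schemes can be reproduced in a few lines of plain Python. This is only a sketch of the idea described above, not the code MoSeq2 runs; `relabel` is a hypothetical helper, and ties between equally frequent syllables are broken by order of first appearance, which is an arbitrary choice made here.

```python
from collections import Counter
from itertools import groupby


def relabel(labels, count="usage"):
    """Map original labels to ranks 0, 1, 2, ... in order of descending count.

    count="usage" counts uninterrupted runs (syllable instances);
    count="frames" counts individual frames.
    """
    if count == "usage":
        counts = Counter(label for label, _ in groupby(labels))
    elif count == "frames":
        counts = Counter(labels)
    else:
        raise ValueError(f"unknown count: {count}")
    ranked = [label for label, _ in counts.most_common()]
    mapping = {label: rank for rank, label in enumerate(ranked)}
    return mapping, [mapping[label] for label in labels]


original = [2, 2, 2, 2, 10, 10, 2, 2, 2, 5, 5, 5, 5, 5, 5, 10, 10, 2, 2, 2]

usage_map, usage_seq = relabel(original, count="usage")
print(usage_map)  # {2: 0, 10: 1, 5: 2}
print(usage_seq)

frames_map, frames_seq = relabel(original, count="frames")
print(frames_map)  # {2: 0, 5: 1, 10: 2}
print(frames_seq)
```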
stats_df is a dataframe of the average syllable/scalar values for all the features included in moseq_df, grouped by the resorted syllable labels, model groups, and UUIDs.
This dataframe will be used to plot mean syllable statistics and perform hypothesis testing.
The cell output in the notebook shows a preview of the first 5 rows of the dataframe, produced with stats_df.head().
Note: In the compute_behavioral_statistics function, the count parameter can be set to either 'usage' or 'frames', which determines how syllables are re-labeled.
See above for details about each sorting.
This dataframe contains the following columns:
column name | description | unit |
---|---|---|
angle | the orientation of the mouse body | radians |
area_mm | the area of the mouse | mm^2 |
area_px | the area of the mouse | pixels |
centroid_x_mm | center of the mouse (x coordinate) | mm |
centroid_x_px | center of the mouse (x coordinate) | pixels |
centroid_y_mm | center of the mouse (y coordinate) | mm |
centroid_y_px | center of the mouse (y coordinate) | pixels |
height_ave_mm | average height across the entire visible mouse | mm |
length_mm | mouse length measured roughly across the spine | mm |
length_px | mouse length measured roughly across the spine | pixels |
velocity_2d_mm | mouse 2D velocity (x,y velocity) | mm/frame |
velocity_2d_px | mouse 2D velocity (x,y velocity) | pixels/frame |
velocity_3d_mm | mouse 3D velocity (x,y,z velocity) | mm/frame |
velocity_3d_px | mouse 3D velocity (x,y,z velocity) | pixels/frame |
velocity_theta | direction/angle of the velocity vector | radians |
width_mm | mouse width | mm |
width_px | mouse width | pixels |
dist_to_center_px | distance between mouse center and arena center | pixels |
timestamps | average frame timestamp in the syllable | seconds |
frame index | average index of the frames in the syllable | NA |
usage | the probability a syllable is used | NA |
duration | average syllable duration | seconds |
syllable key | sorting of the syllables (see above) | NA |
syllable | syllable label | NA |
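As an illustration of the usage and duration columns, the sketch below derives both from one session's label sequence. It is a simplified stand-in, not the compute_behavioral_statistics implementation: the helper name is hypothetical, usage is assumed to be the share of syllable instances (runs), and the frame rate of 30 frames per second is an assumption. Frames labeled -5 are dropped, as described earlier.

```python
from collections import defaultdict
from itertools import groupby


def syllable_usage_and_duration(labels, fps=30, skip_label=-5):
    """Return {syllable: {"usage": ..., "duration": ...}} for one session.

    usage    = fraction of syllable instances (runs) that belong to the syllable
    duration = mean run length, converted from frames to seconds
    """
    run_lengths = defaultdict(list)
    for label, frames in groupby(labels):
        if label != skip_label:  # ignore the unlabeled initialization frames
            run_lengths[label].append(len(list(frames)))
    n_runs = sum(len(lengths) for lengths in run_lengths.values())
    return {
        label: {
            "usage": len(lengths) / n_runs,
            "duration": sum(lengths) / len(lengths) / fps,
        }
        for label, lengths in run_lengths.items()
    }


# Three initialization frames followed by the usage-sorted example sequence.
session = [-5, -5, -5] + [0, 0, 0, 0, 1, 1, 0, 0, 0, 2, 2, 2, 2, 2, 2, 1, 1, 0, 0, 0]
stats = syllable_usage_and_duration(session)
for syllable, values in sorted(stats.items()):
    print(syllable, values)
```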
Interactive Syllable Labelling is for assigning behavioral labels and short descriptions to syllables by observing the crowd movies and the Syllable Info table.
This widget will automatically generate crowd movies and store them in a folder called crowd_movies in the model-specific subfolder, specified by model_session_path in the progress.yaml file.
Note that syllables are relabeled by usage from here on out.
A syll_info.yaml file will be generated in the model-specific subfolder to record the syllable names and short descriptions.
Use the contents of the syll_info.yaml file or the crowd movie file names to find the mapping from the original syllable label to the one relabeled by usage.
Note: each time new crowd movies are created, the syll_info.yaml file gets re-written.
If you don't want this to happen, you must manually rename the syll_info.yaml file before re-running this widget.
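One way to protect existing labels is to keep a timestamped copy of syll_info.yaml before re-running the widget. The helper below is a hypothetical convenience, not part of MoSeq2, and it copies the file rather than renaming it, so the original stays where the widget expects it while the copy preserves your labels.

```python
import shutil
import time
from pathlib import Path


def backup_syll_info(model_dir):
    """Copy syll_info.yaml to a timestamped file; return the backup path or None."""
    src = Path(model_dir) / "syll_info.yaml"
    if not src.is_file():
        return None  # nothing to back up yet
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"syll_info_{stamp}.yaml")
    shutil.copy2(src, dst)
    return dst


# Replace "." with the model-specific subfolder that holds syll_info.yaml.
print(backup_syll_info("."))
```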
- In v1.2.0, min, max, and standard deviation were added to the stats_df dataframe (previously mean_df). If you have dataframe parquet files such as syll_df.parquet or moseq_scalar_dataframe.parquet, you may see KeyError: 'velocity_2d_mm_mean' when you run the Crowd Movie Generation and Interactive Syllable Labelling Tool. Please delete the parquet files and rerun the cell.
- Run the cell to launch the interactive Syllable Labelling Tool.
- Select a syllable from the Syllable dropdown menu to view the associated crowd movie and syllable info.
- Use the Playback Speed slider to adjust the crowd movie playback speed to better observe the behavior associated with short/fast syllables.
- Enter the syllable label in the Syllable Name field and the desired description in Short Description.
- Click Save Setting to save the syllable label and description for later analysis.
- Use Next and Previous to navigate between syllables; the syllable label and description are saved automatically when you use these buttons.
Interactive Syllable Statistics Graphing is for plotting different syllable statistics and their differences in the modeled groups. The dendrogram displayed below the statistics plot represents the hierarchically sorted pairwise distances between the given model's autoregressive matrices representing the syllables.
- Run the cell to launch the Interactive Syllable Statistics Tool.
- Select the parameters from the dropdown menus to control the graph.
  - If you select Difference from the Sorting dropdown menu, the syllables will be sorted by the value difference between two groups, and additional menus will appear for statistical testing of whether the differences between groups are significant.
  - If you select group from Grouping, the mean of all the sessions within each group will be plotted in the graph.
  - If you select SessionName or SubjectName, you can select multiple sessions/subjects in the Sessions menu by holding down the [Ctrl]/[Command]/[Shift] key. You can click on the legend items to selectively hide the corresponding data points.
  - If you have labeled the syllables, you can specify the syllables you want to plot in the Syllable to Display field, such as "run", "walk" etc. The text input is not case-sensitive.
- Select a threshold criterion from the "Threshold By" dropdown menu. Use the Thresholding Slider to include syllables with statistics within a specific value range.
- Hover over the circle data points to display a pop-up window with additional syllable metadata.
The notebook contains two sections that display information on syllable transition statistics. The first plots the transition matrix itself, while the second plots a representation of the transition matrix as a directed graph.
Transition matrices (TMs) compactly represent how frequently any syllable transitions into any other syllable. They are one way to describe the average structure in behavior. A transition from one syllable to another can also be referred to as a bigram. Each row of the TM represents an incoming syllable, while each column represents an outgoing syllable. The value at a specific row and column position represents how frequently the incoming syllable transitions into the outgoing syllable, that is, the frequency of the bigram.
TMs can be normalized in three ways:
- bigram normalization: describes the absolute probability a bigram occurs within the dataset
- row normalization: describes the probability that one syllable transitions into another (also known as outgoing probability)
- column normalization: describes the probability that any syllable transitions into a specific syllable (also known as incoming probability)
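The three normalizations differ only in what each count is divided by. The sketch below builds a small count matrix from a label sequence and applies each scheme; the function names and the option strings "bigram", "row", and "column" are illustrative assumptions, not the MoSeq2 API. Here counts[i][j] is the number of times syllable i is immediately followed by syllable j.

```python
from itertools import groupby


def count_bigrams(labels, n_syllables):
    """counts[i][j] = number of times syllable i is immediately followed by syllable j."""
    sequence = [label for label, _ in groupby(labels)]  # collapse repeated frames
    counts = [[0] * n_syllables for _ in range(n_syllables)]
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return counts


def normalize_tm(counts, normalize="bigram"):
    """Normalize a count matrix over all entries, per row, or per column."""
    if normalize == "bigram":
        total = sum(map(sum, counts))
        return [[c / total if total else 0.0 for c in row] for row in counts]
    if normalize == "row":
        return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]
    if normalize == "column":
        col_sums = [sum(col) for col in zip(*counts)]
        return [[c / s if s else 0.0 for c, s in zip(row, col_sums)] for row in counts]
    raise ValueError(f"unknown normalization: {normalize}")


labels = [0, 0, 0, 0, 1, 1, 0, 0, 0, 2, 2, 2, 2, 2, 2, 1, 1, 0, 0, 0]
counts = count_bigrams(labels, n_syllables=3)
print(counts)  # [[0, 1, 1], [2, 0, 0], [0, 1, 0]]
for scheme in ("bigram", "row", "column"):
    print(scheme, normalize_tm(counts, scheme))
```

With bigram normalization all entries sum to 1, with row normalization each non-empty row sums to 1, and with column normalization each non-empty column sums to 1.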
Transition analyses can help visualize gross changes in the structure of behavior between two experimental groups, especially when visualizing TMs in directed graph format. For example, certain syllables that frequently transition into one set of syllables in one experimental condition might transition into a completely different set in another experimental condition.
Interactive Syllable Transition Graph Tool is for exploring the behavioral transition space of your modeled groups. Find sequences of behavior, e.g. bigrams, at different usage/transition probability ranges, and gain a better understanding of the differences across your modeling groups.
- Run the cell to launch the Interactive Syllable Transition Graphing Tool.
- Use the Graph Layout dropdown menu to specify the graph layout.
- Use the Threshold Edge Weights slider to select a range for syllable transition probabilities to display in the graphs.
- Use the Threshold Nodes by Usage slider to select a range for syllable usages to display in the graphs.
- Hover over the nodes to display syllable information and the associated crowd movies.

Nodes outside these thresholds will be hidden.
Fingerprint plots summarize behavior by showing distributions of MoSeq scalar values and MoSeq syllables.
The plots are generated using the moseq_df and stats_df (previously mean_df) dataframes described above.
These plots are useful for getting a gestalt of behavior across sessions, mice, and experimental groups, and can reveal general differences across experimental groups. The rows of each plot contain summary statistics for each session. The left-most plot indicates which experimental group each session is a part of. The four middle columns plot the distributions of scalar information, where larger numbers (brighter colors) indicate greater probability mass. The final right-most column plots syllable usage across all syllables (relabeled by usage or frames).
The n_bins variable controls the number of bins used to bin the scalar values, and the MoSeq column shows the values by syllable.
If no sklearn.preprocessing object is passed into the function, the unscaled raw percentage usages within each bin and each MoSeq syllable will be plotted, as shown below.
If an sklearn.preprocessing object such as MinMaxScaler() is passed in, the data will be scaled within a session, as shown below.
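To show what the unscaled and scaled versions of one fingerprint row could look like, here is a minimal sketch in plain Python. It does not call scikit-learn or MoSeq2: `binned_percentages` and `min_max_scale` are hypothetical helpers, the velocity values are made up, and the min-max formula is applied to a single session's row on the assumption that "scaled within a session" means each session is rescaled independently.

```python
def binned_percentages(values, n_bins, lo, hi):
    """Percentage of values falling in each of n_bins equal-width bins on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp v == hi into the last bin
        counts[idx] += 1
    return [100 * c / len(values) for c in counts]


def min_max_scale(row):
    """Rescale one session's row to the range [0, 1] (the min-max formula)."""
    lo, hi = min(row), max(row)
    if hi == lo:
        return [0.0] * len(row)
    return [(v - lo) / (hi - lo) for v in row]


# Made-up 2D velocities for one session, binned into n_bins = 4 bins on [0, 4].
velocities = [0.5, 1.5, 1.7, 2.5, 2.6, 2.9, 3.5, 3.9]
raw_row = binned_percentages(velocities, n_bins=4, lo=0.0, hi=4.0)
print(raw_row)                 # [12.5, 25.0, 37.5, 25.0]
print(min_max_scale(raw_row))  # [0.0, 0.5, 1.0, 0.5]
```

Scaling within a session makes sessions with very different raw ranges comparable in color, at the cost of hiding absolute differences in percentage between sessions.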