The package contains a mixture of classic decoding methods and modern machine learning methods.
For regression, we currently include: Wiener Filter, Wiener Cascade, Kalman Filter, Naive Bayes, Support Vector Regression, XGBoost, Dense Neural Network, Recurrent Neural Net, GRU, LSTM.
For classification, we currently include: Logistic Regression, Support Vector Classification, XGBoost, Dense Neural Network, Recurrent Neural Net, GRU, LSTM.
This package was originally designed for regression; classification functions were added only recently. As a result, the ReadMe, examples, and preprocessing functions are still geared toward regression. We are in the process of adding more for classification.
This package accompanies a manuscript that compares the performance of these methods on several datasets. We would appreciate it if you cited that manuscript when you use our code or data for your research.
Code used for the paper is in the "Paper_code" folder. It is described further at the bottom of this read-me.
All 3 datasets (motor cortex, somatosensory cortex, and hippocampus) used in the paper can be downloaded here. They are in both MATLAB and Python formats, and can be used in the example files described below.
This package can be installed via pip at the command line by typing
pip install Neural-Decoding
or manually via
git clone https://github.com/KordingLab/Neural_Decoding.git
cd Neural_Decoding
python setup.py install
You'll have to install each dependency yourself if you install manually. We've designed the code so that not all machine learning packages need to be installed for the others to work.
All packages will be installed automatically when installing from pip (because of the requirements.txt file).
If installing manually via python setup.py install:
- In order to run all the decoders based on neural networks, you need to install Keras.
- In order to run the XGBoost Decoder, you need to install XGBoost.
- In order to run the Wiener Filter, Wiener Cascade, or Support Vector Regression, you will need scikit-learn.
- In order to do hyperparameter optimization, you need to install BayesianOptimization.
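After a manual install, it can be handy to see which optional dependencies are present. The snippet below is a small sketch using only the standard library. The import names (keras, xgboost, sklearn, bayes_opt) are assumptions about how each package is imported; they are not stated in this read-me.

```python
import importlib.util

# Import names are assumptions about how each package is imported.
OPTIONAL_DEPENDENCIES = {
    "keras": "neural network decoders",
    "xgboost": "XGBoost decoder",
    "sklearn": "Wiener Filter, Wiener Cascade, and Support Vector Regression",
    "bayes_opt": "hyperparameter optimization",
}


def check_optional_dependencies():
    """Return {import_name: bool}, True when the package can be imported."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in OPTIONAL_DEPENDENCIES
    }


for name, available in check_optional_dependencies().items():
    status = "found" if available else "missing"
    print(f"{name}: {status} (needed for {OPTIONAL_DEPENDENCIES[name]})")
```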
We have included jupyter notebooks that provide detailed examples of how to use the decoders.
- The file central_concepts_in_ML_for_decoding.ipynb is designed for users who are new to machine learning. It builds basic concepts and shows some examples, and also has several exercises to make sure you know your stuff. (A link to the solutions is inside.)
- The file Examples_kf_decoder.ipynb is for the Kalman filter decoder.
- The file Examples_all_decoders.ipynb is for all other decoders. These examples work well with the somatosensory and motor cortex datasets.
- There are minor differences in the hippocampus dataset, so we have included a folder, Examples_hippocampus, with analogous example files. This folder also includes an example file for using the Naive Bayes decoder (since it works much better on our hippocampus dataset).
- We have also included a notebook, Example_hyperparam_opt.ipynb, that demonstrates how to do hyperparameter optimization for the decoders.
Here we provide a basic example where we are using an LSTM decoder.
For this example we assume we have already loaded matrices:
- "neural_data": a matrix of size "total number of time bins" x "number of neurons," where each entry is the firing rate of a given neuron in a given time bin.
- "y": the output variable that you are decoding (e.g. velocity), and is a matrix of size "total number of time bins" x "number of features you are decoding."
We have provided a Jupyter notebook, Example_format_data.ipynb, with an example of how to get MATLAB data into this format.
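If you just want to try the steps below without real recordings, placeholder arrays with the right shapes are enough. The following is only a sketch: the sizes, the Poisson spike counts, and the random-walk output are arbitrary choices made for illustration, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_time_bins = 1000  # total number of time bins (arbitrary)
n_neurons = 50      # number of neurons (arbitrary)
n_outputs = 2       # e.g. x and y velocity

# Spike counts per bin stand in for firing rates.
neural_data = rng.poisson(lam=2.0, size=(n_time_bins, n_neurons)).astype(float)

# A smooth random walk stands in for the decoded variable.
y = np.cumsum(rng.normal(size=(n_time_bins, n_outputs)), axis=0)

print(neural_data.shape)  # (1000, 50)
print(y.shape)            # (1000, 2)
```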
First we will import the necessary functions
from Neural_Decoding.decoders import LSTMDecoder #Import LSTM decoder
from Neural_Decoding.preprocessing_funcs import get_spikes_with_history #Import function to get the covariate matrix that includes spike history from previous bins
Next, we will define the time period we are using spikes from (relative to the output we are decoding)
bins_before=13 #How many bins of neural data prior to the output are used for decoding
bins_current=1 #Whether to use concurrent time bin of neural data
bins_after=0 #How many bins of neural data after the output are used for decoding
Next, we will compute the covariate matrix that includes the spike history from previous bins
# Function to get the covariate matrix that includes spike history from previous bins
X=get_spikes_with_history(neural_data,bins_before,bins_after,bins_current)
In this basic example, we will ignore some additional preprocessing we do in the example notebooks. Let's assume we have now divided the data into a training set (X_train, y_train) and a testing set (X_test,y_test).
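One simple way to make that division is a split along the time axis, sketched below. The helper name split_train_test and the 80/20 ratio are hypothetical choices for this illustration, and the additional preprocessing from the example notebooks is not reproduced here.

```python
import numpy as np


def split_train_test(X, y, train_fraction=0.8):
    """Split along the time axis: the first part trains, the rest tests."""
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y must have the same number of time bins")
    n_train = int(round(train_fraction * X.shape[0]))
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]


# Stand-in arrays with the shapes used in this example.
X = np.zeros((1000, 14, 50))  # time bins x surrounding bins x neurons
y = np.zeros((1000, 2))       # time bins x output features

X_train, y_train, X_test, y_test = split_train_test(X, y)
print(X_train.shape, X_test.shape)  # (800, 14, 50) (200, 14, 50)
```

Splitting by time rather than shuffling keeps neighbouring, highly correlated bins from ending up on both sides of the split.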
We will now finally train and test the decoder:
#Declare model and set parameters of the model
model_lstm=LSTMDecoder(units=400,num_epochs=5)
#Fit model
model_lstm.fit(X_train,y_train)
#Get predictions
y_test_predicted_lstm=model_lstm.predict(X_test)
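To judge the predictions, a goodness-of-fit score such as R^2 can be computed for each output feature. The function below is a standalone NumPy sketch written for this example; it is not the package's own metric code, and the name r2_per_output is hypothetical.

```python
import numpy as np


def r2_per_output(y_true, y_pred):
    """Coefficient of determination, computed separately for each output column."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot


# Toy check: a perfect prediction scores 1, predicting the mean scores 0.
y_true = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
perfect = r2_per_output(y_true, y_true)
mean_only = r2_per_output(y_true, np.tile(y_true.mean(axis=0), (3, 1)))
print(perfect, mean_only)
```

With the variables from the example above, the call would be r2_per_output(y_test, y_test_predicted_lstm).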
There are 3 files with functions. An overview of the functions is below. More details can be found in the comments within the files.
The file "decoders.py" provides all of the decoders. Each decoder is a class with functions "fit" and "predict".
First, we will describe the format of data that is necessary for the decoders.
- For all the decoders, you will need to decide the time period of spikes (relative to the output) that you are using for decoding.
- For all the decoders other than the Kalman filter, you can set "bins_before" (the number of bins of spikes preceding the output), "bins_current" (whether to use the bin of spikes concurrent with the output), and "bins_after" (the number of bins of spikes after the output). Let "surrounding_bins" = bins_before+bins_current+bins_after. This allows us to get a 3d covariate matrix "X" that has size "total number of time bins" x "surrounding_bins" x "number of neurons." We use this input format for the recurrent neural networks (SimpleRNN, GRU, LSTM). We can also flatten the matrix, so that there is a vector of features for every time bin, to get "X_flat" which is a 2d matrix of size "total number of time bins" x "surrounding_bins x number of neurons." This input format is used for the Wiener Filter, Wiener Cascade, Support Vector Regression, XGBoost, and Dense Neural Net.
- For the Kalman filter, you can set the "lag" - what time bin of the neural data (relative to the output) is used to predict the output. The input format for the Kalman filter is simply the 2d matrix of size "total number of time bins" x "number of neurons," where each entry is the firing rate of a given neuron in a given time bin.
- The output, "y" is a 2d matrix of size "total number of time bins" x "number of output features."
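The shapes described above can be made concrete with a short NumPy sketch. This is an illustration of the idea only, not the package's get_spikes_with_history implementation; the helper name build_history_matrix and the choice to fill incomplete windows with NaN are assumptions made for this example.

```python
import numpy as np


def build_history_matrix(neural_data, bins_before, bins_after, bins_current=1):
    """Stack the surrounding bins of every time bin into a 3D array.

    Returns an array of shape (n_time_bins, surrounding_bins, n_neurons).
    Rows without a full window are left as NaN (a choice made for this sketch).
    """
    n_bins, n_neurons = neural_data.shape
    surrounding_bins = bins_before + bins_current + bins_after
    X = np.full((n_bins, surrounding_bins, n_neurons), np.nan)
    # Offsets relative to the output bin, e.g. [-2, -1, 0, 1].
    offsets = list(range(-bins_before, 0))
    if bins_current:
        offsets.append(0)
    offsets += list(range(1, bins_after + 1))
    for t in range(bins_before, n_bins - bins_after):
        X[t] = neural_data[[t + k for k in offsets]]
    return X


neural_data = np.arange(20, dtype=float).reshape(10, 2)  # 10 time bins, 2 neurons
X = build_history_matrix(neural_data, bins_before=2, bins_after=1)
X_flat = X.reshape(X.shape[0], -1)  # one feature vector per time bin

print(X.shape)       # (10, 4, 2)
print(X_flat.shape)  # (10, 8)
```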
Here are all the decoders within "decoders.py" for performing regression:
- WienerFilterDecoder
  - The Wiener Filter is simply multiple linear regression using X_flat as an input.
  - It has no input parameters.
- WienerCascadeDecoder
  - The Wiener Cascade (also known as a linear-nonlinear model) fits a linear regression (the Wiener filter) followed by a static nonlinearity.
  - It has parameter degree (the degree of the polynomial used for the nonlinearity).
- KalmanFilterDecoder
  - We used a Kalman filter similar to that implemented in Wu et al. 2003. In the Kalman filter, the measurement was the neural spike trains, and the hidden state was the kinematics.
  - We have one parameter C (which is not in the previous implementation). This parameter scales the noise matrix associated with the transition in kinematic states. It effectively allows changing the weight of the new neural evidence in the current update.
- NaiveBayesDecoder
  - We used a Naive Bayes decoder similar to that implemented in Zhang et al. 1998 (see manuscript for details).
  - It has parameters encoding_model (for either a linear or quadratic encoding model) and res (to set the resolution of predicted values).
- SVRDecoder
  - This decoder uses support vector regression using X_flat as an input.
  - It has parameters C (the penalty of the error term) and max_iter (the maximum number of iterations).
  - It works best when the output ("y") has been normalized.
- XGBoostDecoder
  - We used the Extreme Gradient Boosting (XGBoost) algorithm to relate X_flat to the outputs. XGBoost is based on the idea of boosted trees.
  - It has parameters max_depth (the maximum depth of the trees), num_round (the number of trees that are fit), eta (the learning rate), and gpu (if you have the GPU version of XGBoost installed, you can select which GPU to use).
- DenseNNDecoder
  - Using the Keras library, we created a dense feedforward neural network that uses X_flat to predict the outputs. It can have any number of hidden layers.
  - It has parameters units (the number of units in each layer), dropout (the proportion of units that get dropped out), num_epochs (the number of epochs used for training), and verbose (whether to display progress of the fit after each epoch).
- SimpleRNNDecoder
  - Using the Keras library, we created a neural network architecture where the spiking input (from matrix X) was fed into a standard recurrent neural network (RNN) with a relu activation. The units from this recurrent layer were fully connected to the output layer.
  - It has parameters units, dropout, num_epochs, and verbose.
- GRUDecoder
  - Using the Keras library, we created a neural network architecture where the spiking input (from matrix X) was fed into a network of gated recurrent units (GRUs; a more sophisticated RNN). The units from this recurrent layer were fully connected to the output layer.
  - It has parameters units, dropout, num_epochs, and verbose.
- LSTMDecoder
  - All methods were the same as for the GRUDecoder, except Long Short Term Memory networks (LSTMs; another more sophisticated RNN) were used rather than GRUs.
  - It has parameters units, dropout, num_epochs, and verbose.
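To illustrate the two simplest ideas in the list, here is a NumPy-only sketch of a Wiener filter (multiple linear regression on X_flat) and a Wiener cascade (the same regression followed by a polynomial nonlinearity per output). This is not the code in decoders.py; the function names are hypothetical, and fitting the nonlinearity with np.polyfit is an assumption made for this example.

```python
import numpy as np


def fit_wiener_filter(X_flat, y):
    """Least-squares weights (with an intercept) mapping X_flat to y."""
    design = np.column_stack([np.ones(X_flat.shape[0]), X_flat])
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    return weights


def predict_wiener_filter(weights, X_flat):
    design = np.column_stack([np.ones(X_flat.shape[0]), X_flat])
    return design @ weights


def fit_wiener_cascade(X_flat, y, degree=3):
    """Linear stage, then one polynomial of the given degree per output."""
    weights = fit_wiener_filter(X_flat, y)
    linear_out = predict_wiener_filter(weights, X_flat)
    polys = [np.polyfit(linear_out[:, i], y[:, i], degree)
             for i in range(y.shape[1])]
    return weights, polys


def predict_wiener_cascade(model, X_flat):
    weights, polys = model
    linear_out = predict_wiener_filter(weights, X_flat)
    return np.column_stack([np.polyval(p, linear_out[:, i])
                            for i, p in enumerate(polys)])


# Synthetic data with a saturating nonlinearity between input and output.
rng = np.random.default_rng(0)
X_flat = rng.normal(size=(500, 6))
y = np.tanh(X_flat @ rng.normal(size=(6, 2)))

y_lin = predict_wiener_filter(fit_wiener_filter(X_flat, y), X_flat)
y_cas = predict_wiener_cascade(fit_wiener_cascade(X_flat, y, degree=3), X_flat)

mse_lin = np.mean((y - y_lin) ** 2)
mse_cas = np.mean((y - y_cas) ** 2)
print(mse_lin, mse_cas)
```

On the training data the cascade cannot do worse than the linear stage alone, because the identity mapping is itself a polynomial the second stage could choose.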
When designing the XGBoost and neural network decoders, there were many additional parameters that could have been utilized (e.g. regularization). For ease of use, we only included parameters that were sufficient for producing good fits.
The file has functions for metrics to evaluate model fit. It currently has functions to calculate:
The file "preprocessing_funcs.py" contains functions for preprocessing data that may be useful for putting the neural activity and outputs in the correct format for our decoding functions:
- bin_spikes: converts spike times to the number of spikes within time bins
- bin_output: converts a continuous stream of outputs to the average output within time bins
- get_spikes_with_history: using binned spikes as input, this function creates a covariate matrix of neural data that incorporates spike history
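The two binning steps can be sketched in a few lines of NumPy. The functions below are illustrations of the idea under assumed inputs (a list of spike-time arrays, and sampled outputs with their timestamps); they are not the package's bin_spikes and bin_output, whose exact signatures are documented in the file's comments.

```python
import numpy as np


def bin_spike_times(spike_times, bin_width, t_start, t_end):
    """Count spikes per bin for each neuron.

    spike_times: list of 1D arrays, one array of spike times per neuron.
    Returns an array of shape (n_bins, n_neurons).
    """
    edges = np.arange(t_start, t_end + bin_width / 2, bin_width)
    counts = [np.histogram(times, bins=edges)[0] for times in spike_times]
    return np.column_stack(counts)


def bin_continuous_output(outputs, output_times, bin_width, t_start, t_end):
    """Average a sampled signal within each bin.

    outputs: array (n_samples, n_features); output_times: array (n_samples,).
    Bins that contain no samples are left as NaN.
    """
    edges = np.arange(t_start, t_end + bin_width / 2, bin_width)
    n_bins = len(edges) - 1
    binned = np.full((n_bins, outputs.shape[1]), np.nan)
    for i in range(n_bins):
        in_bin = (output_times >= edges[i]) & (output_times < edges[i + 1])
        if in_bin.any():
            binned[i] = outputs[in_bin].mean(axis=0)
    return binned


# Two neurons, one second of data, 0.5 s bins.
spike_times = [np.array([0.1, 0.2, 0.7]), np.array([0.6, 0.9])]
neural_data = bin_spike_times(spike_times, bin_width=0.5, t_start=0.0, t_end=1.0)

output_times = np.array([0.0, 0.25, 0.5, 0.75])
outputs = np.array([[0.0], [2.0], [4.0], [6.0]])
y = bin_continuous_output(outputs, output_times, bin_width=0.5, t_start=0.0, t_end=1.0)

print(neural_data)  # rows are time bins, columns are neurons
print(y)            # average output within each bin
```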
In the folder "Paper_code", we include code used for the manuscript.
- Files starting with "ManyDecoders" use all decoders except the Kalman Filter and Naive Bayes
- Files starting with "KF" use the Kalman filter
- Files starting with "BayesDecoder" use the Naive Bayes decoder
- Files starting with "Plot" create the figures in the paper
- Files ending with "FullData" are for figures 3/4
- Files ending with "DataAmt" are for figures 5/6
- Files ending with "FewNeurons" are for figure 7
- Files ending with "BinSize" are for figure 8
- Files mentioning "Hyperparams" are for figure 9