Hi CEBRA team. I'm using the CEBRA-Behavior model to evaluate the relationship between our Neuropixels recording data and behavior stages. To demonstrate that the separation between behavioral labels in the real model does not depend solely on the label identity, I visualized the embeddings from the shuffled model with the shuffled labels and expected no apparent pattern in this visualization. However, I noticed two things that can generate a clear separation between the behavioral labels, even in the shuffled models.

However, the one trained with `batch_size = 512` is similar to what the demo shows.

I really appreciate your patience and time in answering these questions. Thanks!
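For reference, the shuffled-label control I describe above is conceptually along the lines of the sketch below (toy data and illustrative names, not my actual analysis code): fit one model on the real behavior-stage labels and one on labels permuted across time, then compare the two embeddings.

```python
import numpy as np
import cebra
from cebra import CEBRA

# Toy stand-ins for the Neuropixels activity and the behavior-stage label.
rng = np.random.default_rng(0)
neural = rng.normal(size=(2000, 50)).astype('float32')
stage = (np.arange(2000) % 200 < 100).astype(float)   # alternating behavior stages

model_kwargs = dict(model_architecture='offset10-model',
                    batch_size=512,
                    output_dimension=3,
                    max_iterations=500,      # short run, just to illustrate
                    conditional='time_delta',
                    time_offsets=10,
                    verbose=True)

# Real model: fit on the true labels.
real_model = CEBRA(**model_kwargs)
real_model.fit(neural, stage)
real_emb = real_model.transform(neural)

# Control: fit on labels permuted across time; ideally no structure should appear.
shuffled_stage = rng.permutation(stage)
ctrl_model = CEBRA(**model_kwargs)
ctrl_model.fit(neural, shuffled_stage)
ctrl_emb = ctrl_model.transform(neural)

# Visualize both embeddings, each colored by the labels the model was trained on.
cebra.plot_embedding_interactive(real_emb, embedding_labels=stage, title="real labels").show()
cebra.plot_embedding_interactive(ctrl_emb, embedding_labels=shuffled_stage, title="shuffled labels").show()
```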
Hi @Owenxz, thanks for flagging. I focused on checking example 1, and can repro the behavior you report.

However, you are right now training the model on binary variables (0 or 1) but using a continuous solver. Just checking, is this intended? This breaks a few assumptions about how the data is sampled; namely, the continuous solver with the `time_delta` distribution expects the label to change approximately according to a Normal distribution, which is not the case in this example. (Other clarification: is the data splitting function you used this one? That is additionally problematic when using an `offset-10` model; a split that preserves temporal information is more suitable.) It would be interesting to see if you can repro the behavior also if you use […].
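For comparison, here is a minimal sketch (untested, and assuming the sklearn-style API infers a discrete sampling scheme from an integer-typed label) of what I mean by treating the binary label as a discrete variable rather than feeding it to the continuous `time_delta` solver. The variable names mirror the quick test below:

```python
from cebra import CEBRA

# Sketch only: fit the binary direction label as a *discrete* index.
# Passing a 1D integer array lets CEBRA treat it as a discrete auxiliary
# variable; the conditional distribution is left at its default instead of
# forcing 'time_delta'. `neural_train` / `label_train` are the same arrays
# as in the quick test below.
cebra_discrete_model = CEBRA(model_architecture='offset10-model',
                             batch_size=512,
                             output_dimension=3,
                             max_iterations=5000,
                             device='cuda',
                             verbose=True,
                             time_offsets=10)

discrete_label = label_train[:, 1].astype(int)   # binary direction label (0/1)
cebra_discrete_model.fit(neural_train, discrete_label)
embedding_discrete = cebra_discrete_model.transform(neural_train)
```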
I ran a quick test on the following setup:

```python
import cebra
import numpy as np
from sklearn.model_selection import train_test_split
from cebra import CEBRA

# Load the demo hippocampus dataset and split it without shuffling, so the
# train/test portions stay temporally contiguous.
hippocampus_pos = cebra.datasets.init('rat-hippocampus-single-achilles')
max_iterations = 10000
neural_train, neural_test, label_train, label_test = train_test_split(
    hippocampus_pos.neural,
    hippocampus_pos.continuous_index.numpy(),
    test_size=0.2,
    random_state=2042,
    shuffle=False)

cebra_dir3_model_shuffled = CEBRA(model_architecture='offset10-model',
                                  batch_size=1024,
                                  learning_rate=3e-4,
                                  temperature=1,
                                  output_dimension=3,
                                  max_iterations=max_iterations,
                                  distance='cosine',
                                  conditional='time_delta',
                                  device='cuda',
                                  verbose=True,
                                  time_offsets=10)

# Shuffle the direction labels across time and train on the shuffled labels.
np.random.seed(999)
shuffled_label = np.random.permutation(label_train[:, 1:])
cebra_dir3_model_shuffled.fit(neural_train, shuffled_label[:, 1].astype(int))

# Embed the training data and visualize it colored by the shuffled label.
cebra_dir3_shuffled = cebra_dir3_model_shuffled.transform(neural_train)
fig = cebra.plot_embedding_interactive(cebra_dir3_shuffled,
                                       embedding_labels=shuffled_label[:, 1],
                                       title="CEBRA-Behavior",
                                       cmap="rainbow")
fig.show()
```
With this setup I can repro something similar. However, when looking at the loss, we see that it stays stable at chance level for a long time, but then gets very noisy and drops to the failure mode we see in the embedding space.

This behavior is fine and expected to a certain level for discrete labels, because we have a limited number of samples to train on. When we heavily overtrain the model, it is possible to get these overfitting effects. The solution is to (1) observe the loss for irregularities (see the short sketch at the end of this reply), (2) not overtrain the model, i.e. pick a training time that is reasonable for the amount of data you have available, and (3) train on longer data sequences. To point (3): if you trained on a longer dataset, this collapse behavior you see in the loss would probably happen much later in training.

Does this help clarify your question? Happy to discuss further!
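To point (1), here is a rough sketch of how the training loss of the fitted model can be inspected after training. It assumes the `cebra_dir3_model_shuffled` model from the snippet above; `cebra.plot_loss` is the matplotlib helper shipped with CEBRA, and the dashed line at ln(batch size) is only an approximation of the InfoNCE chance level:

```python
import numpy as np
import matplotlib.pyplot as plt
import cebra

# Plot the InfoNCE training loss to spot the late, noisy collapse
# described above.
ax = cebra.plot_loss(cebra_dir3_model_shuffled)

# Approximate chance level for InfoNCE with batch_size=1024 is ln(1024).
ax.axhline(np.log(1024), color='gray', linestyle='--', label='approx. chance level')
ax.legend()
plt.show()
```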