
RuntimeError: expected a non-empty list of Tensors #95

Closed · youth123 opened this issue May 10, 2020 · 7 comments
Labels: bug (Something isn't working), fixed in dev branch

Comments


youth123 commented May 10, 2020

When I followed the cascaded example, I just removed the DataParallel code and ran into this issue.

INFO:root:TRAINING EPOCH 1
total_loss=43.34527:   2%|▍ | 84/3750 [00:14<10:26, 5.85it/s]
Traceback (most recent call last):
  File "/root/toyer/cur_ali_img_retrieval_v6/CascadedEmbeddings.py", line 187, in <module>
    trainer.train(num_epochs=num_epochs)
  File "/root/anaconda3/lib/python3.7/site-packages/pytorch_metric_learning/trainers/base_trainer.py", line 83, in train
    self.forward_and_backward()
  File "/root/anaconda3/lib/python3.7/site-packages/pytorch_metric_learning/trainers/base_trainer.py", line 110, in forward_and_backward
    self.calculate_loss(self.get_batch())
  File "/root/anaconda3/lib/python3.7/site-packages/pytorch_metric_learning/trainers/cascaded_embeddings.py", line 26, in calculate_loss
    self.losses[curr_loss_name] += self.maybe_get_metric_loss(e, labels, indices_tuple, curr_loss_name)
  File "/root/anaconda3/lib/python3.7/site-packages/pytorch_metric_learning/trainers/cascaded_embeddings.py", line 38, in maybe_get_metric_loss
    return self.loss_funcs[curr_loss_name](embeddings, labels, indices_tuple)
  File "/root/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/lib/python3.7/site-packages/pytorch_metric_learning/losses/base_metric_loss_function.py", line 53, in forward
    loss = self.compute_loss(embeddings, labels, indices_tuple)
  File "/root/anaconda3/lib/python3.7/site-packages/pytorch_metric_learning/losses/circle_loss.py", line 43, in compute_loss
    indices_tuple = lmu.convert_to_triplets(indices_tuple, labels, t_per_anchor=self.triplets_per_anchor)
  File "/root/anaconda3/lib/python3.7/site-packages/pytorch_metric_learning/utils/loss_and_miner_utils.py", line 204, in convert_to_triplets
    return [torch.cat(x, dim=0) for x in [a_out, p_out, n_out]]
  File "/root/anaconda3/lib/python3.7/site-packages/pytorch_metric_learning/utils/loss_and_miner_utils.py", line 204, in <listcomp>
    return [torch.cat(x, dim=0) for x in [a_out, p_out, n_out]]
RuntimeError: expected a non-empty list of Tensors

KevinMusgrave added the bug (Something isn't working) label on May 10, 2020

KevinMusgrave (Owner) commented May 10, 2020

Did you change any other parameters, like batch size, dataset, etc.? And does the bug occur every time you run it, or does it seem random?

youth123 (Author) commented May 10, 2020

sampler = samplers.MPerClassSampler(train_dataset.targets, m=4, length_before_new_iter=len(train_dataset))
batch_size = 8
num_epochs = 4
tester = testers.GlobalEmbeddingSpaceTester(
    end_of_testing_hook=hooks.end_of_testing_hook,
    visualizer=umap.UMAP(),
    visualizer_hook=visualizer_hook,
    dataloader_num_workers=32,
)
end_of_epoch_hook = hooks.end_of_epoch_hook(
    tester, dataset_dict, model_folder, test_interval=1, patience=1
)
trainer = trainers.CascadedEmbeddings(
    models=cascaded_model,
    optimizers=optimizers,
    batch_size=batch_size,
    loss_funcs=loss_funcs,
    mining_funcs=mining_funcs,
    dataset=train_dataset,
    sampler=sampler,
    dataloader_num_workers=8,
    end_of_iteration_hook=hooks.end_of_iteration_hook,
    end_of_epoch_hook=end_of_epoch_hook,
    embedding_sizes=[64, 64, 64],
)
trainer.train(num_epochs=num_epochs)

This is the code that triggers the error, and I see the problem every time.

KevinMusgrave (Owner) commented May 10, 2020

Ah, yes, I think it's related to the batch size and the MPerClassSampler. I have a few ideas for how to improve/fix this bug. But in the meantime, try m=2 in the MPerClassSampler.
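Applied to the snippet above, only the sampler line changes; a minimal sketch of that workaround:

from pytorch_metric_learning import samplers

sampler = samplers.MPerClassSampler(
    train_dataset.targets,
    m=2,  # 2 samples per class, so batch_size=8 spans 4 distinct classes
    length_before_new_iter=len(train_dataset),
)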

youth123 (Author) commented

When I debugged the error, I found that it was caused by the function convert_to_triplets in loss_and_miner_utils.py.

def convert_to_triplets(indices_tuple, labels, t_per_anchor=100):
    .....
    a_out, p_out, n_out = [], [], []
    a1, p, a2, n = indices_tuple
    if len(a1) == 0 or len(a2) == 0:
        return [torch.tensor([]).to(labels.device)] * 3
    for i in range(len(labels)):
        pos_idx = (a1 == i).nonzero().flatten()
        neg_idx = (a2 == i).nonzero().flatten()
        if len(pos_idx) > 0 and len(neg_idx) > 0:
            p_idx = p[pos_idx]
            n_idx = n[neg_idx]
            p_idx, n_idx = matched_size_indices(p_idx, n_idx)
            a_idx = torch.ones_like(c_f.longest_list([p_idx, n_idx])) * i
            a_out.append(a_idx)
            p_out.append(p_idx)
            n_out.append(n_idx)
    return [torch.cat(x, dim=0) for x in [a_out, p_out, n_out]]

When the list a1 is disjoint from a2, the lists a_out, p_out, and n_out stay empty, so the final torch.cat raises the error. I am confused about the meaning of the list a1 and where it comes from.
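To make the failure concrete, here is a minimal repro using only stock PyTorch; torch.cat on an empty list raises the exact message from the traceback:

import torch

a_out, p_out, n_out = [], [], []  # all three stay empty when a1 and a2 are disjoint
[torch.cat(x, dim=0) for x in [a_out, p_out, n_out]]
# RuntimeError: expected a non-empty list of Tensors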

KevinMusgrave (Owner) commented

(a1, p) represents positive pairs, and (a2, n) represents negative pairs. The code is trying to convert pairs to triplets, and in order to do that, it has to find the positive and negative pairs that share a common "anchor". In other words, it needs to find the a1 and a2 that are equal to each other.
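For instance, with hypothetical toy indices (not actual miner output):

import torch

a1 = torch.tensor([0, 0])  # positive pairs: (0, 1) and (0, 2)
p = torch.tensor([1, 2])
a2 = torch.tensor([0])     # negative pair: (0, 3)
n = torch.tensor([3])
# Anchor 0 appears in both a1 and a2, so the triplets (0, 1, 3) and
# (0, 2, 3) can be formed. If a1 were [0] and a2 were [5] instead, no
# anchor would be shared and no triplet could be built.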

The most likely reason why a1 and a2 are disjoint is because your batch size is small, which results in a small number of possible pairs. The small number of possible pairs is reduced even further by the MultiSimilarityMiner and HDCMiner.

In addition to that, if your MPerClassSampler has m = (batch size // 2), then it's possible for a batch to contain only 1 label. This is due to a weakness in the implementation of MPerClassSampler.

So there are really 2 things for me to improve here, but I might not get around to them for a couple of days. If you need to get your code running soon, I suggest either using a larger batch size, or using miners that are directly compatible with the losses. That means if a loss expects triplets, then use a triplet miner. So, for example, you could replace HDCMiner with TripletMarginMiner.
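A sketch of that swap (the margin value here is an assumption; tune it to your loss):

from pytorch_metric_learning import miners

# A triplet miner returns (anchor, positive, negative) indices directly,
# so no pair-to-triplet conversion happens downstream:
mining_func = miners.TripletMarginMiner(margin=0.2)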

KevinMusgrave (Owner) commented

I finally fixed the bug in convert_to_triplets. The fix is in 0.9.87.dev4. It will return empty tensors instead of raising an exception. I haven't worked on MPerClassSampler yet though, and I might create a separate issue for that.
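The actual diff isn't shown in this thread; a plausible sketch of such a guard at the end of convert_to_triplets (an assumption, not the real 0.9.87.dev4 code) would be:

# Hypothetical sketch, not the actual library change:
if len(a_out) == 0:
    empty = torch.tensor([], dtype=torch.long, device=labels.device)
    return [empty, empty.clone(), empty.clone()]
return [torch.cat(x, dim=0) for x in [a_out, p_out, n_out]]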

KevinMusgrave (Owner) commented

I made a separate issue for improving MPerClassSampler: #124
