Several popular approaches to 3D vision tasks process multiple views of the input independently with deep neural networks pre-trained on natural images, achieving view permutation invariance through a single round of pooling over all views. We argue that this operation discards important information and leads to subpar global descriptors. In this paper, we propose a group convolutional approach to multiple view aggregation where convolutions are performed over a discrete subgroup of the rotation group, enabling, thus, joint reasoning over all views in an equivariant (instead of invariant) fashion, up to the very last layer. We further develop this idea to operate on smaller discrete homogeneous spaces of the rotation group, where a polar view representation is used to maintain equivariance with only a fraction of the number of input views. We set the new state of the art in several large scale 3D shape retrieval tasks, and show additional applications to panoramic scene classification.
This repository contains demos for our best models on rotated and aligned ModelNet40.
Check the requirements in requirements.txt. Our codebase has been tested on TensorFlow 1.15 but the dependency is commented out in requirements.txt to silence GitHub’s security warnings.
Download the datasets here.
The following commands should
- clone this repo,
- create a virtualenv,
- install the requirements.
git clone https://github.com/daniilidis-group/emvn.git
cd emvn
virtualenv -p python3 env
source env/bin/activate
pip install -r requirements.txt
Change –data and –logdir appropriately.
python3 emvn/train.py \
--data /path/to/m40canon_{}.lmdb \
--logdir /tmp/emvn_m40canon \
--epochs 15 \
--batch-size 12 \
--skip_eval \
--eval_retrieval \
--retrieval_include_same \
--triplet_loss \
--optimizer nesterov \
--lr-decay-mode cos \
--gconv_support 0,8,1,15,12,25,21,19,29,7,11,20,4 \
--gcc 512,512,512 \
--n_homogeneous 20 \
--lr 3e-3 \
--homogeneous_only1st \
--n_fc_before_gconv 1 \
--n_group_elements 60 \
--pretrained
Sample outputs:
[2019-10-28 15:17:08,566:INFO] Loading data... [1028 15:17:08 @format.py:93] Found 2468 entries in m40canon_test.lmdb Classes=40, Views=1 [1028 15:17:09 @format.py:93] Found 9843 entries in m40canon_train.lmdb [1028 15:17:09 @format.py:93] Found 2468 entries in m40canon_test.lmdb Loading 100 inputs from pretrained model... [2019-10-28 15:17:11,027:INFO] Loaded model... [2019-10-28 15:17:11,027:INFO] Loaded model; params=23522920 [2019-10-28 15:17:14,366:INFO] Running on cuda:0 [2019-10-28 15:17:14,368:INFO] Checkpoint /tmp/emvn_m40canon/latest.pth.tar not found; ignoring. [2019-10-28 15:17:14,368:INFO] Epoch: [1/15] Iter [10/820] Loss: 3.4836 Time batch: 0.2294 s; Time GPU 0.1063 s; Time to load: 0.2292 s (...) Iter [820/820] Loss: 0.0081 Time batch: 0.2310 s; Time GPU 0.1035 s; Time to load: 0.2310 s [2019-10-28 16:05:08,857:INFO] Time taken: 190.51 sec... [2019-10-28 16:05:08,858:INFO] Saving latest model [2019-10-28 16:05:09,671:INFO] Total time for this epoch: 191.31975293159485 s Starting evaluation... Iter [10/411] Loss: 0.0165 Acc: 100.0000 (...) Iter [410/411] Loss: 0.0059 Acc: 100.0000 [2019-10-28 16:07:14,435:INFO] Evaluating retrieval... [2019-10-28 16:07:15,352:INFO] Computed pairwise distances between 2466 samples [2019-10-28 16:07:17,191:INFO] acc per class=[100.0, 92.0, 98.0, 85.0, 96.0, 98.0, 100.0, 100.0, 98.0, 95.0, 75.0, 95.0, 98.83720930232558, 95.0, 95.34883720930233, 20.0, 98.0, 100.0, 100.0, 90.0, 100.0, 99.0, 98.0, 81.3953488372093, 100.0, 99.0, 82.0, 95.0, 98.0, 100.0, 98.0, 100.0, 80.0, 97.0, 95.0, 100.0, 90.0, 80.0, 90.0, 100.0] [2019-10-28 16:07:17,192:INFO] combined: 92.61, Acc: 94.61, mAP: 93.36, Loss: 0.2238
Results are within 0.2% of Table 2/Ours-R-12 in the paper.
Change –data and –logdir appropriately.
python3 emvn/train.py \
--data /path/to/m40rot_{}.lmdb \
--logdir /tmp/emvn_m40rot \
--epochs 15 \
--batch-size 6 \
--skip_eval \
--eval_retrieval \
--retrieval_include_same \
--triplet_loss \
--optimizer nesterov \
--lr-decay-mode cos \
--lr 1.5e-3 \
--gconv_support 0,8,1,15,12,25,21,19,29,7,11,20,4 \
--gcc 512,512,512 \
--n_fc_before_gconv 1 \
--n_group_elements 60 \
--pretrained
Sample outputs:
[2019-10-28 15:20:01,355:INFO] Loading data... [1028 15:20:01 @format.py:93] Found 2468 entries in m40rot_test.lmdb Classes=40, Views=1 [1028 15:20:03 @format.py:93] Found 9843 entries in m40rot_train.lmdb [1028 15:20:03 @format.py:93] Found 2468 entries in m40rot_test.lmdb Loading 100 inputs from pretrained model... [2019-10-28 15:20:04,614:INFO] Loaded model... [2019-10-28 15:20:04,622:INFO] Loaded model; params=21687912 [2019-10-28 15:20:08,627:INFO] Running on cuda:0 [2019-10-28 15:20:08,727:INFO] Checkpoint /tmp/emvn_m40rot/latest.pth.tar not found; ignoring. [2019-10-28 15:20:08,728:INFO] Epoch: [1/15] Iter [10/1640] Loss: 4.7577 Time batch: 0.5615 s; Time GPU 0.2113 s; Time to load: 0.5800 s (...) Iter [1640/1640] Loss: 0.0071 Time batch: 0.3312 s; Time GPU 0.1352 s; Time to load: 0.3332 s [2019-10-28 17:58:58,240:INFO] Time taken: 541.46 sec... [2019-10-28 17:58:58,241:INFO] Starting evaluation... Iter [10/822] Loss: 0.0001 Acc: 100.0000 (...) Iter [820/822] Loss: 2.6854 Acc: 66.6667 [2019-10-28 18:00:35,861:INFO] Evaluating retrieval... [2019-10-28 18:00:36,760:INFO] Computed pairwise distances between 2466 samples [2019-10-28 18:00:38,703:INFO] acc per class=[100.0, 86.0, 100.0, 85.0, 97.0, 98.0, 85.0, 99.0, 98.0, 95.0, 75.0, 100.0, 83.72093023255815, 95.0, 82.55813953488372, 45.0, 97.0, 99.0, 100.0, 85.0, 100.0, 98.0, 97.0, 76.74418604651163, 95.0, 98.0, 79.0, 90.0, 88.0, 90.0, 95.0, 100.0, 65.0, 77.0, 90.0, 99.0, 77.0, 84.0, 55.0, 77.77777777777777] [2019-10-28 18:00:38,704:INFO] combined: 87.80, Acc: 90.67, mAP: 88.38, Loss: 0.3442
Results are within 0.4% of Table 3/Ours-R-60 in the paper.
Coming soon!
Coming soon!
Carlos Esteves*, Yinshuang Xu*, Christine Allen-Blanchette, Kostas Daniilidis. “Equivariant Multi-View Networks”. The IEEE International Conference on Computer Vision (ICCV), 2019.
@InProceedings{Esteves_2019_ICCV,
author = {Esteves, Carlos and Xu, Yinshuang and Allen-Blanchette, Christine and Daniilidis, Kostas},
title = {Equivariant Multi-View Networks},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}
Carlos Esteves*, Yinshuang Xu*, Christine Allen-Blanchette, Kostas Daniilidis