base implementation of the RoI pooling strategy described in the paper "Feature Selective Networks for Object Detection"

xychen9459/feature_selective_networks

feature_selective_networks

Base implementation of the RoI pooling strategy described in the paper "Feature Selective Networks for Object Detection". I have no idea if this implementation actually works, but the idea sounded useful and I couldn't find an existing implementation of that specific paper.

This is used in a two-stage object detection framework: you propose a list of the regions with the highest objectness confidence, then pool them (the operation is usually called RoIPool or RoIAlign) into one tensor so you can keep feeding them through the model to get output predictions. Pooling takes a list of box coordinates and a list of feature maps. For each proposed region, it takes the features at those coordinates from the feature map whose scale best fits the region's size, then resizes them to a fixed shape so that all regions stack into one output tensor.
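The pooling step described above can be sketched in plain Python. This is a minimal illustration and not the code in this repository: `pick_level` uses the size-based level heuristic from the Feature Pyramid Networks paper (its constants `k0`, `k_min`, `k_max`, and `canonical` are assumptions here), and `roi_max_pool` is a simplified single-channel RoIPool on integer cell coordinates, without the bilinear sampling that RoIAlign uses.

```python
import math


def pick_level(box, k0=4, k_min=2, k_max=5, canonical=224):
    """Pick a feature-pyramid level for a box given in image pixels.

    Assumed FPN-style heuristic: a box of side `canonical` maps to level k0,
    and each halving of the box side moves one level finer.
    """
    x1, y1, x2, y2 = box
    side = math.sqrt(max(x2 - x1, 1e-6) * max(y2 - y1, 1e-6))
    k = math.floor(k0 + math.log2(side / canonical))
    return min(max(k, k_min), k_max)


def roi_max_pool(feature, box, out_size=2):
    """Max-pool one region of a single-channel map to out_size x out_size.

    `feature` is a 2D list; `box` is (x1, y1, x2, y2) in feature-map cells,
    end-exclusive, assumed to lie inside the map and cover at least one cell.
    """
    x1, y1, x2, y2 = box
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_size):
        # Bin edges: floor for the start, ceil for the end, so no bin is empty.
        r0 = y1 + (i * h) // out_size
        r1 = y1 + ((i + 1) * h + out_size - 1) // out_size
        row = []
        for j in range(out_size):
            c0 = x1 + (j * w) // out_size
            c1 = x1 + ((j + 1) * w + out_size - 1) // out_size
            row.append(max(feature[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


feature = [[r * 4 + c for c in range(4)] for r in range(4)]
print(pick_level((0, 0, 112, 112)))          # level for a 112x112 box
print(roi_max_pool(feature, (0, 0, 4, 4)))   # whole map pooled to 2x2
```

Every region comes out as the same `out_size x out_size` grid regardless of its original size, which is what lets the pooled regions be stacked into one tensor.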

The idea in the Feature Selective Networks paper is that this pooling loses some scale and position information. So while you pool the regions, you also pool features from two parallel feature maps (generated by convolutional layers with learnable weights) designed to capture spatial and scale features. The scale branch uses one conv for each scale and, for each proposed region, takes features from the output corresponding to the closest aspect ratio. The subregion branch does something similar for each proposed region's absolute location within the original input image: it generates all 9 maps and chooses the map appropriate to the region's location in the image. Instead of regular convs, it uses a "shifted conv" that looks in the desired direction (by shifting the feature map by 1 in the opposite direction); for a 3x3 grid that's 9 different directional shifts. (The shifted convolutional layer is a special case of another idea called a deformable conv, which learns a variable shift.)
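The selection logic of the two branches, as described above, might look roughly like this in plain Python. It is a sketch under assumptions, not the repository's code: the function names, the aspect-ratio bins `(0.5, 1.0, 2.0)`, and the choice of the box center to decide the grid cell are all hypothetical, and the learnable convolutions themselves are left out.

```python
import math

# The 9 look directions (dy, dx) for a 3x3 grid, in row-major order.
SHIFTS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def closest_aspect_ratio_index(box, ratios=(0.5, 1.0, 2.0)):
    """Index of the bin whose width/height ratio is closest in log space."""
    x1, y1, x2, y2 = box
    ratio = (x2 - x1) / (y2 - y1)
    return min(range(len(ratios)),
               key=lambda i: abs(math.log(ratio / ratios[i])))


def subregion_index(box, image_w, image_h, grid=3):
    """Row-major index of the grid cell containing the box center."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    col = min(int(cx / image_w * grid), grid - 1)
    row = min(int(cy / image_h * grid), grid - 1)
    return row * grid + col


def look(feature, dy, dx):
    """Make each cell see its neighbour in direction (dy, dx).

    Equivalent to shifting the map by (-dy, -dx) with zero padding.
    """
    h, w = len(feature), len(feature[0])
    return [[feature[r + dy][c + dx]
             if 0 <= r + dy < h and 0 <= c + dx < w else 0
             for c in range(w)]
            for r in range(h)]


feature = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
box = (80, 80, 90, 90)                    # bottom-right corner of a 90x90 image
idx = subregion_index(box, 90, 90)        # which of the 9 maps to pool from
shifted = look(feature, *SHIFTS[idx])     # map shifted for that direction
print(idx, closest_aspect_ratio_index(box), shifted)
```

In a real model, each shifted map would go through its own conv before pooling, and the per-region index would pick which branch output gets pooled for that region.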

Anyway, the reason all that sounded useful is that the authors reported similar or improved results with fewer output channels after pooling the regions of interest. I was trying a two-branch model with 512 output channels after a RoIAlign, and getting that down to 40 sounded like something that would speed training up a little.

No idea how well this actually works, because I've only tried it on small-ass toy datasets. (Come on, do you really think I'm going to experiment on something that takes like a week to train each time and might not even produce results, for who knows what reason?) But it sounded useful and I couldn't find an implementation, so I'd at least like to advertise it as something that could speed up training in certain use cases.
