
Integration with LXMERT #6

Open
johntiger1 opened this issue Jun 4, 2020 · 5 comments

@johntiger1

If I want to use this repo to extract RCNN image features to train LXMERT, how can I do that? Do I just dump the features from

# Show the boxes, labels, and features
pred = instances.to('cpu')
v = Visualizer(im[:, :, :], MetadataCatalog.get("vg"), scale=1.2)
v = v.draw_instance_predictions(pred)
showarray(v.get_image()[:, :, ::-1])
print('instances:\n', instances)
print()
print('boxes:\n', instances.pred_boxes)
print()
print('Shape of features:\n', features.shape)

(from https://github.com/airsplay/py-bottom-up-attention/blob/master/demo/demo_feature_extraction_attr.ipynb)

into a .tsv file?

Btw, what is the difference between the versions with and without attributes? Thanks!
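(For reference, a minimal sketch of what such a .tsv dump could look like, assuming the LXMERT-style field list and base64-over-numpy encoding; write_row is a hypothetical helper here, and the exact fields and dtypes should be checked against LXMERT's data loaders:)

# Sketch: dump detections to a LXMERT-style .tsv, one row per image.
# Assumptions to verify: the field list and base64 encoding below follow
# the convention LXMERT's loaders expect; `instances` and `features` are
# the CPU outputs from the demo notebook above.
import base64
import csv
import numpy as np

FIELDNAMES = ["img_id", "img_h", "img_w", "objects_id", "objects_conf",
              "attrs_id", "attrs_conf", "num_boxes", "boxes", "features"]

def encode(arr):
    # base64-encode a numpy array so it fits in a single tsv cell
    return base64.b64encode(arr.tobytes()).decode("utf-8")

def write_row(writer, img_id, img_h, img_w, instances, features):
    n = len(instances)
    writer.writerow({
        "img_id": img_id,
        "img_h": img_h,
        "img_w": img_w,
        "objects_id": encode(instances.pred_classes.numpy().astype(np.int64)),
        "objects_conf": encode(instances.scores.numpy().astype(np.float32)),
        # attribute ids/confidences come from the attr demo; zeros otherwise
        "attrs_id": encode(np.zeros(n, dtype=np.int64)),
        "attrs_conf": encode(np.zeros(n, dtype=np.float32)),
        "num_boxes": n,
        "boxes": encode(instances.pred_boxes.tensor.numpy().astype(np.float32)),
        "features": encode(features.numpy().astype(np.float32)),  # n x 2048
    })

with open("features.tsv", "w") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDNAMES, delimiter="\t")
    # call write_row(writer, ...) once per image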

@airsplay
Owner

airsplay commented Jun 4, 2020

Yes; it would work well (at least in my tests).

But for the NMS step, it would be best to use this approach:

# Select max scores
max_scores, max_classes = scores.max(1)  # R x C --> R
num_objs = boxes.size(0)
boxes = boxes.view(-1, 4)
idxs = torch.arange(num_objs).cuda() * num_bbox_reg_classes + max_classes
max_boxes = boxes[idxs]  # Select max boxes according to the max scores.

# Apply NMS
keep = nms(max_boxes, max_scores, nms_thresh)
if topk_per_image >= 0:
    keep = keep[:topk_per_image]
boxes, scores = max_boxes[keep], max_scores[keep]
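(For readers who want to run this step in isolation, here is a self-contained sketch using torchvision.ops.nms; the shapes and thresholds are made-up assumptions, not values from this repo. The design point: the detector predicts one box per class per proposal, so you first select each proposal's best-scoring class-specific box, then run a single class-agnostic NMS rather than detectron2's default per-class NMS:)

import torch
from torchvision.ops import nms

# Illustrative sizes and thresholds (assumptions, not values from this repo)
num_objs, num_bbox_reg_classes = 100, 1600
nms_thresh, topk_per_image = 0.7, 36

scores = torch.rand(num_objs, num_bbox_reg_classes)          # R x C class scores
xy = torch.rand(num_objs, num_bbox_reg_classes, 2) * 50      # top-left corners
wh = torch.rand(num_objs, num_bbox_reg_classes, 2) * 50 + 1  # widths and heights
boxes = torch.cat([xy, xy + wh], dim=-1)                     # valid R x C x 4 boxes

# For each proposal, keep the box of its highest-scoring class
max_scores, max_classes = scores.max(dim=1)
idxs = torch.arange(num_objs) * num_bbox_reg_classes + max_classes
max_boxes = boxes.view(-1, 4)[idxs]

# One class-agnostic NMS over the selected boxes, truncated to top-k
keep = nms(max_boxes, max_scores, nms_thresh)[:topk_per_image]
final_boxes, final_scores = max_boxes[keep], max_scores[keep]
print(final_boxes.shape, final_scores.shape)  # at most topk_per_image boxes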

@johntiger1
Author

Thank you, I will try the non-maximum suppression approach. But, just curious, does this mean that other SOTA vision models could be used too in the future? R-CNN is now several years old, and I was wondering if you have experimented with more modern vision models that might give better performance.

@airsplay
Owner

airsplay commented Jun 4, 2020

Hmmm... This code does not provide training, just the weights converted from the original Caffe weights.

You could try this and switch the backbone:
https://github.com/MILVLG/bottom-up-attention.pytorch

@yezhengli-Mr9


Hi @johntiger1, before I finish coding my project:

How long does it take to extract features for NLVR2's 107,292 images, given that LXMERT takes around 5 to 6 hours for the training split and 1 to 2 hours for the valid and test splits?

Would you mind sharing a time estimate? Thanks.
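(One way to get such an estimate yourself: time the extraction on a small sample and extrapolate; extract_features below is a hypothetical stand-in for the per-image extraction call:)

import time

# Sketch: estimate total extraction time by extrapolating from a sample.
# extract_features is a hypothetical stand-in for your per-image call.
def estimate_total_hours(image_paths, extract_features,
                         sample_size=50, total_images=107_292):
    sample = image_paths[:sample_size]
    start = time.time()
    for path in sample:
        extract_features(path)
    per_image = (time.time() - start) / len(sample)
    return per_image * total_images / 3600.0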

@yezhengli-Mr9


Hi @johntiger1, I found the answer to my time-estimate question and have summarized it here. Thanks anyway.
