
Optimize threads #162

Closed

Conversation

@oliver-batchelor commented Feb 26, 2022

I've optimized feature extraction and feature matching to make better use of the GPU by reading inputs and writing outputs in threads (a minimal sketch of the pattern follows the list below).

  • Makes feature extractors use the GPU more fully via a threaded output writer
  • Speeds up the NN feature matcher (by a factor of 10 on my machine)
  • Adds multi-threading for the CPU-only (SIFT) extractor (relies on adding a scoped GIL release to pycolmap as well)
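
A minimal sketch of the reader/writer-thread pattern described above (the names writer_loop, write_fn, and extract_features are illustrative stand-ins, not the PR's actual code): the main thread runs GPU inference while a background thread drains a queue and writes predictions to disk.

import queue
import threading

def writer_loop(q, write_fn):
    # Consume (name, prediction) items until the sentinel None arrives.
    while True:
        item = q.get()
        if item is None:
            break
        write_fn(item)
        q.task_done()

def extract_features(loader, model, write_fn, maxsize=8):
    q = queue.Queue(maxsize=maxsize)  # bounded, so inference cannot outrun the disk
    writer = threading.Thread(target=writer_loop, args=(q, write_fn))
    writer.start()
    for name, image in loader:
        pred = model(image)  # GPU work overlaps with the writer's I/O
        q.put((name, pred))  # hand the output off to the writer thread
    q.put(None)              # sentinel: no more items
    writer.join()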

Comment on lines 207 to 209
def write_predictions(item, fd, as_half=True):
    name, pred = item
    original_size = pred['image_size']
Member:

Style: all code in the project should be indented with 4 spaces

Comment on lines 238 to 241




Member:

Style: only two blank lines between function or class definitions.

Comment on lines 299 to 301
with tqdm(loader) as pbar:
    for data in pbar:
        process.put(data)
Member:

Here tqdm will report the time to enqueue but not the time to process. Could we instead have tqdm accurately report the processing time, for example using the lock defined in tqdm.contrib.concurrent?
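
A sketch of this suggestion (process_fn and the queue are hypothetical stand-ins for the PR's pipeline): advance the bar from the worker thread after each item is processed, not when it is enqueued. tqdm.update() is guarded by tqdm's internal lock, so calling it from another thread is safe.

import queue
import threading
from tqdm import tqdm

def worker(q, pbar, process_fn):
    while True:
        data = q.get()
        if data is None:
            break                # sentinel: producer is done
        process_fn(data)
        pbar.update(1)           # counts items actually processed
        q.task_done()

def run(loader, process_fn):
    q = queue.Queue(maxsize=4)   # bounded so the producer cannot race ahead
    with tqdm(total=len(loader)) as pbar:
        t = threading.Thread(target=worker, args=(q, pbar, process_fn))
        t.start()
        for data in loader:
            q.put(data)
        q.put(None)
        t.join()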

Author:

Thanks - will check that out!

         if names_to_pair(*pair) not in skip_pairs]

feature_pairs = FeaturesPairs(pairs, feature_path_q, feature_paths_refs)
loader = torch.utils.data.DataLoader(feature_pairs, num_workers=num_workers, batch_size=batch_size, shuffle=False, pin_memory=True)
Member:

  1. Does the code really support batch_size>1? This only works if all images have the same number of keypoints, which is not guaranteed (see the illustration below).
  2. Does batch_size>1 give any performance improvement? With SuperGlue the GPU is usually rather saturated.
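
An illustration of point 1 (the shapes are made up): the default DataLoader collate function stacks per-sample tensors, which fails as soon as two samples have a different number of keypoints.

import torch
from torch.utils.data import default_collate  # available in recent PyTorch

a = {'keypoints0': torch.zeros(120, 2)}  # 120 keypoints
b = {'keypoints0': torch.zeros(85, 2)}   # 85 keypoints
try:
    default_collate([a, b])
except RuntimeError as e:
    print(e)  # stacking tensors of unequal size raises a RuntimeError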

Author:

After benchmarking I see that batch_size does not matter, so I've removed it. Performance is almost entirely IO-limited: using threads for reading and writing takes NN matching from 3 pairs/sec to 33 pairs/sec.


with tqdm(smoothing=.1, total=len(feature_pairs)) as pbar:
    for pairs, data in loader:
        data = map_tensor(data, partial(torch.Tensor.to, device=device, dtype=torch.float16))
@sarlinpe (Member) Feb 27, 2022:

  1. Why cast the inputs to fp16? Does this speed up NN matching?
  2. Have you tested this with SuperGlue? Shouldn't the model parameters also be cast to fp16?
  3. Does this degrade the accuracy in any way? My bet is that you need mixed precision (with torch.autocast), because ops like attention probably require the range of fp32.

@oliver-batchelor (Author) Mar 1, 2022:

I will check that out. Mostly I have been using it with torch.cuda.amp.autocast turned on (outside the matcher's main), so I will check with and without (maybe it should be an argument, e.g. use_autocast or the like?).
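
A minimal sketch of the option discussed here (use_autocast is the hypothetical flag mentioned above): keep inputs and parameters in fp32 and let autocast pick per-op precision, rather than hard-casting everything to fp16.

import torch

def match_batch(model, data, use_autocast=True):
    with torch.cuda.amp.autocast(enabled=use_autocast):
        return model(data)  # matmuls run in fp16, reductions stay in fp32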

@sarlinpe (Member):

Very nice, thank you! I don't have time to test in detail for now but I left some high-level preliminary comments.

Comment on lines +202 to +203
pairs = [pair for pair in set(pairs)
         if names_to_pair(*pair) not in skip_pairs]
Member:

Here we need to also (see the sketch below):

  1. Check for existence in skip_pairs for both directions 0->1 and 1->0
  2. Remove duplicated equivalent pairs
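
One possible sketch of both points (filter_pairs is illustrative; names_to_pair is hloc's existing helper):

def filter_pairs(pairs, skip_pairs):
    kept, seen = [], set()
    for i, j in set(pairs):
        if (i, j) in seen or (j, i) in seen:
            continue  # duplicated equivalent pair
        if names_to_pair(i, j) in skip_pairs or names_to_pair(j, i) in skip_pairs:
            continue  # already matched in either direction
        seen.add((i, j))
        kept.append((i, j))
    return kept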

Author:

Ah ha, thanks.

Member:

I actually added this in PR #159; I will merge it in the next few days.

@sarlinpe (Member) commented Oct 8, 2022

Cleaner implementation in #242

@sarlinpe closed this Oct 8, 2022