
Optimize threads #162

Closed

Conversation

@oliver-batchelor commented Feb 26, 2022

I've optimized feature extraction and feature matching to make better use of the GPU by reading inputs and writing outputs in threads (a minimal sketch of the pattern follows the list below).

  • Makes feature extractors use the GPU more fully via a threaded output writer
  • Speeds up the NN feature matcher (by a factor of 10 on my machine)
  • Adds multi-threading for the CPU-only (SIFT) extractor (relies on adding a scoped GIL release to pycolmap as well)
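
A minimal sketch of the reader/writer-thread pattern described above (the names writer_loop, write_fn, and extract_features are illustrative stand-ins, not the PR's actual code): the main thread runs GPU inference while a background thread drains a queue and writes predictions to disk.

import queue
import threading

def writer_loop(q, write_fn):
    # Consume (name, prediction) items until the sentinel None arrives.
    while True:
        item = q.get()
        if item is None:
            break
        write_fn(item)
        q.task_done()

def extract_features(loader, model, write_fn, maxsize=8):
    q = queue.Queue(maxsize=maxsize)  # bounded, so inference cannot outrun the disk
    writer = threading.Thread(target=writer_loop, args=(q, write_fn))
    writer.start()
    for name, image in loader:
        pred = model(image)  # GPU work overlaps with the writer's I/O
        q.put((name, pred))  # hand the output off to the writer thread
    q.put(None)              # sentinel: no more items
    writer.join()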

Comment on lines 207 to 209
def write_predictions(item, fd, as_half=True):
    name, pred = item
    original_size = pred['image_size']
Member:

Style: all code in the project should be indented with 4 spaces

Comment on lines 238 to 241




Member:

Style: only two blank lines between function or class definitions.

Comment on lines 299 to 301
with tqdm(loader) as pbar:
    for data in pbar:
        process.put(data)
Member:

Here tqdm will report the time to enqueue but not the time to process. Could we instead have tqdm accurately report the processing time, for example using the lock defined in tqdm.contrib.concurrent?
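
A sketch of this suggestion (process_fn and the queue are hypothetical stand-ins for the PR's pipeline): advance the bar from the worker thread after each item is processed, not when it is enqueued. tqdm.update() is guarded by tqdm's internal lock, so calling it from another thread is safe.

import queue
import threading
from tqdm import tqdm

def worker(q, pbar, process_fn):
    while True:
        data = q.get()
        if data is None:
            break                # sentinel: producer is done
        process_fn(data)
        pbar.update(1)           # counts items actually processed
        q.task_done()

def run(loader, process_fn):
    q = queue.Queue(maxsize=4)   # bounded so the producer cannot race ahead
    with tqdm(total=len(loader)) as pbar:
        t = threading.Thread(target=worker, args=(q, pbar, process_fn))
        t.start()
        for data in loader:
            q.put(data)
        q.put(None)
        t.join()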

Author:

Thanks - will check that out!

         if names_to_pair(*pair) not in skip_pairs]

feature_pairs = FeaturesPairs(pairs, feature_path_q, feature_paths_refs)
loader = torch.utils.data.DataLoader(feature_pairs, num_workers=num_workers, batch_size=batch_size, shuffle=False, pin_memory=True)
Member:

  1. Does the code really support batch_size>1? This only works if all images have the same number of keypoints, which is not guaranteed (see the illustration below).
  2. Does batch_size>1 give any performance improvement? With SuperGlue the GPU is usually rather saturated.
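
An illustration of point 1 (the shapes are made up): the default DataLoader collate function stacks per-sample tensors, which fails as soon as two samples have a different number of keypoints.

import torch
from torch.utils.data import default_collate  # available in recent PyTorch

a = {'keypoints0': torch.zeros(120, 2)}  # 120 keypoints
b = {'keypoints0': torch.zeros(85, 2)}   # 85 keypoints
try:
    default_collate([a, b])
except RuntimeError as e:
    print(e)  # stacking tensors of unequal size raises a RuntimeError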

Author:

After benchmarking I see that batch_size does not matter, so I've removed it. Performance is almost entirely IO-limited: using threads for reading and writing takes NN matching from 3 pairs/sec to 33 pairs/sec.


with tqdm(smoothing=.1, total=len(feature_pairs)) as pbar:
    for pairs, data in loader:
        data = map_tensor(data, partial(torch.Tensor.to, device=device, dtype=torch.float16))
@sarlinpe (Member) Feb 27, 2022:

  1. Why cast the inputs to fp16? Does this speed up NN matching?
  2. Have you tested this with SuperGlue? Shouldn't the model parameters also be cast to fp16?
  3. Does this degrade the accuracy in any way? My bet is that you need mixed precision (with torch.autocast), because ops like attention probably require the range of fp32.

@oliver-batchelor (Author) Mar 1, 2022:

I will check that out. Mostly I have been using it with torch.cuda.amp.autocast turned on (outside the matcher's main), so I will check with and without (maybe it should be an argument, e.g. use_autocast or the like?).
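
A minimal sketch of the option discussed here (use_autocast is the hypothetical flag mentioned above): keep inputs and parameters in fp32 and let autocast pick per-op precision, rather than hard-casting everything to fp16.

import torch

def match_batch(model, data, use_autocast=True):
    with torch.cuda.amp.autocast(enabled=use_autocast):
        return model(data)  # matmuls run in fp16, reductions stay in fp32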

@sarlinpe (Member):

Very nice, thank you! I don't have time to test in detail for now but I left some high-level preliminary comments.

Comment on lines +202 to +203
pairs = [pair for pair in set(pairs)
         if names_to_pair(*pair) not in skip_pairs]
Member:

Here we need to also (see the sketch below):

  1. Check for existence in skip_pairs for both directions 0->1 and 1->0
  2. Remove duplicated equivalent pairs
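
One possible sketch of both points (filter_pairs is illustrative; names_to_pair is hloc's existing helper):

def filter_pairs(pairs, skip_pairs):
    kept, seen = [], set()
    for i, j in set(pairs):
        if (i, j) in seen or (j, i) in seen:
            continue  # duplicated equivalent pair
        if names_to_pair(i, j) in skip_pairs or names_to_pair(j, i) in skip_pairs:
            continue  # already matched in either direction
        seen.add((i, j))
        kept.append((i, j))
    return kept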

Author:

Ah ha, thanks.

Member:

I actually added this in PR #159; I will merge it in the next few days.

@sarlinpe (Member) commented Oct 8, 2022

Cleaner implementation in #242

@sarlinpe closed this Oct 8, 2022