Hi,
Based on your MATLAB code, I have finally finished the C++ implementation (with some modifications). But with the configuration defined in your code, i.e. 68 landmarks, tree depth 6, 10 trees per forest, 5 stages, I could hardly achieve 1000 FPS; I actually get only 350~450 FPS.
I notice that when generating the binary features, the binary tree traversal is quite time consuming, even longer than the global prediction (which is almost entirely vector summation and can be optimized with AVX instructions).
Btw, my laptop uses an i7-3820QM 2.7 GHz CPU, and all tests run on a single core. I would be grateful if you could share some advice.
Hi, the binary tree traversal should not be that time consuming. With your setup, it needs only 68 * 6 * 10 * 5 comparisons. Maybe you can speed it up by changing how the traversal is implemented.
Yeah, you are right, I don't know what's going on. It's a triple loop for each stage: one over landmarks, one over trees, and one over each tree's nodes, 68 * 10 * 5 = 3400 comparisons, and it takes 0.3 ms. Even when I replace the pixel difference comparison with a random selection (i.e. randomly select the left or right child), it still takes 0.28 ms. In contrast, the global prediction performs (68 * 2) * (68 * 10) = 92480 double precision additions and takes 0.2 ms.
Maybe the inner loop count is too small, so too much time is spent on loop branching?
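For reference, here is a minimal sketch of one way the two steps discussed above could be laid out in C++. It is not taken from the MATLAB code or from the implementation in this thread. It assumes each tree is a complete binary tree stored in a flat array in heap order, so the child index is computed arithmetically instead of through pointers, and that the pixel intensities have already been sampled into a vector. The names `SplitNode`, `FlatTree`, `traverse` and `accumulate` are hypothetical, and whether this layout actually helps would need to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical split node: compares the difference of two pre-sampled
// pixel intensities against a threshold.
struct SplitNode {
    std::uint16_t pixel_a;   // index into the sampled pixel vector
    std::uint16_t pixel_b;   // index into the sampled pixel vector
    std::int16_t threshold;
};

// A complete binary tree stored in heap order: the children of node i are
// 2*i + 1 and 2*i + 2, so no child pointers are needed.
struct FlatTree {
    int split_levels;              // comparisons per traversal
    std::vector<SplitNode> nodes;  // (1 << split_levels) - 1 split nodes
};

// Walks from the root to a leaf and returns the leaf index in
// [0, 1 << split_levels). The comparison result is added to the index,
// so the loop body contains no explicit left/right branch.
inline std::size_t traverse(const FlatTree& tree, const std::vector<int>& pixels) {
    assert(tree.nodes.size() == (std::size_t{1} << tree.split_levels) - 1);
    std::size_t i = 0;
    for (int level = 0; level < tree.split_levels; ++level) {
        const SplitNode& n = tree.nodes[i];
        const bool go_right = (pixels[n.pixel_a] - pixels[n.pixel_b]) >= n.threshold;
        i = 2 * i + 1 + static_cast<std::size_t>(go_right);
    }
    return i - tree.nodes.size();  // position among the leaves
}

// Global prediction as a sum of weight rows, one row per active leaf.
// weights is row-major with out_dim columns; out must hold out_dim values.
inline void accumulate(const std::vector<float>& weights, std::size_t out_dim,
                       const std::vector<std::size_t>& active, std::vector<float>& out) {
    for (std::size_t f : active) {
        const float* row = &weights[f * out_dim];
        for (std::size_t k = 0; k < out_dim; ++k) {
            out[k] += row[k];  // contiguous inner loop, easy to vectorize
        }
    }
}
```

With this layout, all nodes of one tree sit next to each other in memory, and the inner loop of `accumulate` runs over contiguous floats, which is the part that AVX instructions can speed up.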