-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance improvements #40
Comments
More information can be found in: #37 |
It might be worth taking a look at https://github.com/nushoin/RTree to see if it could be of use for this purpose - I ended up using a slightly modded version of the https://github.com/DevHwan/RTree version for some work I did a couple years back, and it did OK (although I didn't hammer it with the sort of benchmarks you've got there, so I can't say definitely how it would compare.) I was using poly2tri for the CDT, but I did do quite a lot of data management and lookups using that RTree and it seemed to work fairly well. |
Thanks for the suggestions @starseeker. I wander how |
I got a little bit of time to do the profiling. From the data it seems that Original interactive flame graph as ZIP. Benchmark Code
#include "CDT.h"
typedef double CoordType;
typedef CDT::Triangulation<CoordType> Triangulation;
typedef CDT::V2d<CoordType> V2d;
typedef CDT::Vertex<CoordType> Vertex;
typedef CDT::Triangle Triangle;
typedef CDT::Box2d<CoordType> Box2d;
typedef CDT::Edge Edge;
#include <chrono>
#include <iostream>
int main(int argc, char* argv[])
{
const size_t N = 1000;
std::vector<V2d> vertices;
vertices.reserve(N * N);
for(size_t i = 0; i < N; ++i)
{
for(size_t j = 0; j < N; ++j)
{
vertices.push_back(V2d::make(i * 1e-5, j * 1e-5));
}
}
using std::chrono::duration;
using std::chrono::duration_cast;
using std::chrono::high_resolution_clock;
using std::chrono::milliseconds;
auto t1 = high_resolution_clock::now();
Triangulation triangulation(CDT::FindingClosestPoint::BoostRTree);
triangulation.insertVertices(vertices);
auto t2 = high_resolution_clock::now();
/* Getting number of milliseconds as an integer. */
auto ms_int = duration_cast<milliseconds>(t2 - t1);
/* Getting number of milliseconds as a double. */
duration<double, std::milli> ms_double = t2 - t1;
std::cout << ms_int.count() << "ms\n";
return 0;
} |
I should of been more descriptive, you are correct that is flip needed is a larger issue when using a "quad" grid. Try profiling using random numbers and you will see the one that I posted earlier. What I expect is happening (a total guess) in this case is that when you have 4 points on a circle the Delaunay triangulation always flip the triangulation when something happens to any of the neighbour triangles. Since this entire mesh is made of 4 point Delaunay edge cases then there is a lot of flipping on every input. This would not happen in a random mesh. |
I have also been implementing a simple KD-Tree and linking to CDT to see if my expectations were correct. From first benchmarking it seems like my KD-Tree for now was more efficient by a large margin but I don't remember the exact numbers and not really tested so far. I might have time this weekend to continue working on it and will update you with the numbers when I have them. |
@AndreFecteau I will post the flame graph and full benchmark later for documentation. Vertices placed on a regular grid is an interesting but exotic corner case. Random vertices better reflect the common usage of triangulation. So thank you very much for your help with it. I am looking forward for the results. |
Attached you can find the comparison of the 2 implementations. The one with the KDTree (CDT) and the Boost:RTree (CDT2). Theres still a lot of work left before I can push the KDTree to a git repo so it can be added to this project, but it's definitely a good proof of concept for now. Note: this flame snippet is missing the erase super-triangle section of the code that is considerable. |
Here's the random benchmark data. Benchmark Code
#include "CDT.h"
typedef double CoordType;
typedef CDT::Triangulation<CoordType> Triangulation;
typedef CDT::V2d<CoordType> V2d;
typedef CDT::Vertex<CoordType> Vertex;
typedef CDT::Triangle Triangle;
typedef CDT::Box2d<CoordType> Box2d;
typedef CDT::Edge Edge;
#include <chrono>
#include <iostream>
int main(int argc, char* argv[])
{
const size_t N = 500;
std::vector<V2d> vertices;
vertices.reserve(N * N);
for(size_t i = 0; i < N; ++i)
{
for(size_t j = 0; j < N; ++j)
{
bool inserted = false;
while(!inserted)
{
size_t x = (rand() % 10000);
size_t y = (rand() % 10000);
auto it = std::find_if(
vertices.begin(), vertices.end(), [&](const auto& xy) {
return x == xy.x && y == xy.y;
});
if(it == vertices.end())
{
vertices.push_back(V2d::make(x * 1e-5, y * 1e-5));
inserted = true;
}
}
}
}
using std::chrono::duration;
using std::chrono::duration_cast;
using std::chrono::high_resolution_clock;
using std::chrono::milliseconds;
auto t1 = high_resolution_clock::now();
Triangulation triangulation(CDT::FindingClosestPoint::BoostRTree);
triangulation.insertVertices(vertices);
auto t2 = high_resolution_clock::now();
/* Getting number of milliseconds as an integer. */
auto ms_int = duration_cast<milliseconds>(t2 - t1);
/* Getting number of milliseconds as a double. */
duration<double, std::milli> ms_double = t2 - t1;
std::cout << ms_int.count() << "ms\n";
return 0;
}
|
Interesting info: how triangle location is implemented in CGAL |
From memory, the timings that I show in the CGAL benchmarks is using the the range inserter for vertices. I would be interesting to see if inserting points one at a time would amount to the same efficiency. I believe that there are much faster algorithms for Delaunay triangulation when you are able to sort the data points initially. |
Note on shuffling vertices and performanceCDT does not perform vertex shuffling which other implementations might do by default. For example a benchmark with vertices on a grid gets at least 2x faster when vertices are shuffled: std::random_device rd;
std::mt19937 g(rd());
std::shuffle(vertices.begin(), vertices.end(), g); // shuffle vertices On some of my inputs (very long and narrow strip with very dense points - looks like a road) I observed up to 300x speed-ups 🚀 😱 Currently users need to shuffle vertices themselves if it improves performance for their inputs. I am considering including shuffling into the library and make it on by default with a possibility to opt-out. Still thinking about a good and flexible way to implement this. |
I am almost done implementing the KDTree to incorporate in CDT (If you want to) and I have a few questions for you (@artem-ogre ). The most up to date branch is https://github.com/AndreFecteau/CDT/tree/OptimizationByUsingKDTree. It is definitely not complete but I want to know: Are you ok with this being integrated in CDT? |
@AndreFecteau Neat, thanks for the nice effort! I had a quick look. I myself also came to the conclusion that point locator should be a customization point for I am working on randomized vertex insertion right now and have a PoC. The change I've done requires re-factoring 'random-nearest' point locator. So I think it is a good opportunity to implement custom point-locator support. Then when we have an interface in place you could do minimal changes to make it work with your KDTree implementation. And we could take it forward from there!
Yes, but we would need to do the code review and re-write. There are likely going to be changes to be done: conventions, naming, formatting, supporting pre-c++11 standard, etc. Given my limited availability, it will take time to re-factor and merge. It is good to have it in the fork for the time being so that it is immediately usable for those who need it (and not constrained by ancient c++ standard 😃 ). And let's try to move the fork upstream when we can.
I will re-implement current locators so we'll have a reference implementation. I like having it in separate files but they don't have to be 'external', could go right into 'include' folder. Something to think about still.
Not sure I understand this question.
Yes, totally it can be removed. It could also be left there just in case, but if other approach is clearly superior I don't see why to keep boost rtree. |
@artem-ogre There is no real rush on my end to get this in, obviously I am using this software and better efficiency is always nice but it is not crucial.
I agree that having the point locator as a template parameter will help in the future test out different libraries, I would not be surprised that there are faster ones out there for this specific purpose.
Let me know what conventions need to be supported and I can change it. I hope I don't loose smart pointers but if I have to I can remove them.
I have not and I think that it would be worthwhile, unfortunately at my current employment I have written multiple quad-trees and I'm trying to avoid having too similar work on personal projects. |
- Implement randomized vertex insertion option and make it default - Allow custom nearest point locators - Implement boost::rtree and random-nearest as locators - Implement kd-tree for nearest point location - CDT code refactoring, some APIs changed - Make necessary adjustments to visualizer code
- Implement randomized vertex insertion option and make it default - Allow custom nearest point locators - Implement boost::rtree and random-nearest as locators - Implement kd-tree for nearest point location - CDT code refactoring, some APIs changed - Make necessary adjustments to visualizer code
- Implement randomized vertex insertion option and make it default - Allow custom nearest point locators - Implement boost::rtree and random-nearest as locators - Implement kd-tree for nearest point location - CDT code refactoring, some APIs changed - Make necessary adjustments to visualizer code - Compatibility with c++98 standard
- Implement randomized vertex insertion option and make it default - Allow custom nearest point locators - Implement boost::rtree and random-nearest as locators - Implement kd-tree for nearest point location - CDT code refactoring, some APIs changed - Make necessary adjustments to visualizer code - Compatibility with c++98 standard
- Implement randomized vertex insertion option and make it default - Allow custom nearest point locators - Implement boost::rtree and random-nearest as locators - Implement kd-tree for nearest point location - CDT code refactoring, some APIs changed - Make necessary adjustments to visualizer code - Compatibility with c++98 standard
- Implement randomized vertex insertion option and make it default - Allow custom nearest point locators - Implement boost::rtree and random-nearest as locators - Implement kd-tree for nearest point location - Switch index types used in CDT (e.g., vertices, triangles) from 64-bit to 32-bit - 64-bit index types can be enabled with CDT_USE_64_BIT_INDEX_TYPE - CDT code refactoring, some APIs changed - Make necessary adjustments to visualizer code - Compatibility with c++98 standard
First, thank you for your kd-tree implementation! It was easy to use after I extracted locator into a template parameter. And the code was easy to follow. 👍 I took it as a starting point and ended up re-writing most of the code, but the ideas and algorithms are the same. Performance is also about the same. Given this situation how would you like to be attributed? I will add a mention in contributors list in README. We can also make it so that you author kd-tree code (maybe by doing a PR) so that your user-name shows up in contributors list auto-generated by github. Could you maybe take a look at the PR #45?
|
The PR also contains a feature of randomizing points insertion. For some datasets it can provides huge performance gains. |
@artem-ogre I took a quick look at the new KDTree that you implemented and I can't claim credit for that. Not much of my implementation is left (Not to worry, I can definitely see the benefits of having this integrated). Just a mention in the readme would be great. I'll probably take a deeper dive this weekend since I'm always curious on other ways of programming stuff. Few things that I'm curious about. By using an std::vector to stored the leafs there is definitely some speedup to have by having contiguous memory and other algorithms that don't have to do a leaf pointer call every time. Did you see any improvement by switching to this method or was it kind of hidden in the rest of the optimizations? Are you not afraid that by using a std::vector that for extremely large models, std::vector may have to reallocate more space a significant amount of times (which is a move at best, deep copy in c++03) and technically (as far as I can imagine and I am not sure of this) double the memory requirement while reallocating. |
This was my hope as well, but I could not see a noticeable speed improvement with the vector. I still like the simplicity and how easy it is to copy compared to pointers.
Yes, depending on the vector implementation it could consume 1.5x-2x more memory than needed for storing it's elements. This is a trade-off that sounds scary in theory but is almost always not a problem in practice (in my experience). |
@AndreFecteau I'm trying to figure out how to add you as a reviewer to the PR. It looks like you should be the collaborator for this to work (I've sent an invite) |
Make initial stack size and stack increment when re-sizing template parameters Performance is the same
Andre Fecteau contributed original KD-tree implementation: https://github.com/AndreFecteau/CDT/tree/OptimizationByUsingKDTree It was heavily modified and included into CDT source.
Add Andre Fecteau to contributors list
Add Andre Fecteau to contributors list
- Implement randomized vertex insertion option and make it default - Allow custom nearest point locators - Implement boost::rtree and random-nearest as locators - Implement kd-tree for nearest point location - Switch index types used in CDT (e.g., vertices, triangles) from 64-bit to 32-bit - 64-bit index types can be enabled with CDT_USE_64_BIT_INDEX_TYPE - CDT code refactoring, some APIs changed - Make necessary adjustments to visualizer code - Compatibility with c++98 standard
Make initial stack size and stack increment when re-sizing template parameters Performance is the same
Andre Fecteau contributed original KD-tree implementation: https://github.com/AndreFecteau/CDT/tree/OptimizationByUsingKDTree It was heavily modified and included into CDT source.
Add Andre Fecteau to contributors list
@AndreFecteau I closed this issue. Thank you for all the help! :) |
@artem-ogre These numbers are only meant to be a rough measurement. Note that not all these libraries are reliable or support general constrained Delaunay triangulation.
10x10
Elapsed time poly2Tri: 0.00012235 s
Elapsed time CDT: 0.000653747 s
Elapsed time CGAL: 0.000221772 s
Elapsed time Delaunator: 6.9119e-05 s
Elapsed time Triangle: 0.000959387 s
20x20
Elapsed time poly2Tri: 0.000755585 s
Elapsed time CDT: 0.00217176 s
Elapsed time CGAL: 0.000819186 s
Elapsed time Delaunator: 0.000294325 s
Elapsed time Triangle: 0.00246849 s
100x100
Elapsed time poly2Tri: 0.0116399 s
Elapsed time CDT: 0.113256 s
Elapsed time CGAL: 0.0159775 s
Elapsed time Delaunator: 0.00511762 s
Elapsed time Triangle: 0.0100751 s
100x1000
Elapsed time poly2Tri: 0.106778 s
Elapsed time CDT: 8.49504 s
Elapsed time CGAL: 0.158438 s
Elapsed time Delaunator: 0.0808353 s
Elapsed time Triangle: 0.127281 s
1000x1000
Elapsed time poly2Tri: 1.1229 s
Elapsed time CDT: 84.6357 s
Elapsed time CGAL: 1.6105 s
Elapsed time Delaunator: 0.964066 s
Elapsed time Triangle: 1.69101 s
10x10
Elapsed time poly2Tri: 0.000154201 s
Elapsed time CDT: 0.000439808 s
Elapsed time CGAL: 9.577e-05 s
Elapsed time Delaunator: 6.046e-05 s
Elapsed time Triangle: 0.000652088 s
100x100
Elapsed time poly2Tri: 0.0123536 s
Elapsed time CDT: 0.0613802 s
Elapsed time CGAL: 0.00735314 s
Elapsed time Delaunator: 0.00525829 s
Elapsed time Triangle: 0.0123408 s
100x1000
Elapsed time poly2Tri: 0.168798 s
Elapsed time CDT: 1.06543 s
Elapsed time CGAL: 0.0847132 s
Elapsed time Delaunator: 0.0803343 s
Elapsed time Triangle: 0.191198 s
500x500
Elapsed time poly2Tri: 0.448719 s
Elapsed time CDT: 3.0024 s
Elapsed time CGAL: 0.213624 s
Elapsed time Delaunator: 0.238802 s
Elapsed time Triangle: 0.560273 s
Originally posted by @AndreFecteau in #37 (comment)
The text was updated successfully, but these errors were encountered: