# Performance Considerations
Manifold records the underlying CSG operations and evaluates them lazily. This has several consequences:
- CSG operations are instantaneous, because they don't actually perform mesh operations. Benchmarks should therefore include the time required to run `GetMesh` or other operations that trigger evaluation.
```js
const { sphere } = Manifold
const a = sphere(10, 128)
const b = sphere(10, 128).translate([10, 0, 0])
const start = performance.now()
const result = a.add(b)   // records the union lazily
const endAdd = performance.now()
result.getMesh()          // triggers the actual evaluation
const end = performance.now()
console.log(endAdd - start) // 0
console.log(end - endAdd)   // 14
```
- For best performance, avoid calling functions that trigger CSG evaluation unless really needed, e.g. `GetMesh` or `NumVerts`. Manifold can optimize a sequence of CSG operations, including reordering and parallelization.
- Try to reuse existing results. In general, CSG operations are much slower than linear transformations (translation, rotation and scaling).
```js
const { cylinder, sphere, union } = Manifold
function foo(deg) {
  const r = 5
  const rad = deg / 180 * Math.PI
  // two small spheres on opposite sides of a circle, joined by a thin bar
  return sphere(1, 128).translate([r * Math.sin(rad), -r * Math.cos(rad), 0])
    .add(sphere(1, 128).translate([-r * Math.sin(rad), r * Math.cos(rad), 0]))
    .add(cylinder(2 * r, 0.2, 0.2, 128, true).rotate([90, 0, deg]))
}
// foo(deg) is just foo(0) rotated about the z-axis, so compute the CSG
// once and reuse it: the rotation is far cheaper than redoing the unions.
const foo_cached = foo(0)
function foo_faster(deg) {
  return foo_cached.rotate([0, 0, deg])
}
let results = []
for (let i = 0; i < 180; i += 30)
  results.push(foo_faster(i)) // foo_faster will be faster than foo
const result = union(results)
```
- Parallelization is costly, as sending messages between threads can be expensive, so we only want to parallelize when there is a significant amount of work. In general, we use `autoPolicy(size)` to determine the policy: parallel when the size is large (the threshold is tunable), sequential when it is small. There is a caveat, however: this is only meant for simple tasks like copying data, sorting or prefix sums, where each operation takes only a few clock cycles. When the operations are significantly more complicated, you should run some benchmarks to determine an appropriate threshold (not a precise value, but a rough cutoff, as this is machine and workload dependent).
- Parallelization can be non-trivial. Parallel implementations of some algorithms perform additional work to enable parallelization of the costly part. For example, our collider first counts the number of collisions, allocates space for each query, and then fills in the results, which means we actually find the collisions twice. Benchmarks should be performed to check whether the overhead is worth it, and when we should switch to a faster sequential implementation. Both points are sketched below.
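For illustration, here is a rough JS sketch of both ideas. It is not the actual implementation (`autoPolicy` and the collider live in the C++ internals); the threshold value and the `collides` callback are invented for the example.

```js
// Size-gated parallelism in the spirit of autoPolicy: below a tuned
// cutoff, thread startup/messaging overhead outweighs any speedup.
// The cutoff here is made up for illustration.
const PAR_THRESHOLD = 1e4
const policy = (size) => (size > PAR_THRESHOLD ? 'par' : 'seq')

// Count-then-fill collision gathering. Pass 1 counts hits per query so
// exact space can be allocated; pass 2 repeats the search and writes the
// results. Each query owns a disjoint slice of `out`, so in the real
// (parallel) version all queries can fill their slices concurrently.
function gatherCollisions(queries, collides) {
  // Pass 1: count collisions per query (the search is done once here...)
  const counts = queries.map((q) => collides(q).length)
  // Exclusive prefix sum gives each query's offset into the output.
  const offsets = [0]
  for (let i = 0; i < counts.length; i++)
    offsets.push(offsets[i] + counts[i])
  // Pass 2: ...and done again here to fill the flat array.
  const out = new Array(offsets[counts.length])
  queries.forEach((q, i) => {
    let j = offsets[i]
    for (const c of collides(q)) out[j++] = c
  })
  return out
}
```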
- Contention is costly; try to write code in a way that avoids it, for example by sharding state across several mutexes using a hash function, as sketched below.
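A minimal sketch of this idea (lock striping), using JS Atomics on a SharedArrayBuffer; the shard count and hash are invented for the example, and the real implementation uses C++ mutexes:

```js
// Several locks instead of one: unrelated keys hash to different shards,
// so threads rarely fight over the same lock word.
const NUM_SHARDS = 16
const locks = new Int32Array(new SharedArrayBuffer(4 * NUM_SHARDS))

const shardOf = (key) => key % NUM_SHARDS // illustrative hash

function lock(key) {
  const i = shardOf(key)
  // Spin until we swap 0 (unlocked) for 1 (locked) on this shard only.
  while (Atomics.compareExchange(locks, i, 0, 1) !== 0) {}
}

function unlock(key) {
  Atomics.store(locks, shardOf(key), 0)
}
```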
## Current optimizations
- Intersection/Union reordering: The idea is to compute boolean operations for smaller meshes first, as the complexity of the algorithm is proportional to the number of vertices/edges in the meshes.
- Parallel boolean operations: In addition to reordering, we also perform boolean operations in parallel when possible. Results are pushed into a concurrent priority queue (ordered by number of vertices), and the pair of meshes with the smallest number of vertices is processed next by the thread pool. A conceptual sketch of this reordering appears at the end of this page.
- Compose for non-intersecting meshes: Compose is a fast union valid for non-intersecting meshes, because it merely copies the data into a combined mesh without attempting to compute triangulations/intersections. We use bounding boxes to determine whether two meshes can intersect, and perform compose for those that cannot (see the example at the end of this page).
- Children flattening: For consecutive operations, e.g. a list of unions, the root can flatten the tree into an n-ary union operation. This enables the optimizations above and avoids long recursion during evaluation.
- Caching and sharing: Instead of a tree, the CSG operations form a DAG when users reuse the result of certain operations. Each shared operation is only evaluated once, and the cached result is shared with the other nodes that reference it.
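To make the reordering concrete, here is a conceptual JS sketch of merging the two smallest meshes first. It is illustrative only: the real queue is concurrent and lives in the C++ evaluator, and calling `numVert()` here forces each intermediate union to be evaluated (which happens to be exactly the order the optimizer aims for).

```js
// Conceptual sketch of union reordering: always merge the two smallest
// meshes first, so the largest meshes pass through the boolean as few
// times as possible (boolean cost grows with vertex/edge count).
function unionBySize(meshes) {
  const queue = [...meshes].sort((a, b) => a.numVert() - b.numVert())
  while (queue.length > 1) {
    const merged = queue.shift().add(queue.shift()) // two smallest
    // Re-insert, keeping the queue ordered by vertex count.
    const i = queue.findIndex((m) => m.numVert() > merged.numVert())
    if (i === -1) queue.push(merged)
    else queue.splice(i, 0, merged)
  }
  return queue[0]
}
```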
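You can also use the compose fast path explicitly when you know your parts are disjoint. A small example, assuming the JS bindings expose `compose` alongside `union` as the C++ API does:

```js
const { cube, compose, union } = Manifold
// Two cubes whose bounding boxes are disjoint, so they cannot intersect.
const parts = [cube([1, 1, 1]), cube([1, 1, 1]).translate([5, 0, 0])]

// compose just concatenates the mesh data: no intersection tests, no
// re-triangulation. Only valid because the parts do not overlap.
const fast = compose(parts)

// union produces the same result here; internally it can fall back to
// compose once the bounding-box test shows the parts cannot intersect.
const slow = union(parts)
```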