Optimisation for high-load input #125

Korinin38 · 2023-11-09T10:09:40Z

This fork is still a bit rough, but is actively developed as of now. There are some important changes and stricter STL dependencies, so I suggest leaving this PR unmerged and using it instead as a reference for those who are interested.

Motivation

Large-scale projects (photo-realistic objects with more than 1M faces) can be processed with xatlas, but at the cost of either significant computation time (with the BruteForce packing method) or a sub-optimal result (with the Random packing method). The purpose of this fork is to speed up BruteForce while retaining its packing efficiency, or to combine the two methods to get an optimal result in acceptable time.

To easily compare the performance of the old and new versions, see https://github.com/Korinin38/xatlas-comparison.

Additions

Computing charts

No changes.

Packing charts

Speedup of BruteForce packing using parallel computation and a Coarse-to-Fine scheme
Option to rotate charts by 90/180/270 degrees
GPU implementation of packing (WIP)
Random packing now uses BruteForce for sufficiently large charts (this slows it down, but considerably improves quality)
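To illustrate the general coarse-to-fine idea behind the BruteForce speedup, here is a minimal sketch; it is not the fork's CoarsePyramid/BitImage code, and `BitGrid`, `reduceAny` and `overlapsCoarseToFine` are hypothetical names. Both the atlas and the chart are OR-reduced by a rate `r`, so a coarse cell is set if any fine texel it covers is set. For a placement whose offset is a multiple of `r`, every fine collision also shows up as a coarse collision, so a coarse "no overlap" is already conclusive and the exact per-texel test only runs when the coarse test reports a possible collision.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row-major bit grid; one byte per cell for clarity, not speed.
struct BitGrid {
    int w = 0, h = 0;
    std::vector<std::uint8_t> bits;
    BitGrid(int width, int height) : w(width), h(height), bits(std::size_t(width) * height, 0) {}
    bool get(int x, int y) const { return bits[std::size_t(y) * w + x] != 0; }
    void set(int x, int y) { bits[std::size_t(y) * w + x] = 1; }
};

// OR-reduction by `rate`: a coarse cell is set if any fine cell it covers is set.
BitGrid reduceAny(const BitGrid &fine, int rate) {
    BitGrid coarse((fine.w + rate - 1) / rate, (fine.h + rate - 1) / rate);
    for (int y = 0; y < fine.h; y++)
        for (int x = 0; x < fine.w; x++)
            if (fine.get(x, y))
                coarse.set(x / rate, y / rate);
    return coarse;
}

// Exact test: does `chart` placed at non-negative offset (ox, oy) hit a set texel
// of `atlas`? Texels falling outside the atlas are treated as free in this sketch.
bool overlaps(const BitGrid &atlas, const BitGrid &chart, int ox, int oy) {
    for (int y = 0; y < chart.h; y++)
        for (int x = 0; x < chart.w; x++) {
            const int ax = ox + x, ay = oy + y;
            if (chart.get(x, y) && ax < atlas.w && ay < atlas.h && atlas.get(ax, ay))
                return true;
        }
    return false;
}

// Coarse-to-fine test for an offset that is a multiple of `rate`.
// If the coarse grids do not collide, the fine grids cannot collide either.
bool overlapsCoarseToFine(const BitGrid &atlas, const BitGrid &coarseAtlas,
                          const BitGrid &chart, const BitGrid &coarseChart,
                          int ox, int oy, int rate) {
    assert(ox >= 0 && oy >= 0 && ox % rate == 0 && oy % rate == 0);
    if (!overlaps(coarseAtlas, coarseChart, ox / rate, oy / rate))
        return false; // guaranteed free at full resolution
    return overlaps(atlas, chart, ox, oy); // coarse hit: confirm per texel
}
```

Offsets that are not multiples of `r` need a chart reduction that depends on the offset, which appears to be what commit 595a38d below addresses; this sketch leaves that case out.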

Changes

Temporary fix for #116 (Packing UV meshes results are invalid): 'zero-area' charts (actually 'area < ε') are no longer ignored and are processed as usual.
Reason: if texelsPerUnit or the captured area is large enough, ignoring such charts may lead to undesired results.
PackOptions::rotateCharts is renamed to PackOptions::transposeCharts (a more accurate description of its purpose); PackOptions::rotateCharts is now used for actual rotation.
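Assuming "transpose" means swapping the u and v axes, the rename reflects a real geometric difference: a transpose mirrors a chart across its diagonal, which flips its orientation, whereas a 90/180/270 degree rotation preserves it. A hypothetical sketch on integer texel coordinates inside a chart's `w` x `h` bounding box (the names are illustrative, not the fork's API):

```cpp
#include <cassert>

struct Texel { int x, y; };

// Transpose: mirror across the main diagonal (x <-> y). A w x h chart becomes h x w.
// This is a reflection, so it reverses triangle winding.
Texel transpose(Texel t) { return {t.y, t.x}; }

// Rotate a texel of a w x h chart by quarterTurns * 90 degrees (counter-clockwise
// in a y-up frame, clockwise when y points down as in images), translated so the
// result stays inside the rotated bounding box with its origin at (0, 0).
Texel rotate(Texel t, int w, int h, int quarterTurns) {
    switch (((quarterTurns % 4) + 4) % 4) {
    case 1: return {h - 1 - t.y, t.x};         // 90 degrees: box becomes h x w
    case 2: return {w - 1 - t.x, h - 1 - t.y}; // 180 degrees: box stays w x h
    case 3: return {t.y, w - 1 - t.x};         // 270 degrees: box becomes h x w
    default: return t;                         // 0 degrees
    }
}
```

Note that a 90 degree rotation and a transpose both swap the bounding-box dimensions, which is presumably why the two were easy to conflate under one option name.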

GPU implementation (currently not fully supported). skipSpeedup, usePreviousPositionOffset, gridSpeedup: new options. BREAKING CHANGE: new pack options; coarseLevelRate is now hidden from the user (its function is now replaced by the XA_PACKING_COARSE_RATE[...] macro family).

lcc815 · 2024-01-18T08:44:45Z

Hi, @Korinin38

Thanks for your great work. However, when I tried your repo, I could not build it and got the following error:

xatlas_static.make:182: recipe for target 'obj/x86_64/Release/xatlas_static/xatlas.o' failed
make[1]: *** [obj/x86_64/Release/xatlas_static/xatlas.o] Error 1
Makefile:120: recipe for target 'xatlas_static' failed
make: *** [xatlas_static] Error 2

Did I do anything wrong? Any help is appreciated.

Korinin38 · 2024-01-20T20:47:19Z

Hi, @lcc815

Have you tried building the current version of xatlas without these contributions? If so, is this error unique to xatlas_hi-res_optimised?

lcc815 · 2024-01-22T02:04:10Z

> Hi, @lcc815
>
> Have you tried building current version of xatlas without these contributions? If yes, is this error unique to xatlas_hi-res_optimised?

Yes! I can build current version of xatlas without these contributions successfully.

lcc815 · 2024-01-24T07:31:42Z

@Korinin38 hello, could you please try to solve this? The current version of xatlas is just too slow to use.

Korinin38 · 2024-01-24T12:51:41Z

Hi, @lcc815, sorry for keeping you waiting,

I examined the problem. It seems the error is caused by the addition of exception handling; it will be fixed soon. For an immediate fix, try changing the exceptionhandling parameter from "Off" to "On" in both the "xatlas" and "xatlas_static" projects in premake5.lua, or use the patch that does just that.

Please let me know whether that helped or not.

lcc815 · 2024-01-25T11:09:25Z

@Korinin38 yes! It works! I can build and run your code successfully, just like the official code.
However, the time cost is the same as that of running the official code : )
What I run is:
./build/gmake/bin/x86_64/Release/example my_mesh.obj
I believe I did something wrong; could you please tell me how to use your code to speed up this wrapping process?
Again, thanks for your reply!

Korinin38 · 2024-01-25T14:52:11Z

I'll look into it. Can you provide logs for both runs? They would help me greatly.

Bear in mind that there are two independent steps, computing charts and packing charts. If the slower step is the former, I can do nothing about it, since chart computation is not changed in any way.

lcc815 · 2024-01-26T02:54:37Z

Aha, I got it. The bottleneck is computing charts; packing charts does take less time than with the official code.
Thanks a lot!

siliconvoodoo · 2024-11-05T08:27:54Z

I pulled your branch and ran 3 tests with a city block model.

original xatlas: 73 seconds
korinin XA_USE_GPU off CUDA_SUPPORT off openMP unset: 169 seconds
korinin XA_USE_GPU off CUDA_SUPPORT off openMP on: did not finish after 30 minutes; force-stopped.

I didn't try CUDA because of build setup complexities.

Korinin38 · 2024-11-05T12:54:55Z

Hello @siliconvoodoo, thank you for reaching out,

Is it possible for you to provide the example model? If not, could you share details such as number of vertices & faces, as well as number of charts parametrized with xatlas? It's also useful to know which PackOptions you used, and whether the results of packing are satisfactory in both finished tests.

siliconvoodoo · 2024-11-07T11:25:13Z

It's a model with 52 meshes, 4,274,763 vertices and 7.5M triangles.
With original xatlas the viewer gives these stats

But today I tried to build with OpenCL.
That was very hard because your branch doesn't include the dependencies, so I found them in your other repository, xatlas-comparison. Then I installed the CUDA Toolkit to get libopencl.

I ran CMake on your 3 libraries (clew/gpu/misc) and manually added the includes and link dependencies. (Just a note: this PR won't get accepted, because it breaks the standalone principle of xatlas: only one .h and one .cpp.)

You are missing some explicit template instantiations, e.g.:
template OpenCLKernelArg::OpenCLKernelArg(const gpu::shared_device_buffer_typed<unsigned __int64>& arg);
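For context: a constructor template declared in a header but defined in a .cpp file is only emitted for the types instantiated in that .cpp file (explicitly, or by use there), so other translation units using a new argument type typically fail at link time with an unresolved symbol. A self-contained sketch of the pattern, with hypothetical stand-ins `DeviceBuffer` and `KernelArg` (not the fork's gpu library), using the portable `std::uint64_t` in place of MSVC's `unsigned __int64`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a typed device buffer.
template <typename T>
struct DeviceBuffer {
    explicit DeviceBuffer(std::size_t n) : count(n) {}
    std::size_t count;
};

// Hypothetical kernel argument recording the buffer size in bytes.
struct KernelArg {
    std::size_t bytes = 0;
    // Declared here; in a real header/source split the definition lives in a .cpp file.
    template <typename T>
    KernelArg(const DeviceBuffer<T> &arg);
};

// Definition (would normally sit in the .cpp file).
template <typename T>
KernelArg::KernelArg(const DeviceBuffer<T> &arg) : bytes(arg.count * sizeof(T)) {}

// Explicit instantiation definitions: emit the constructor for these argument types,
// so translation units that only see the declaration can link against them.
template KernelArg::KernelArg(const DeviceBuffer<float> &arg);
template KernelArg::KernelArg(const DeviceBuffer<std::uint64_t> &arg);
```

The alternative is to move the template definition into the header, at the cost of compiling it in every translation unit that includes it.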

The first blit.cl I tried had outdated content and didn't build.
Setting printLog = true showed:

Device 1
Program build log:
:16:7: error: use of undeclared identifier 'uint32_t'
for (uint32_t y = 0; y < ch; y++) {
^
...

I fixed that, but for nothing, because then there was no aggregateResults entry point.

I found the more recent blit.cl,
and then got this error:

:304:32: error: call to 'max' is ambiguous
const unsigned int extentX = max(w, offset_x + chartSizes[0]);
^~~
cl_kernel.h:3498:22: note: candidate function
int OVERLOADABLE max(int, int);
^
cl_kernel.h:3499:23: note: candidate function
uint OVERLOADABLE max(uint, uint);

I fixed it with casts.
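The OpenCL C error arises because the arguments (presumably one `int` and one `uint`) match the `max(int, int)` and `max(uint, uint)` overloads equally well. Standard C++ has a closely related pitfall: `std::max` deduces a single type `T`, so mixed signed/unsigned arguments do not compile at all. The sketch below is a C++ analogue, not the kernel code, and `w`, `offsetX`, `chartWidth` are illustrative names. Making the types agree, by cast or by naming the type explicitly, resolves it; checking that the signed value is non-negative first matters, because converting a negative `int` to unsigned wraps around.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Mirrors the shape of `max(w, offset_x + chartSizes[0])`, assuming `w` is signed.
// `std::max(w, offsetX + chartWidth)` would not compile: T would be deduced as
// both int and std::uint32_t.
std::uint32_t extentX(int w, std::uint32_t offsetX, std::uint32_t chartWidth) {
    assert(w >= 0); // a negative int would wrap to a huge unsigned value
    return std::max(static_cast<std::uint32_t>(w), offsetX + chartWidth);
}

// Equivalent alternative: name the type explicitly instead of casting.
std::uint32_t extentXExplicit(int w, std::uint32_t offsetX, std::uint32_t chartWidth) {
    assert(w >= 0);
    return std::max<std::uint32_t>(w, offsetX + chartWidth);
}
```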

Then execution failed because the call site was missing arguments:
kernels["blitLevel"].exec was missing w and h, so I added them and retried.

Then execution failed at a later point, with this log:

{Ptr=0x0000000024f730a0 "Kernel aggregateResults: CL_UNKNOWN_ERROR_CODE-9999 (-9999) at line 368" }

A different exception occurred in the viewer project:

{_Ptr=0x000001ec1a3f1bc0 "Global work_size[0] value is zero!" }

I only got it to work on a Cornell box model and on a model with two cubes.

Korinin38 and others added 30 commits August 8, 2023 11:20

feat!: add simple parallelization

dd0db5e

BREAKING CHANGE: does not ignore close-to-zero area faces. It was making holes in dense UVMeshes. To preserve functionality, a workaround is needed.
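For reference, "zero area" here in practice means that the magnitude of a face's signed UV area falls below some ε. A minimal sketch of such a test; `uvTriangleArea`, `isNearZeroArea` and the ε value are illustrative assumptions, not the fork's or xatlas's code:

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// Signed area of triangle (a, b, c) in UV space: half the 2D cross product
// of edges (b - a) and (c - a). Positive for counter-clockwise winding.
float uvTriangleArea(Vec2 a, Vec2 b, Vec2 c) {
    return 0.5f * ((b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y));
}

// "Zero area" in practice: |area| below an epsilon (value assumed for illustration).
bool isNearZeroArea(Vec2 a, Vec2 b, Vec2 c, float epsilon = 1e-12f) {
    return std::fabs(uvTriangleArea(a, b, c)) < epsilon;
}
```

This shows why skipping such faces can make holes in dense meshes: a right triangle with legs of 1e-6 UV units has area 5e-13, below the assumed ε, yet scaled by a texelsPerUnit of 1e6 on both axes it covers 0.5 square texels.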

feat: coarse-to-fine scheme

6e88082

With constant number of steps in scheme and rate.

feat!: Code optimisation, configurable rate and steps.

a6b8f4b

Also: metric experiments, modifying parallelization for more consistent results. BREAKING CHANGE: PackOptions now has additional options, may break C code.
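One common way to make a parallel brute-force search return the same result regardless of thread count and scheduling is to let each thread keep its own best candidate and then reduce with a strict total order that breaks ties by candidate index. A minimal OpenMP sketch, assuming a hypothetical per-position `cost` function where lower is better (it must not return NaN); this is not the fork's implementation. Without OpenMP enabled the pragmas are ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <limits>

struct Candidate {
    double cost = std::numeric_limits<double>::infinity();
    std::int64_t index = -1; // -1 means "no candidate found"
    Candidate() = default;
    Candidate(double c, std::int64_t i) : cost(c), index(i) {}
};

// Strict total order: lower cost wins; equal costs go to the lower index, so the
// winner does not depend on which thread happened to evaluate it first.
bool better(const Candidate &a, const Candidate &b) {
    if (a.cost != b.cost) return a.cost < b.cost;
    return a.index >= 0 && (b.index < 0 || a.index < b.index);
}

Candidate findBest(std::int64_t count, const std::function<double(std::int64_t)> &cost) {
    Candidate best;
#pragma omp parallel
    {
        Candidate local; // per-thread best, so the hot loop shares nothing
#pragma omp for nowait
        for (std::int64_t i = 0; i < count; i++) {
            const Candidate c(cost(i), i);
            if (better(c, local)) local = c;
        }
#pragma omp critical
        if (better(local, best)) best = local; // merge under the same total order
    }
    return best;
}
```

Keeping the merge in a `critical` section is fine here because it runs once per thread, not once per candidate.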

feat: Change coarse-to-fine application paradigm

01cba03

Better results (increased quality). Also: add a different callback that reports the percentage more frequently and more accurately. PackOptions are renamed.

feat: add alternative overlap checking

120b8dc

rough implementation of two other overlap checking strategies: image compression and 2d segment tree

feat: change coarse-to-fine scheme to produce consistent results

595a38d

The scheme now uses different reductions of the chart for different offsets, making it a "true" rasterisation. Changed atlas rasterisation to optimistic for faster results.

refactor: CoarsePyramid::init and Atlas::addChart now work faster

ff8fdd1

Instead of reducing full image, we now use image from previous layer.

refactor: BitImage::reduceTo now works faster

fd54f5e

separate checks for pessimistic and optimistic options

feat: BitImage::reduceTo now works faster (?)

8752ca6

when rate is > 4, reduceToOptimistic should work faster (not tested; hidden inside feature flag)

refactor: cleanup

2afbd3a

feat: add hard limit of reduction in CoarsePyramid

1f5ed36

Not only works faster, but also saves memory.

refactor: cleanup

463ff0d

fix: memory leaks

0e03ff2

style: spaces converted to tabs

a18709d

refactor!: rename PackOption "rotateCharts" to "transposeCharts"

03780ba

preparation for adding option to actually "rotate" charts

feat!: add charts rotation

a03253c

Update comments for PackOptions BREAKING CHANGE: PackOptions::rotateCharts is now responsible for an actual chart rotation

style: add TODO for breaking change introduced in dd0db5e

eee5cc0

style: brackets consistency

d378219

fix: add missing files

1d08420

fix: endPosition not updating

ccfc2d1

minor bug-fix: fix exceptions handling in OpenMP parallel+for sections

e4d7a52

new option to force preserving fractional part of input texture coordinates

8cc61a7

Merge pull request #1 from PolarNick239/master

3bcfbd8

New option to force preserving fractional part of input texture coordinates

Merge remote-tracking branch 'origin/master'

e49a692

# Conflicts:
#   source/xatlas/xatlas.cpp
#   source/xatlas/xatlas.h

fix: cleanup

2cb2a1d

fix: integrate parallel_tools.h into xatlas.cpp

2d4487c

fix: cleanup

1854c10

fix: cleanup

f06d1ea

Merge remote-tracking branch 'origin/master'

119a76b

Korinin38 added 6 commits October 20, 2023 16:34

fix: exception handling in OpenMP

378c3c5
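Background for the OpenMP exception-handling fixes: the OpenMP specification does not allow an exception to propagate out of a parallel region (in practice an escaping exception usually terminates the program), so a common pattern is to catch inside each iteration, keep one `std::exception_ptr`, and rethrow it on the calling thread after the region. A minimal sketch of that pattern; `processAll` and the work callback are hypothetical, not the fork's code.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>

// Runs work(i) for i in [0, count), in parallel when OpenMP is enabled.
// Exceptions are caught inside each iteration; the first one recorded is kept
// and rethrown after the parallel region. Other iterations still run.
void processAll(int count, const std::function<void(int)> &work) {
    std::exception_ptr firstError = nullptr;
#pragma omp parallel for
    for (int i = 0; i < count; i++) {
        try {
            work(i);
        } catch (...) {
#pragma omp critical
            if (!firstError) firstError = std::current_exception();
        }
    }
    if (firstError) std::rethrow_exception(firstError);
}
```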

feat: Separate processing of case when XA_PACKING_COARSE_RATE is a power of 2

017b757
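When the coarse rate is a power of two, mapping a fine coordinate to its coarse cell reduces to a right shift and the offset within the cell to a mask. The sketch below only illustrates that equivalence for unsigned coordinates; `kCoarseRateLog2` and the helper names are hypothetical and do not reflect how XA_PACKING_COARSE_RATE is actually defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical compile-time rate: 2^kCoarseRateLog2 = 4 fine texels per coarse cell.
constexpr std::uint32_t kCoarseRateLog2 = 2;
constexpr std::uint32_t kCoarseRate = 1u << kCoarseRateLog2;
static_assert((kCoarseRate & (kCoarseRate - 1)) == 0, "rate must be a power of two");

// General case: works for any rate.
constexpr std::uint32_t coarseIndexDiv(std::uint32_t x, std::uint32_t rate) { return x / rate; }
constexpr std::uint32_t fineOffsetMod(std::uint32_t x, std::uint32_t rate) { return x % rate; }

// Power-of-two case: same results for unsigned x, using a shift and a mask.
constexpr std::uint32_t coarseIndexShift(std::uint32_t x) { return x >> kCoarseRateLog2; }
constexpr std::uint32_t fineOffsetMask(std::uint32_t x) { return x & (kCoarseRate - 1); }
```

Optimizing compilers generally apply this rewrite themselves for unsigned division by a compile-time constant power of two, so an explicit code path matters mainly where the rate is not visible to the compiler as a constant.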

feat: Update 'random' packing for rotation support, optimize filling strategy

2aeaf05

'random' now uses 'bruteForce' packing if the chart is big enough (reduces speed, improves packing). Previous behaviour is supported through 'PackOptions::randomUseBruteForce'.

fix: small tweaks for MSVS compiler

b1ad6ea

fix: remove obsolete comment and debug code

4ddcc67

fix: padding was ignored with `PackOptions::preserveInputTexcoordsFractionalPart = true`

0370f5c

fix (build): add openmp options in premake file

ae9f339

Also: address warnings, add OpenMP options in premake file, remove exception handling, fix test of gazebo.obj

Korinin38 added 3 commits May 27, 2024 11:41

fix (packing): undefined max resolution support

62d4bde

fix (packing): incorrect debug assertion

586762c

feat (packing): add option to disable OpenMP usage

67f364c
