Optimisation for high-load input #125

Open
wants to merge 40 commits into
base: master

@Korinin38 Korinin38 commented Nov 9, 2023

This fork is still a bit rough, but it is actively developed as of now. It contains some important changes and stricter STL dependencies, so I suggest leaving this PR unmerged and using it instead as a link for those who are interested.

Motivation

Large-scale projects (photo-realistic objects with >1M faces) can be processed with xatlas, but at the cost of either significant computation time (with the BruteForce packing method) or a sub-optimal result (with the Random packing method). The purpose of this fork is to speed up BruteForce while retaining its packing efficiency, or to combine the two methods to get a good result in a satisfactory time.

To easily compare the performance of the old and new versions, see https://github.com/Korinin38/xatlas-comparison.

Additions

Computing charts

No changes.

Packing charts

  • Faster BruteForce packing through parallel computation and a Coarse-to-Fine scheme
  • Option to rotate charts by 90/180/270 degrees
  • GPU implementation of packing (WIP)
  • Random packing now uses the BruteForce method with other criteria (which slows it down, but considerably improves quality)
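
To make the Coarse-to-Fine idea concrete, here is a minimal sketch of one way such a search can work. It is not the code from this fork: `Bitmap`, `Offset`, `reducePessimistic`, `overlaps` and `findPlacementCoarseToFine` are hypothetical names chosen for illustration, and the fork's own reductions and metrics may differ. The atlas and the chart are both downsampled so that a coarse texel is set if any fine texel under it is set; a coarse position without overlap then guarantees that the aligned fine position is overlap-free, and only those candidates are checked at full resolution.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical occupancy bitmap (not an xatlas type): 1 = texel taken, 0 = free.
struct Bitmap {
    int width = 0, height = 0;
    std::vector<std::uint8_t> bits;
    Bitmap(int w, int h) : width(w), height(h), bits(static_cast<std::size_t>(w) * h, 0) {}
    bool get(int x, int y) const { return bits[static_cast<std::size_t>(y) * width + x] != 0; }
    void set(int x, int y) { bits[static_cast<std::size_t>(y) * width + x] = 1; }
};

struct Offset { int x, y; bool found; };

// Pessimistic reduction: a coarse texel is set if ANY fine texel in its rate x rate block is set.
Bitmap reducePessimistic(const Bitmap &fine, int rate) {
    assert(rate > 0);
    Bitmap coarse((fine.width + rate - 1) / rate, (fine.height + rate - 1) / rate);
    for (int y = 0; y < fine.height; y++)
        for (int x = 0; x < fine.width; x++)
            if (fine.get(x, y)) coarse.set(x / rate, y / rate);
    return coarse;
}

// Exact test: true if the chart at (ox, oy) hits an occupied texel or leaves the atlas.
bool overlaps(const Bitmap &atlas, const Bitmap &chart, int ox, int oy) {
    if (ox < 0 || oy < 0 || ox + chart.width > atlas.width || oy + chart.height > atlas.height)
        return true;
    for (int y = 0; y < chart.height; y++)
        for (int x = 0; x < chart.width; x++)
            if (chart.get(x, y) && atlas.get(ox + x, oy + y)) return true;
    return false;
}

// Coarse stage rejects most positions cheaply; fine stage tightens each survivor
// by up to (rate - 1) texels towards the origin using the exact test.
Offset findPlacementCoarseToFine(const Bitmap &atlas, const Bitmap &chart, int rate) {
    const Bitmap coarseAtlas = reducePessimistic(atlas, rate);
    const Bitmap coarseChart = reducePessimistic(chart, rate);
    for (int cy = 0; cy + coarseChart.height <= coarseAtlas.height; cy++) {
        for (int cx = 0; cx + coarseChart.width <= coarseAtlas.width; cx++) {
            if (overlaps(coarseAtlas, coarseChart, cx, cy)) continue; // cheap rejection
            for (int dy = rate - 1; dy >= 0; dy--) {
                for (int dx = rate - 1; dx >= 0; dx--) {
                    const int ox = cx * rate - dx, oy = cy * rate - dy;
                    if (!overlaps(atlas, chart, ox, oy)) return {ox, oy, true};
                }
            }
        }
    }
    return {0, 0, false};
}
```

Because the pessimistic reduction is conservative, this sketch can miss tight placements that only exist at full resolution; a complete packer would need a fallback for that case.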

Changes

  • Temporary fix for "Packing UV meshes results are invalid" (#116): 'zero area' (actually 'area < ε') charts are no longer ignored and are processed as usual.
    Reason: if texelsPerUnit or the captured area is large enough, ignoring them may lead to undesired results.
  • PackOptions::rotateCharts is renamed to PackOptions::transposeCharts (a more accurate description of its purpose); PackOptions::rotateCharts is now used for actual rotation.
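
The difference between transposing and rotating a chart can be shown on a small texel grid. This is only an illustrative sketch: `Grid`, `transpose` and `rotate90` are hypothetical helpers, not functions from xatlas or from this fork. It assumes rows are indexed as `grid[y][x]` with y pointing down.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical texel grid for illustration: rows of texels, indexed grid[y][x].
using Grid = std::vector<std::vector<int>>;

// Transpose: swap the x and y axes (a mirror across the main diagonal).
Grid transpose(const Grid &g) {
    assert(!g.empty());
    const std::size_t h = g.size(), w = g[0].size();
    Grid out(w, std::vector<int>(h));
    for (std::size_t y = 0; y < h; y++)
        for (std::size_t x = 0; x < w; x++)
            out[x][y] = g[y][x];
    return out;
}

// Rotate by 90 degrees clockwise (with y pointing down).
Grid rotate90(const Grid &g) {
    assert(!g.empty());
    const std::size_t h = g.size(), w = g[0].size();
    Grid out(w, std::vector<int>(h));
    for (std::size_t y = 0; y < h; y++)
        for (std::size_t x = 0; x < w; x++)
            out[x][h - 1 - y] = g[y][x];
    return out;
}
```

Both operations turn a w x h chart into an h x w one, but applying `transpose` twice returns the original grid, while applying `rotate90` twice gives a 180 degree rotation. That is why a transpose alone cannot provide the 90/180/270 degree options listed above.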

Korinin38 and others added 30 commits August 8, 2023 11:20
BREAKING CHANGE: does not ignore close-to-zero area faces. It was making holes in dense UVMeshes. To preserve functionality, a workaround is needed.
With constant number of steps in scheme and rate.
Also: metric experiments, modifying parallelization for more consistent results.

BREAKING CHANGE: PackOptions now has additional options, may break C code.
Better results (increased quality)
Also: Add different callback which outputs percentage more frequently and more accurate.

PackOptions are renamed.
rough implementation of two other overlap checking strategies: image compression and 2d segment tree
Scheme now uses different reductions of chart for different offsets, making it "true" rasterisation.

Changed atlas rasterisation to optimistic for faster results
Instead of reducing full image, we now use image from previous layer.
separate checks for pessimistic and optimistic options
when rate is > 4, reduceToOptimistic should work faster (not tested; hidden inside feature flag)
Not only works faster, but also saves memory.
preparation for adding option to actually "rotate" charts
Update comments for PackOptions

BREAKING CHANGE: PackOptions::rotateCharts is now responsible for an actual chart rotation
New option to force preserving fractional part of input texture coordinates
GPU implementation (currently not supported fully)
skipSpeedup, usePreviousPositionOffset, gridSpeedup - new options.

BREAKING CHANGE: new pack options; coarseLevelRate is now hidden from user (its function is now replaced by XA_PACKING_COARSE_RATE[...] macros family).
# Conflicts:
#	source/xatlas/xatlas.cpp
#	source/xatlas/xatlas.h

lcc815 commented Jan 18, 2024

Hi, @Korinin38

Thanks for your great work. But when I tried your repo, I could not build it and got the following error:

xatlas_static.make:182: recipe for target 'obj/x86_64/Release/xatlas_static/xatlas.o' failed
make[1]: *** [obj/x86_64/Release/xatlas_static/xatlas.o] Error 1
Makefile:120: recipe for target 'xatlas_static' failed
make: *** [xatlas_static] Error 2

Did I do anything wrong? Any help is appreciated.

@Korinin38
Author

Hi, @lcc815

Have you tried building the current version of xatlas without these contributions? If yes, is this error unique to xatlas_hi-res_optimised?

lcc815 commented Jan 22, 2024

> Hi, @lcc815
>
> Have you tried building the current version of xatlas without these contributions? If yes, is this error unique to xatlas_hi-res_optimised?

Yes! I can build the current version of xatlas without these contributions successfully.

lcc815 commented Jan 24, 2024

@Korinin38 hello, could you please try solving this? please.. Current version of xatlas is just too slow to use.

Korinin38 commented Jan 24, 2024

Hi, @lcc815, sorry for keeping you waiting,

I examined the problem. It seems the error is caused by the addition of exception handling; it will be fixed soon. For an immediate fix, try changing the parameter exceptionhandling from "Off" to "On" in both the "xatlas" and "xatlas_static" projects in premake5.lua, or use the patch that does just that.

Please let me know whether that helped or not.

also:
address warnings
add openmp options in premake file
remove exception handling
fix test of gazebo.obj

lcc815 commented Jan 25, 2024

@Korinin38 yes! It works! I can build and run your code successfully, just like the official code.
However, the time cost is the same as when running the official code : )
What I ran is this:
./build/gmake/bin/x86_64/Release/example my_mesh.obj
I believe I did something wrong; could you please tell me how to use your code to speed up this wrapping process?
Again, thanks for your reply!

@Korinin38
Author

I'll look into it. Can you provide logs for both runs? It would help me greatly.

Bear in mind that there are two independent steps: computing charts and packing charts, and if the slowest step is the former, I can do nothing about it as it is not changed in any way.
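
One way to see which of the two steps dominates is to time them separately. The helper below is a minimal sketch; `timeStepMs` is a hypothetical name, not part of xatlas. The commented usage assumes that the fork keeps the upstream entry points `xatlas::ComputeCharts` and `xatlas::PackCharts`, which take an `xatlas::Atlas *` and an options struct in upstream xatlas; check `xatlas.h` in the branch you build.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `step` once and returns its wall-clock duration in milliseconds.
double timeStepMs(const std::function<void()> &step) {
    assert(step); // an empty std::function cannot be called
    const auto start = std::chrono::steady_clock::now();
    step();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Assumed usage with an already populated atlas (signatures taken from upstream xatlas):
//   double chartsMs = timeStepMs([&] { xatlas::ComputeCharts(atlas, chartOptions); });
//   double packMs   = timeStepMs([&] { xatlas::PackCharts(atlas, packOptions); });
```

If the first number is much larger than the second, changes to packing cannot noticeably shorten the total run time.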

lcc815 commented Jan 26, 2024

Aha, I got it. The bottleneck is computing charts. Packing charts does take less time than with the official code.
Thanks a lot!

@siliconvoodoo

I pulled your branch and did 3 tests with a city block model.

original xatlas: 73 seconds
korinin XA_USE_GPU off CUDA_SUPPORT off openMP unset: 169 seconds
korinin XA_USE_GPU off CUDA_SUPPORT off openMP on: didn't finish after 30 minutes. force stopped.

I didn't try CUDA because of build setup complexities.

@Korinin38
Author

Hello @siliconvoodoo, thank you for reaching out,

Is it possible for you to provide the example model? If not, could you share details such as number of vertices & faces, as well as number of charts parametrized with xatlas? It's also useful to know which PackOptions you used, and whether the results of packing are satisfactory in both finished tests.

siliconvoodoo commented Nov 7, 2024

It's a model with 52 meshes. 4,274,763 vertices, 7.5M triangles.
With original xatlas the viewer gives these stats:
[image: viewer stats for the original xatlas run]

But today I tried to build with OpenCL.
That was very hard because your branch doesn't include the dependencies, so I found them in your other repository, xatlas-comparison. I installed the CUDA Toolkit to get libopencl.

Then I ran cmake on your 3 libraries (clew/gpu/misc) and manually added the includes and link dependencies. (Just a note: this PR won't get accepted, because it breaks the standalone principle of xatlas, which is only 1 .h and 1 .cpp.)

You are missing some template exports:
template OpenCLKernelArg::OpenCLKernelArg(const gpu::shared_device_buffer_typed<unsigned __int64>& arg);

The first blit.cl I tried had old content and didn't build.
Activating printLog = true showed:

Device 1
Program build log:
:16:7: error: use of undeclared identifier 'uint32_t'
for (uint32_t y = 0; y < ch; y++) {
^
...

I fixed that for nothing, because then there was no aggregateResults entry point.

I found the more recent blit.cl,
and then got this error:

:304:32: error: call to 'max' is ambiguous
const unsigned int extentX = max(w, offset_x + chartSizes[0]);
^~~
cl_kernel.h:3498:22: note: candidate function
int OVERLOADABLE max(int, int);
^
cl_kernel.h:3499:23: note: candidate function
uint OVERLOADABLE max(uint, uint);

I fixed it with casts.

Then the execution failed because the call site was missing arguments:
kernels["blitLevel"].exec was missing w and h, so I added them and retried.

But then execution failed at a later point, with this log:

{Ptr=0x0000000024f730a0 "Kernel aggregateResults: CL_UNKNOWN_ERROR_CODE-9999 (-9999) at line 368" }

A different exception came from within the viewer project:

{_Ptr=0x000001ec1a3f1bc0 "Global work_size[0] value is zero!" }

I only got it to work on a Cornell box model or a model with two cubes.
