Fix CUDA for older cards on Linux #712

pierotofy · 2021-10-28T14:56:49Z

Fixes #708

cdcseacave · 2021-10-29T14:34:58Z

@pierotofy can you pls explain the problem you fixed?
And is there a better way to fix this? Adding flags by hand does not seem very general.

pierotofy · 2021-10-29T16:55:13Z

--expt-relaxed-constexpr

By default, a constexpr function cannot be called from a function with incompatible execution space. When this flag is specified, host code can invoke a device constexpr function and device code can invoke a host constexpr function. nvcc will define the macro

It's really a fix for the warnings that come up when compiling some of the Eigen code (it's the same issue as https://eigen.tuxfamily.org/bz/show_bug.cgi?id=1323).

-arch specifies the CUDA compute architecture (what version of the CUDA API should the program be compiled for?) By default nvcc will compile for the latest version, but that means older cards cannot run the program.

I've modified the CMakeFile to pull the list of available architectures / codegens automatically and a user-defined OpenMVS_MAX_CUDA_COMPATIBILITY flag to enable/disable this behavior. Hopefully this is more general. 🙏

I've also made it generic (not just for Unix/Linux).

cdcseacave · 2021-10-29T18:46:52Z

CMakeLists.txt

 		endif()
+
+		set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --expt-relaxed-constexpr")


this shouldn't be inside the OpenMVS_MAX_CUDA_COMPATIBILITY if?

Nop, it applies to all configurations.

cdcseacave · 2021-10-29T19:03:05Z

the new code adds these flags on my machine:

-D_WINDOWS -Xcompiler="/W3 /GR /EHsc" -arch=sm_35 -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_37,code=sm_37 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_53,code=sm_53 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_62,code=sm_62 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_72,code=sm_72 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86

Would be better to only add the latest architecture and not all of them?

pierotofy · 2021-10-29T19:04:50Z

If you only add the latest architecture, the program will not run on older cards. That's the point of OpenMVS_MAX_CUDA_COMPATIBILITY (compile once, run on all possible cards).

If you don't set OpenMVS_MAX_CUDA_COMPATIBILITY (default) it will already compile for the latest (no flags needed).

cdcseacave · 2021-10-29T19:06:48Z

good point, so in this case we should modify the default mode to add only the latest architecture; so use the new code for both, and differ only at the end when the arch are added; what do you think?

pierotofy · 2021-10-29T19:11:13Z

Sure, I'll make a commit with the change.

cdcseacave · 2021-10-29T19:16:36Z

if I use the default mode the arch is set to 52 for some reason, not sure why as from your code it seems 86 is supported
any idea who is why is setting arch to 52 and not something else?

pierotofy · 2021-10-29T19:19:52Z

The default depends on the CUDA toolkit version:

CUDA VERSION   Min CC   Deprecated CC  Default CC  Max CC
5.5 (and prior) 1.0       N/A             1.0
6.0             1.0       1.0             1.0
6.5             1.1       1.x             2.0
7.x             2.0       N/A             2.0
8.0             2.0       2.x             2.0      6.2
9.x             3.0       N/A             3.0      7.0
10.x            3.0       N/A             3.0      7.5 (3.0 deprecated in 10.2)
11.x            3.5       3.x,5.0         5.2      8.6 (11.0:8.0, 11.1:8.6)

In this case you're on 11.x, so the default is 52 (5.2).

pierotofy · 2021-10-29T19:21:48Z

Now that I think about it, are you sure you want to use always the latest arch and not the compiler's default? This might prevent the code from running on your card, unless it's one of the latest models.

cdcseacave · 2021-10-29T19:22:18Z

that is, I have 11.2
do you think it is best to use the max supported arch or the default one? I have no idea

pierotofy · 2021-10-29T19:22:36Z

Probably the default one.

cdcseacave · 2021-10-29T19:23:20Z

ok, so let's let it as it is

pierotofy · 2021-10-29T19:23:36Z

Thanks for reviewing the changes!

cdcseacave · 2021-10-29T19:24:25Z

I thank you for figuring out all this, it is very complex!!!

* dense: add image mask support * dense: fix bug in mask application * build: update find packages * build: fix usage as third-party library * interface: export and compare camera poses with an OpenMVG refined version * interface: export MVS scene to NVM format * common: fix camera scaling * dense: small code refactor * mvs: view DMAPs with different resolution * interface: replace define with constexp * interface: small change * common: fix PFM exporter/importer * common: disable CPUID on ARM platforms * dense: add Region-Of-Interest (ROI) support * common: improve octree speed * build: update AppVeyor * mesh: split mesh in sub-meshed by max area * dense: split scene in sub-scenes * cmake: build libCommon/IO/Math as shared if BUILD_SHARED_LIBS=ON (#650) * mesh: fix bug in RemoveFaces() * dense: filter more redundant views from sub-scenes * common: add ZSTD serialization support * apps: set internal linkage for functions and variables (#656) * common: small refactor * mesh: raster triangles using barycentric coordinates * remove cotire * dense: merge small sub-scenes * interface: import COLMAP depth-maps * interface: add MVS preset to the python script * interface: fix COLMAP on linux * common: make binary projects portable * viewer: display views seeing selected point or points seen by the current view * dense: improve depth-map initialization * dense: merge depth-maps (no fusion) * interface: add similarity transform functionality * common: fix file permissions on linux * mesh: add GLTF writing support * mesh: add target-face-num parameter to ReconstructMesh (#671) * mesh: improve GLTF support * dense: add geometric-consistency * dense: fix for invalid images (cherry picked from commit b29648f835e5ee0b36b4240a160a20124001f34d) * dense: close file in reading invalid depth-map (#685) * dense: export point-cloud with min number of views per point (cherry picked from commit 9b6b2fd7ec89a5b7ea3acdef83664daccbfc8c03) * dense: detect computed depth-maps (cherry picked from commit 52f9162874e4c3da66bb618dcea9dddc1bdf644a) * refine: initialize image neighbors from input mesh (cherry picked from commit 2a069fa0856582da2b766ba85091d3e46fa783de) * dense: select better angle neighbors (cherry picked from commit 05a49551ad13873c664bf394d4584cf5c44f198c) * dense: patch-match implemented in CUDA (faster & better) * common: fix latest boost * dense: expose --cuda-device option (#707) * build: add Dockerfile for CUDA #710 * dense: fix bug in selecting views (cherry picked from commit d068c02295234199f49b26afc85ac643886f2311) * build: fix CUDA for older cards (#712) * dense: fix small clusters (#702) * interface: fix COLMAP log export * interface: add Metashape and predefined neighbors list support * interface: fix compile linux * dense: add support for scenes with empty point-cloud but known (coarse) mesh * dense: filter low score estimates * interface: add Metashape ROI support * viewer: display ROI * dense: increase NCC threshold for CUDA * interface: remap Metashape indices * dense: fix depth-map crash in CUDA * interface: update MVG-MVS pipeline Co-authored-by: YOSHIFUJI Naoki <lwisteria.ao@gmail.com> Co-authored-by: Piero Toffanin <pt@masseranolabs.com> Co-authored-by: Tommy Bojanin <Bojanint@gmail.com>

* dense: add image mask support * dense: fix bug in mask application * build: update find packages * build: fix usage as third-party library * interface: export and compare camera poses with an OpenMVG refined version * interface: export MVS scene to NVM format * common: fix camera scaling * dense: small code refactor * mvs: view DMAPs with different resolution * interface: replace define with constexp * interface: small change * common: fix PFM exporter/importer * common: disable CPUID on ARM platforms * dense: add Region-Of-Interest (ROI) support * common: improve octree speed * build: update AppVeyor * mesh: split mesh in sub-meshed by max area * dense: split scene in sub-scenes * cmake: build libCommon/IO/Math as shared if BUILD_SHARED_LIBS=ON (#650) * mesh: fix bug in RemoveFaces() * dense: filter more redundant views from sub-scenes * common: add ZSTD serialization support * apps: set internal linkage for functions and variables (#656) * common: small refactor * mesh: raster triangles using barycentric coordinates * remove cotire * dense: merge small sub-scenes * interface: import COLMAP depth-maps * interface: add MVS preset to the python script * interface: fix COLMAP on linux * common: make binary projects portable * viewer: display views seeing selected point or points seen by the current view * dense: improve depth-map initialization * dense: merge depth-maps (no fusion) * interface: add similarity transform functionality * common: fix file permissions on linux * mesh: add GLTF writing support * mesh: add target-face-num parameter to ReconstructMesh (#671) * mesh: improve GLTF support * dense: add geometric-consistency * dense: fix for invalid images (cherry picked from commit b29648f835e5ee0b36b4240a160a20124001f34d) * dense: close file in reading invalid depth-map (#685) * dense: export point-cloud with min number of views per point (cherry picked from commit 9b6b2fd7ec89a5b7ea3acdef83664daccbfc8c03) * dense: detect computed depth-maps (cherry picked from commit 52f9162874e4c3da66bb618dcea9dddc1bdf644a) * refine: initialize image neighbors from input mesh (cherry picked from commit 2a069fa0856582da2b766ba85091d3e46fa783de) * dense: select better angle neighbors (cherry picked from commit 05a49551ad13873c664bf394d4584cf5c44f198c) * dense: patch-match implemented in CUDA (faster & better) * common: fix latest boost * dense: expose --cuda-device option (#707) * build: add Dockerfile for CUDA #710 * dense: fix bug in selecting views (cherry picked from commit d068c02295234199f49b26afc85ac643886f2311) * build: fix CUDA for older cards (#712) * dense: fix small clusters (#702) * interface: fix COLMAP log export * interface: add Metashape and predefined neighbors list support * interface: fix compile linux * dense: add support for scenes with empty point-cloud but known (coarse) mesh * dense: filter low score estimates * interface: add Metashape ROI support * viewer: display ROI * dense: increase NCC threshold for CUDA * interface: remap Metashape indices * dense: fix depth-map crash in CUDA * interface: update MVG-MVS pipeline * increase version to 2.0.0 Co-authored-by: YOSHIFUJI Naoki <lwisteria.ao@gmail.com> Co-authored-by: Piero Toffanin <pt@masseranolabs.com> Co-authored-by: Tommy Bojanin <Bojanint@gmail.com>

Target sm_35, add check after RunCUDA

242de27

cdcseacave self-requested a review October 29, 2021 14:35

Add OpenMVS_MAX_CUDA_COMPATIBILITY

7f5274b

cdcseacave reviewed Oct 29, 2021

View reviewed changes

add arch message

0a71103

cdcseacave merged commit 67798ba into cdcseacave:develop Oct 29, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix CUDA for older cards on Linux #712

Fix CUDA for older cards on Linux #712

pierotofy commented Oct 28, 2021

cdcseacave commented Oct 29, 2021

pierotofy commented Oct 29, 2021 •

edited

Loading

cdcseacave Oct 29, 2021

pierotofy Oct 29, 2021

cdcseacave commented Oct 29, 2021

pierotofy commented Oct 29, 2021 •

edited

Loading

cdcseacave commented Oct 29, 2021

pierotofy commented Oct 29, 2021

cdcseacave commented Oct 29, 2021

pierotofy commented Oct 29, 2021 •

edited

Loading

pierotofy commented Oct 29, 2021 •

edited

Loading

cdcseacave commented Oct 29, 2021

pierotofy commented Oct 29, 2021

cdcseacave commented Oct 29, 2021

pierotofy commented Oct 29, 2021

cdcseacave commented Oct 29, 2021

		endif()

		set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --expt-relaxed-constexpr")

Fix CUDA for older cards on Linux #712

Fix CUDA for older cards on Linux #712

Conversation

pierotofy commented Oct 28, 2021

cdcseacave commented Oct 29, 2021

pierotofy commented Oct 29, 2021 • edited Loading

cdcseacave Oct 29, 2021

Choose a reason for hiding this comment

pierotofy Oct 29, 2021

Choose a reason for hiding this comment

cdcseacave commented Oct 29, 2021

pierotofy commented Oct 29, 2021 • edited Loading

cdcseacave commented Oct 29, 2021

pierotofy commented Oct 29, 2021

cdcseacave commented Oct 29, 2021

pierotofy commented Oct 29, 2021 • edited Loading

pierotofy commented Oct 29, 2021 • edited Loading

cdcseacave commented Oct 29, 2021

pierotofy commented Oct 29, 2021

cdcseacave commented Oct 29, 2021

pierotofy commented Oct 29, 2021

cdcseacave commented Oct 29, 2021

pierotofy commented Oct 29, 2021 •

edited

Loading

pierotofy commented Oct 29, 2021 •

edited

Loading

pierotofy commented Oct 29, 2021 •

edited

Loading

pierotofy commented Oct 29, 2021 •

edited

Loading