Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CUDA for older cards on Linux #712

Merged
merged 3 commits into from
Oct 29, 2021
Merged

Conversation

pierotofy
Copy link
Contributor

Fixes #708


@cdcseacave
Copy link
Owner

@pierotofy can you pls explain the problem you fixed?
And is there a better way to fix this? Adding flags by hand does not seem very general.

@cdcseacave cdcseacave self-requested a review October 29, 2021 14:35
@pierotofy
Copy link
Contributor Author

pierotofy commented Oct 29, 2021

--expt-relaxed-constexpr

By default, a constexpr function cannot be called from a function with incompatible execution space. When this flag is specified, host code can invoke a device constexpr function and device code can invoke a host constexpr function. nvcc will define the macro

It's really a fix for the warnings that come up when compiling some of the Eigen code (it's the same issue as https://eigen.tuxfamily.org/bz/show_bug.cgi?id=1323).

-arch specifies the CUDA compute architecture (what version of the CUDA API should the program be compiled for?) By default nvcc will compile for the latest version, but that means older cards cannot run the program.

I've modified the CMakeFile to pull the list of available architectures / codegens automatically and a user-defined OpenMVS_MAX_CUDA_COMPATIBILITY flag to enable/disable this behavior. Hopefully this is more general. 🙏

I've also made it generic (not just for Unix/Linux).

CMakeLists.txt Outdated
endif()

set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --expt-relaxed-constexpr")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this shouldn't be inside the OpenMVS_MAX_CUDA_COMPATIBILITY if?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nop, it applies to all configurations.

@cdcseacave
Copy link
Owner

the new code adds these flags on my machine:

-D_WINDOWS -Xcompiler="/W3 /GR /EHsc" -arch=sm_35 -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_37,code=sm_37 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_53,code=sm_53 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_62,code=sm_62 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_72,code=sm_72 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86

Would be better to only add the latest architecture and not all of them?

@pierotofy
Copy link
Contributor Author

pierotofy commented Oct 29, 2021

If you only add the latest architecture, the program will not run on older cards. That's the point of OpenMVS_MAX_CUDA_COMPATIBILITY (compile once, run on all possible cards).

If you don't set OpenMVS_MAX_CUDA_COMPATIBILITY (default) it will already compile for the latest (no flags needed).

@cdcseacave
Copy link
Owner

good point, so in this case we should modify the default mode to add only the latest architecture; so use the new code for both, and differ only at the end when the arch are added; what do you think?

@pierotofy
Copy link
Contributor Author

Sure, I'll make a commit with the change.

@cdcseacave
Copy link
Owner

if I use the default mode the arch is set to 52 for some reason, not sure why as from your code it seems 86 is supported
any idea who is why is setting arch to 52 and not something else?

@pierotofy
Copy link
Contributor Author

pierotofy commented Oct 29, 2021

The default depends on the CUDA toolkit version:

CUDA VERSION   Min CC   Deprecated CC  Default CC  Max CC
5.5 (and prior) 1.0       N/A             1.0
6.0             1.0       1.0             1.0
6.5             1.1       1.x             2.0
7.x             2.0       N/A             2.0
8.0             2.0       2.x             2.0      6.2
9.x             3.0       N/A             3.0      7.0
10.x            3.0       N/A             3.0      7.5 (3.0 deprecated in 10.2)
11.x            3.5       3.x,5.0         5.2      8.6 (11.0:8.0, 11.1:8.6)

In this case you're on 11.x, so the default is 52 (5.2).

@pierotofy
Copy link
Contributor Author

pierotofy commented Oct 29, 2021

Now that I think about it, are you sure you want to use always the latest arch and not the compiler's default? This might prevent the code from running on your card, unless it's one of the latest models.

@cdcseacave
Copy link
Owner

that is, I have 11.2
do you think it is best to use the max supported arch or the default one? I have no idea

@pierotofy
Copy link
Contributor Author

Probably the default one.

@cdcseacave
Copy link
Owner

ok, so let's let it as it is

@pierotofy
Copy link
Contributor Author

Thanks for reviewing the changes!

@cdcseacave
Copy link
Owner

I thank you for figuring out all this, it is very complex!!!

@cdcseacave cdcseacave merged commit 67798ba into cdcseacave:develop Oct 29, 2021
cdcseacave added a commit that referenced this pull request Dec 11, 2021
* dense: add image mask support

* dense: fix bug in mask application

* build: update find packages

* build: fix usage as third-party library

* interface: export and compare camera poses with an OpenMVG refined version

* interface: export MVS scene to NVM format

* common: fix camera scaling

* dense: small code refactor

* mvs: view DMAPs with different resolution

* interface: replace define with constexp

* interface: small change

* common: fix PFM exporter/importer

* common: disable CPUID on ARM platforms

* dense: add Region-Of-Interest (ROI) support

* common: improve octree speed

* build: update AppVeyor

* mesh: split mesh in sub-meshed by max area

* dense: split scene in sub-scenes

* cmake: build libCommon/IO/Math as shared if BUILD_SHARED_LIBS=ON (#650)

* mesh: fix bug in RemoveFaces()

* dense: filter more redundant views from sub-scenes

* common: add ZSTD serialization support

* apps: set internal linkage for functions and variables (#656)

* common: small refactor

* mesh: raster triangles using barycentric coordinates

* remove cotire

* dense: merge small sub-scenes

* interface: import COLMAP depth-maps

* interface: add MVS preset to the python script

* interface: fix COLMAP on linux

* common: make binary projects portable

* viewer: display views seeing selected point or points seen by the current view

* dense: improve depth-map initialization

* dense: merge depth-maps (no fusion)

* interface: add similarity transform functionality

* common: fix file permissions on linux

* mesh: add GLTF writing support

* mesh: add target-face-num parameter to ReconstructMesh (#671)

* mesh: improve GLTF support

* dense: add geometric-consistency

* dense: fix for invalid images

(cherry picked from commit b29648f835e5ee0b36b4240a160a20124001f34d)

* dense: close file in reading invalid depth-map (#685)

* dense: export point-cloud with min number of views per point

(cherry picked from commit 9b6b2fd7ec89a5b7ea3acdef83664daccbfc8c03)

* dense: detect computed depth-maps

(cherry picked from commit 52f9162874e4c3da66bb618dcea9dddc1bdf644a)

* refine: initialize image neighbors from input mesh

(cherry picked from commit 2a069fa0856582da2b766ba85091d3e46fa783de)

* dense: select better angle neighbors

(cherry picked from commit 05a49551ad13873c664bf394d4584cf5c44f198c)

* dense: patch-match implemented in CUDA (faster & better)

* common: fix latest boost

* dense: expose --cuda-device option (#707)

* build: add Dockerfile for CUDA #710

* dense: fix bug in selecting views

(cherry picked from commit d068c02295234199f49b26afc85ac643886f2311)

* build: fix CUDA for older cards (#712)

* dense: fix small clusters (#702)

* interface: fix COLMAP log export

* interface: add Metashape and predefined neighbors list support

* interface: fix compile linux

* dense: add support for scenes with empty point-cloud but known (coarse) mesh

* dense: filter low score estimates

* interface: add Metashape ROI support

* viewer: display ROI

* dense: increase NCC threshold for CUDA

* interface: remap Metashape indices

* dense: fix depth-map crash in CUDA

* interface: update MVG-MVS pipeline

Co-authored-by: YOSHIFUJI Naoki <lwisteria.ao@gmail.com>
Co-authored-by: Piero Toffanin <pt@masseranolabs.com>
Co-authored-by: Tommy Bojanin <Bojanint@gmail.com>
cdcseacave added a commit that referenced this pull request Dec 11, 2021
* dense: add image mask support

* dense: fix bug in mask application

* build: update find packages

* build: fix usage as third-party library

* interface: export and compare camera poses with an OpenMVG refined version

* interface: export MVS scene to NVM format

* common: fix camera scaling

* dense: small code refactor

* mvs: view DMAPs with different resolution

* interface: replace define with constexp

* interface: small change

* common: fix PFM exporter/importer

* common: disable CPUID on ARM platforms

* dense: add Region-Of-Interest (ROI) support

* common: improve octree speed

* build: update AppVeyor

* mesh: split mesh in sub-meshed by max area

* dense: split scene in sub-scenes

* cmake: build libCommon/IO/Math as shared if BUILD_SHARED_LIBS=ON (#650)

* mesh: fix bug in RemoveFaces()

* dense: filter more redundant views from sub-scenes

* common: add ZSTD serialization support

* apps: set internal linkage for functions and variables (#656)

* common: small refactor

* mesh: raster triangles using barycentric coordinates

* remove cotire

* dense: merge small sub-scenes

* interface: import COLMAP depth-maps

* interface: add MVS preset to the python script

* interface: fix COLMAP on linux

* common: make binary projects portable

* viewer: display views seeing selected point or points seen by the current view

* dense: improve depth-map initialization

* dense: merge depth-maps (no fusion)

* interface: add similarity transform functionality

* common: fix file permissions on linux

* mesh: add GLTF writing support

* mesh: add target-face-num parameter to ReconstructMesh (#671)

* mesh: improve GLTF support

* dense: add geometric-consistency

* dense: fix for invalid images

(cherry picked from commit b29648f835e5ee0b36b4240a160a20124001f34d)

* dense: close file in reading invalid depth-map (#685)

* dense: export point-cloud with min number of views per point

(cherry picked from commit 9b6b2fd7ec89a5b7ea3acdef83664daccbfc8c03)

* dense: detect computed depth-maps

(cherry picked from commit 52f9162874e4c3da66bb618dcea9dddc1bdf644a)

* refine: initialize image neighbors from input mesh

(cherry picked from commit 2a069fa0856582da2b766ba85091d3e46fa783de)

* dense: select better angle neighbors

(cherry picked from commit 05a49551ad13873c664bf394d4584cf5c44f198c)

* dense: patch-match implemented in CUDA (faster & better)

* common: fix latest boost

* dense: expose --cuda-device option (#707)

* build: add Dockerfile for CUDA #710

* dense: fix bug in selecting views

(cherry picked from commit d068c02295234199f49b26afc85ac643886f2311)

* build: fix CUDA for older cards (#712)

* dense: fix small clusters (#702)

* interface: fix COLMAP log export

* interface: add Metashape and predefined neighbors list support

* interface: fix compile linux

* dense: add support for scenes with empty point-cloud but known (coarse) mesh

* dense: filter low score estimates

* interface: add Metashape ROI support

* viewer: display ROI

* dense: increase NCC threshold for CUDA

* interface: remap Metashape indices

* dense: fix depth-map crash in CUDA

* interface: update MVG-MVS pipeline

* increase version to 2.0.0

Co-authored-by: YOSHIFUJI Naoki <lwisteria.ao@gmail.com>
Co-authored-by: Piero Toffanin <pt@masseranolabs.com>
Co-authored-by: Tommy Bojanin <Bojanint@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants