Commit
[cudadev] Moved accesses from value to const ref so that we get the benefit of __restrict__

Created an example to easily validate the effect of __restrict__.

The result can be checked by compiling with the `-ptx` option instead of `-c`, and then grepping the generated PTX with:

```
$ cat obj/cudadev/test/SoAStoreAndView_t.cu.ptx | c++filt | egrep '(.visible|(ld|st).global)' --color
.visible .entry aAMDef(SoA1ViewTemplate<128ul, (cms::soa::AlignmentEnforcement)0, (cms::soa::CacheAccessStyle)0, (cms::soa::RestrictQualify)1>, unsigned long)(
ld.global.f64 %fd1, [%rd21];
ld.global.f64 %fd2, [%rd20];
st.global.f64 [%rd22], %fd3;
ld.global.f64 %fd4, [%rd21];
ld.global.f64 %fd5, [%rd20];
st.global.f64 [%rd23], %fd6;
.visible .entry aAMRestrict(SoA1ViewTemplate<128ul, (cms::soa::AlignmentEnforcement)0, (cms::soa::CacheAccessStyle)0, (cms::soa::RestrictQualify)0>, unsigned long)(
ld.global.nc.f64 %fd1, [%rd21];
ld.global.nc.f64 %fd2, [%rd20];
st.global.f64 [%rd22], %fd3;
st.global.f64 [%rd23], %fd4;
.visible .entry aAMNC(SoA1ViewTemplate<128ul, (cms::soa::AlignmentEnforcement)0, (cms::soa::CacheAccessStyle)1, (cms::soa::RestrictQualify)1>, unsigned long)(
ld.global.f64 %fd1, [%rd21];
ld.global.f64 %fd2, [%rd20];
st.global.f64 [%rd22], %fd3;
ld.global.f64 %fd4, [%rd21];
ld.global.f64 %fd5, [%rd20];
st.global.f64 [%rd23], %fd6;
.visible .entry aAMRestrict(SoA1ViewTemplate<128ul, (cms::soa::AlignmentEnforcement)0, (cms::soa::CacheAccessStyle)1, (cms::soa::RestrictQualify)0>, unsigned long)(
ld.global.nc.f64 %fd1, [%rd21];
ld.global.nc.f64 %fd2, [%rd20];
st.global.f64 [%rd22], %fd3;
st.global.f64 [%rd23], %fd4;
```

In the kernels without the restrict qualifier, both inputs are loaded again before the second store; in the restrict-qualified kernels they are loaded only once, with `ld.global.nc`. The compiler therefore uses the hint from the restrict qualifier to load values from global memory only once, and via the non-coherent cache. The cache access styles are not implemented, and hence have no effect.
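As a rough illustration of why the load counts in the PTX listing above differ, here is a minimal sketch of the aliasing situation. Everything in it is an assumption made for illustration: the names `sumAndProductPlain`, `sumAndProductRestrict`, `variantsAgree` and `SKETCH_HOST_DEVICE` are hypothetical, the arithmetic is invented (the commit does not show the kernel bodies), and raw pointer parameters stand in for the `SoA1ViewTemplate` accessors. The block builds as plain C++ with GCC or Clang and should also be usable from device code under nvcc.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical macro: marks functions as callable from device code when
// compiled by nvcc, and expands to nothing for a plain host compiler.
#if defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE
#endif

// No aliasing promise: the store through `sum` could modify x[i] or y[i],
// so the compiler must assume the inputs may have changed before the
// second statement and is not free to reuse the first loads.
SKETCH_HOST_DEVICE inline void sumAndProductPlain(const double* x,
                                                  const double* y,
                                                  double* sum,
                                                  double* prod,
                                                  std::size_t i) {
  sum[i] = x[i] + y[i];
  prod[i] = x[i] * y[i];
}

// Every pointer is declared non-aliasing, and the inputs are also const.
// The compiler may load x[i] and y[i] once and reuse them for both stores.
SKETCH_HOST_DEVICE inline void sumAndProductRestrict(
    const double* __restrict__ x,
    const double* __restrict__ y,
    double* __restrict__ sum,
    double* __restrict__ prod,
    std::size_t i) {
  sum[i] = x[i] + y[i];
  prod[i] = x[i] * y[i];
}

// Host-side check that both variants compute the same values when the
// buffers really do not overlap, which is the precondition of __restrict__.
inline bool variantsAgree(const std::vector<double>& x,
                          const std::vector<double>& y) {
  assert(x.size() == y.size());
  const std::size_t n = x.size();
  std::vector<double> sumA(n), prodA(n), sumB(n), prodB(n);
  for (std::size_t i = 0; i < n; ++i) {
    sumAndProductPlain(x.data(), y.data(), sumA.data(), prodA.data(), i);
    sumAndProductRestrict(x.data(), y.data(), sumB.data(), prodB.data(), i);
  }
  return sumA == sumB && prodA == prodB;
}
```

The qualifier changes only what the compiler is allowed to assume, not the results, as long as the buffers are truly distinct; passing overlapping buffers to the restrict-qualified variant would be undefined behaviour. Whether a particular compiler version then emits `ld.global.nc` for the inputs is best confirmed by inspecting the PTX, as the commit message does.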