Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API(4) Move HelAmps to the latest MemoryAccess classes #322

Merged
merged 100 commits into from
Jan 10, 2022

Conversation

valassi
Copy link
Member

@valassi valassi commented Dec 24, 2021

I have decided to strip off what was initially "the second part of apirambo" PR #321.

Essentially in apirambo I have the new MemoryAccess for rambo, the old MemoryAcccess for MEs. This new PR is about porting the new MemoryAccess to MEs and specifically HELAmps, too.

This is WIP

  • I have only prototyped one function imzxxx, all other functions must be done
  • The SIMD/512y performance in eemumu of the new memory access gives a 10% performance hit, to be understood
  • The functional and performance tests however succeed, all segfaults and failures are fixed

What remans to be done

  • fix the performance in eemumu
  • port all other ixxx/oxxx functions
  • rerun the throuput test in eemumu and check all is ok
  • remove neppM from mgOnGpuConfig, now it will be enough to encapsulate it in MemoryAccessMomenta
  • strip off p4type from MemryAccess.h, rename it as MemoryAccessVectors.h (just the fptype/fptye_v conversions)
  • cleanup, remove all unused code (eg Memory.h, possibly more)
  • backport to code generate
  • rerun tests for ggtt and ggttgg, check performance

…lAmps - none check/gcheck builds

(Still to do: testxxx port and SIMD vector types)
ccache /cvmfs/sft.cern.ch/lcg/releases/gcc/10.2.0-c44b3/x86_64-centos7/bin/g++  -O3  -std=c++17 -I. -I../../src -I../../../../../tools -I../../../../../tools -I../../../../../test/googletest/googletest/include -DUSE_NVTX -Wall -Wshadow -Wextra -ffast-math    -DMGONGPU_FPTYPE_DOUBLE -I/usr/local/cuda-11.1/include/ -c testxxx.cc -o testxxx.o
In file included from testxxx.cc:5:
../../src/HelAmps_sm.h:89:8: warning: inline function ‘void MG5_sm::imzxxx(const fptype_sv*, int, int, cxtype_sv*, int) [with M_ACCESS = KernelAccessMomenta<false>; fptype_sv = double; cxtype_sv = std::complex<double>]’ used but never defined
   89 |   void imzxxx( const fptype_sv* momenta,

/cvmfs/sft.cern.ch/lcg/releases/binutils/2.34-990b2/x86_64-centos7/bin/ld: ./testxxx.o: in function `SIGMA_SM_EPEM_MUPMUM_CPU_testxxx_Test::TestBody()':
testxxx.cc:(.text+0xdd76): undefined reference to `void MG5_sm::imzxxx<KernelAccessMomenta<false> >(double const*, int, int, std::complex<double>*, int)'
…t test fails at runtime

testxxx.cc:187: Failure
The difference between cxreal( wf[iw6] ) and expReal is 1000, which exceeds std::abs( expReal * toleranceXXXs ), where
cxreal( wf[iw6] ) evaluates to -500,
expReal evaluates to 500, and
std::abs( expReal * toleranceXXXs ) evaluates to 4.9999999999999999e-13.
 itest=12: imzxxx#1 against ixxxxx
(NB runTest still segfaults if the checks are skipped)
… ok, cuda perf ok, SIMD 10% slower...

(NB cuda performance is definitely not affected, one test even was 2% faster than the previous reference)
@valassi valassi self-assigned this Dec 24, 2021
@valassi valassi marked this pull request as draft December 24, 2021 09:44
This was referenced Dec 24, 2021
…2022 - eemumu performance fluctuations

(Note in particular a 2% fluctuation in the cuda results)
…cover apirambo performance

(Note there are still reproducible 2% fluctuations in both cuda and c++, but today I get them in apirambo too)
…events have same initial momenta

(Hence the MEs are all 0 and the tests fail)
valassi added 21 commits January 6, 2022 09:28
Revert "[apihel] experiment with noinline keyword - gives build warnings, performance slightly worse than expected"
This reverts commit 63ca571.

Revert "[apihel] attempt another fix, define INLINE as inline in all cases - builds, but affects performance"
This reverts commit 4f9089e.

Revert "[apihel] first fix to move XXX function implementation to cc file - testxxx build still fails"
This reverts commit f594600.

Revert "[apihel] try again to move XXX function implementation to cc file - testxxx build fails"
This reverts commit a570168.
…re identical

(NB moved XXX function implementation earlier on in HelAmps.h manual too)
@valassi valassi changed the title WIP API(4) Move HelAmps to the latest MemorAccess classes API(4) Move HelAmps to the latest MemoryAccess classes Jan 6, 2022
@valassi
Copy link
Member Author

valassi commented Jan 6, 2022

This is now complete.

This is where I stripped off what was initially "the second part of apirambo" PR #321. Essentially in apirambo I have the new MemoryAccessMomenta for rambo, the old MemoryAcccessMomenta for MEs in HelAmps. This new PR is about porting the new MemoryAccessMomenta to HelAmps, too. Note that new BufferMEs and MemoryAccessMEs will be created in the upcoming apimes PR.

With respect to the previous WIP

  • I had only prototyped one function imzxxx, now also all other functions have been done. Note that the choice of imz was bad: there was a functional bug and I was reading always the same first event, but as the initial momenta are the same for all events, this bug had gone unnoticed (the avereage ME was ok).
  • The SIMD/512y performance in eemumu of the new memory access was giving a 10% performance hit in eemumu. After fixing the functional bug, this disappeared and 'magically' I even got 5-10% better performance in eemumu. I have just run the ggtt and ggttgg tests however and the ggtt performance seems worse (to be understood?), especially with hardcoded parameters(?!), and only in avx2 or avx512, while none/sse4 are faster(?!). For ggttgg however all is ok.

With respect to the other points that I mentioned to be done

  • fix the performance in eemumu: done, with caveats as above for ggtt
  • port all other ixxx/oxxx functions: done
  • rerun the throuput test in eemumu and check all is ok: done
  • remove neppM from mgOnGpuConfig, now it will be enough to encapsulate it in MemoryAccessMomenta: done
  • strip off p4type from MemryAccess.h, rename it as MemoryAccessVectors.h (just the fptype/fptye_v conversions): done
  • cleanup, remove all unused code (eg Memory.h, possibly more): done, but note that Memory.h will need the apimes PR to be removed
  • backport to code generate: done
  • rerun tests for ggtt and ggttgg, check performance: done

In summary, pending:

  • apimes PR (upcoming "API5"): new Buffer and MemoryAccess for MEs, also get rid of Memory.h
  • understand better ggtt performance from these changes

@valassi valassi marked this pull request as ready for review January 6, 2022 10:01
@valassi
Copy link
Member Author

valassi commented Jan 6, 2022

This completes the "API4" step described in #323

@valassi
Copy link
Member Author

valassi commented Jan 10, 2022

This was presented at the meeting today
https://indico.cern.ch/event/1103955/
I am merging this now

@valassi valassi merged commit 2c36ef9 into madgraph5:master Jan 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant