Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore not storing/copying the sorted hits and hit prefetching. #208

Merged
merged 1 commit into from
May 24, 2019

Conversation

osschar
Copy link
Collaborator

@osschar osschar commented Apr 5, 2019

Using non-sorted hits from external hit-vector and keeping hit ranks for access.

*** Summary:

  • Not sorting hits does not hurt performance, very little change.
    TO DECIDE: Do we keep both options with ifdefs?

  • Test performance without doing the explicit mm_prefetch.
    ifdefs were there (MkFinder) for best-hit, I added them for clone engine
    and for standard. Did not do it for FV yet.

    It seems there is no benefit from prefetchin at all, even when
    hits are not copied into sorted order!
    In fact, about 3% faster.
    [It made me think hits are alrady sorted ... well, they seem to be within
    a module ... but the direction is not necessarily the same.]

    TO DECIDE: Do we remove prefetching instructions or we just ifdef them out by default.

  • Fix in quality_val where we search for the seed track which was wrong due
    to seed cleaning. Kevin, please review.

    Note that with ranks (and reverse ranks) we could do hit index remapping
    without building of translation maps.
    Kevin and I (tohether) can probably do it rather fast.

*** Funny crash:

SEGV in mm_prefetch when preloading a hit.

Seems to only happen with O3, nun-thr >= 4, prefetching on (obviously).

I'm tracing it down as it really shouldn't happen. Seems more like an icc bug.

*** "Physics performance" test

Compare quality-val output on first 5 events.

  1. Expect no change.
  2. Observe some improvement with new code for some events ... probably due to fix in
    getting the seed track.

*** Timing tests: clone engine, single thread, pu70-ccc, 500 events

time ./mkFit --cmssw-n2seeds --input-file ../../mictest/mkFit/pu70-ccc-hs.bin --build-ce --num-events 500

  • "Manual" prefetching that we have hurts a little.
    Surprisingly, even when not storing sorted hits.

= devel

Total Matriplex fit = 0.00000 --- Build BHMX = 0.00000 STDMX = 0.00000 CEMX = 69.67924 FVMX = 0.00000
Total event loop time 79.82160 simtracks 4943065 seedtracks 1040426 builtcands 3421025 maxhits 6117 on lay 5
real 1m19.853s user 1m18.097s sys 0m1.552s

= hit-sort - no hit copy

Total Matriplex fit = 0.00000 --- Build BHMX = 0.00000 STDMX = 0.00000 CEMX = 69.85903 FVMX = 0.00000
Total event loop time 79.78304 simtracks 4943065 seedtracks 1040426 builtcands 3421025 maxhits 5998 on lay 5
real 1m19.819s user 1m18.050s sys 0m1.563s

= hit-sort - no hit copy - AVX_512

Total Matriplex fit = 0.00000 --- Build BHMX = 0.00000 STDMX = 0.00000 CEMX = 46.86359 FVMX = 0.00000
Total event loop time 57.06883 simtracks 4943065 seedtracks 1040426 builtcands 3421071 maxhits 5998 on lay 5
real 0m57.099s user 0m55.389s sys 0m1.560s

= hit-sort - no hit copy - no prefetch

Total Matriplex fit = 0.00000 --- Build BHMX = 0.00000 STDMX = 0.00000 CEMX = 67.40083 FVMX = 0.00000
Total event loop time 77.28962 simtracks 4943065 seedtracks 1040426 builtcands 3421025 maxhits 5998 on lay 5
real 1m17.319s user 1m15.600s sys 0m1.521s

= hit-sort - no hit copy - no prefetch - AVX_512

Total Matriplex fit = 0.00000 --- Build BHMX = 0.00000 STDMX = 0.00000 CEMX = 44.65550 FVMX = 0.00000
Total event loop time 54.89823 simtracks 4943065 seedtracks 1040426 builtcands 3421071 maxhits 5998 on lay 5
real 0m54.925s user 0m53.196s sys 0m1.584s

= hit-sort - no hit copy - no prefetch - AVX2

Total Matriplex fit = 0.00000 --- Build BHMX = 0.00000 STDMX = 0.00000 CEMX = 53.88735 FVMX = 0.00000
Total event loop time 63.83459 simtracks 4943065 seedtracks 1040426 builtcands 3421022 maxhits 5998 on lay 5
real 1m3.861s user 1m2.114s sys 0m1.578s

= hit-sort - yes hit copy

Total Matriplex fit = 0.00000 --- Build BHMX = 0.00000 STDMX = 0.00000 CEMX = 69.97530 FVMX = 0.00000
Total event loop time 80.25051 simtracks 4943065 seedtracks 1040426 builtcands 3421025 maxhits 5998 on lay 5
real 1m20.279s user 1m18.494s sys 0m1.579s

= hit-sort - yes hit copy - no prefetch

Total Matriplex fit = 0.00000 --- Build BHMX = 0.00000 STDMX = 0.00000 CEMX = 66.97209 FVMX = 0.00000
Total event loop time 77.29263 simtracks 4943065 seedtracks 1040426 builtcands 3421025 maxhits 5998 on lay 5
real 1m17.322s user 1m15.508s sys 0m1.615s

*** Timing tests: clone engine, 64 threads / 16 in flight, pu70-ccc, avx-512, 5000 events:

time ./mkFit --cmssw-n2seeds --input-file ../../mictest/mkFit/pu70-ccc-hs.bin --build-ce --num-events 5000 --num-thr 64 --num-thr-ev 16

= devel

Total event loop time 17.07450 simtracks 49338285 seedtracks 10275105 builtcands 33905375 maxhits 6473 on lay 5
real 0m17.211s user 16m14.243s sys 0m33.480s

= hit-sort - no hit copy - no prefetch

Total event loop time 16.56905 simtracks 49338285 seedtracks 10275105 builtcands 33905375 maxhits 6347 on lay 5
real 0m16.692s user 15m51.343s sys 0m31.998s

@kmcdermo
Copy link
Collaborator

kmcdermo commented Apr 5, 2019

@osschar I took look at the diff, but could not figure out exactly what the problem with quality-val is: let's chat today after the meeting/in the afternoon.

@kmcdermo
Copy link
Collaborator

@osschar I presume we still want to merge this? this would need a rebase to use the new-and-improved quality-val from PR #214 (and then a round of benchmarks to see where things are).

@osschar
Copy link
Collaborator Author

osschar commented Apr 18, 2019

This was preliminary, to see what people think and to decide what to do.

I'll rebase to master, ideally even after we merge the short-track-stash.

@kmcdermo
Copy link
Collaborator

short-track-stash has been merged :)

@osschar
Copy link
Collaborator Author

osschar commented Apr 18, 2019

excellente ... too bad somebody gave phi3 a really bad headache ;)

@makortel
Copy link
Collaborator

too bad somebody gave phi3 a really bad headache ;)

Sorry for that...

- Prefetching code in MkFinder functions has been removed.

- Sorting of Hits is optional, see define COPY_SORTED_HITS in
  mkFit/HitStructures.h. Sorting is now OFF by default.
@osschar
Copy link
Collaborator Author

osschar commented May 23, 2019

OK, I removed prefetching code but decided to leave hit-sorting in while we explore strip unpacking and local reco. It is off by default, uncomment #define COPY_SORTED_HITS in HitStructures.h to activate it back.

This branch is now squash-rebased to master.

On to running benchmarks.

@osschar
Copy link
Collaborator Author

osschar commented May 23, 2019

@kmcdermo kmcdermo merged commit 9ccf96c into devel May 24, 2019
@osschar osschar deleted the hit-sort branch June 21, 2021 22:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants