Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Candidate ranking à la CMSSW #167

Merged
merged 8 commits into from
Oct 12, 2018
Merged

Conversation

mmasciov
Copy link
Collaborator

@mmasciov mmasciov commented Oct 2, 2018

if(pt[c]<0.9f) score[c] -= 0.5f*(validHitBonus_)*nfoundhits[c];
else if(nfoundhits[c]>8) score[c] += (validHitBonus_)*nfoundhits[c];
}
return score[0]>score[1];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic is repeated four times. Would it be possible to reduce the copy-paste, e.g. for those having a Track object calling the sortByScore(const Track&, const Track&)), and here where that is not possible, abstract the score calculation (=for loop) to an inlined function that is then called also from sortByScore()?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree with Matti here, definitely the less copy-paste the better :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.



float validHitBonus_=2.5;
float missingHitPenalty_=20.0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why hardcode the constants here instead of taking them via Config? (same for sortCandByScore() in `MkBuilder.cc)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These values are actually already sitting in Config.h. So, these two lines are actually not needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@kmcdermo
Copy link
Collaborator

kmcdermo commented Oct 8, 2018

Hey Mario, I have a few comments:

  1. It looks like the SNB tests failed... And it looks like you managed to escape the crash in STD building with SIMVAL... which makes it hard to compare with the latest baselines. Can you run the benchmarks again (using 100 events for the validation - see Issue Fixing up of SlurpIn #168 diff)?
  2. Or, just for comparisons sake for the validation, can you link an old set of plots (pre this PR)? Or perhaps the slides you presented in the group meeting?
  3. I would agree with Matti that it looks like this code can be refactored -- and the less copy/paste, the easier it is to maintain.

@mmasciov
Copy link
Collaborator Author

Hi Matti, Kevin,

@makortel I have tried to 'welcome' your suggestions, and rearranged the code.
@kmcdermo I don't know why SNB fails, and sometimes KNL too. And many times I get "random" core dumps, either for STD or CE SIMVAL (both in the PR scope, and in the original scope).
I reran both original and PR code with 250 events (running with 100 is sub-optimal, as the improvements may be eaten by statistics):
Original: https://mmasciov.web.cern.ch/mmasciov/benchmarks_originalranking_250evts/
PR: https://mmasciov.web.cern.ch/mmasciov/benchmarks_cmsswranking_250evts/
Still, there are some holes (SNB and/or KNL), but at least on SKL-SP everything was successful, so it's still ok to look and have an apple-to-apple comparison.

Track.h Outdated
int nmisshits[2] = {cand1.first.nTotalHits()-cand1.first.nFoundHits(),cand2.first.nTotalHits()-cand2.first.nFoundHits()};
float chi2[2] = {cand1.first.chi2(),cand2.first.chi2()};
float pt[2] = {cand1.first.pT(),cand2.first.pT()};
return sortByScoreLoop(nfoundhits,nmisshits,chi2,pt);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mmasciov. Would return sortByScoreCand(cand1.first, cand2.first); work here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, why not. I have fixed this, and compilation doesn't complain. Thanks.

@kmcdermo
Copy link
Collaborator

@mmasciov , thanks for the cleanups -- looks nice! As for the crashes in the validation: this is a known issue with SlurpIn when running too many events with lots of events in flight.

We can skirt around the crash if we run fewer events in the validation (when using MEIF). OR, if you are concerning about the stats, we can keep the number of events the same, but then disable the MEIF validation:

[macbook] mictest > git diff val_scripts/validation-cmssw-benchmarks.sh
diff --git a/val_scripts/validation-cmssw-benchmarks.sh b/val_scripts/validation-cmssw-benchmarks.sh
index 595f388..9fb8728 100755
--- a/val_scripts/validation-cmssw-benchmarks.sh
+++ b/val_scripts/validation-cmssw-benchmarks.sh
@@ -22,7 +22,7 @@ nevents=500
 ## Common executable setup
 maxth=64
 maxvu=16
-maxev=32
+maxev=1
 seeds="--cmssw-n2seeds"
 exe="./mkFit/mkFit --silent ${seeds} --num-thr ${maxth} --num-thr-ev ${maxev} --input-file ${dir}/${subdir}/${file} --num-events ${nevents}"
 
@@ -67,9 +67,6 @@ function doVal()
     echo "${oBase}: ${vN} [nTH:${maxth}, nVU:${maxvu}int, nEV:${maxev}]"
     ${bExe} >& log_${oBase}_NVU${maxvu}int_NTH${maxth}_NEV${maxev}_${vN}.txt
     
-    # hadd output files for this test, then move to temporary directory
-    hadd -O valtree.root valtree_*.root
-    rm valtree_*.root
     mv valtree.root ${tmpdir}/valtree_${oBase}_${vN}.root
 }              

So, could you try running the validation with 500 events, but turning off MEIF?

@kmcdermo
Copy link
Collaborator

The crashes on SNB are concerning... can you post the logs of one of the SNB tests?

@mmasciov
Copy link
Collaborator Author

As for the crashes with large number of events: 250 events work fine, and it's enough for my purposes.

As for the crashes on SNB: note that they happen even without changing a comma in the code, freshly cloned. I can submit another benchmark, and point do not clean the logs (when I move the benchmarks on lxplus).

@kmcdermo
Copy link
Collaborator

Huh. Okay, well, then that is really concerning. MT's PR also showed crashes on KNL...

@mmasciov
Copy link
Collaborator Author

For SNB, the logs do not help (me) much:

  1. less benchmark_snb_dump.txt
    Executing SNB tests remotely...
    bash: line 1: cd: /data2/nfsmic/mmasciov/tmp: No such file or directory
    bash: line 2: ./xeon_scripts/benchmark-cmssw-ttbar-fulldet-build.sh: No such file or directory
    Copying logs back from SNB for plotting
    scp: /data2/nfsmic/mmasciov/tmp/log_SNB_CMSSW_TTbar_PU70_*.txt: No such file or directory
    Removing tmp dir on SNB remotely

  2. less log_SNB_CMSSW_TTbar_PU70_BH_TH.txt
    grep: log_SNB_CMSSW_TTbar_PU70_BH_NVU8int_NTH24.txt: No such file or directory

@mmasciov
Copy link
Collaborator Author

After looking into this, I think there's an issue with my repository in /data2/nfsmic/:
ls -lrth /data2/nfsmic/
total 68K
-rw-r--r--. 1 root root 16 Aug 15 2014 root-test-file
drwxr-xr-x. 3 root root 4.0K Mar 26 2015 mpss-3.4.3
drwxr-xr-x. 3 root root 4.0K Sep 1 2015 mpss-3.5.2
drwxr-xr-x. 3 root root 4.0K Mar 3 2016 mpss-3.6.1
drwxrwxr-x. 3 31030 31030 4.0K Mar 16 2016 ml15
drwxrwxr-x. 3 slantz slantz 4.0K Apr 14 2016 slantz
drwxrwxr-x. 4 slava77 slava77 4.0K Sep 23 2016 slava77
drwxrwxrwt. 9 root root 4.0K Aug 17 2017 scratch
drwxr-xr-x. 2 31026 31026 4.0K Jan 24 2018 CMS-Geom
drwxrwxr-x. 2 31021 31021 4.0K May 11 20:23 cerati
drwxrwxr-x. 3 31018 31018 4.0K May 15 11:26 dsr
drwxrwxr-x. 2 mkortela mkortela 4.0K Aug 7 03:19 mkortela
drwxrwxr-x. 2 31031 31031 4.0K Aug 13 09:18 mmasciov
drwxrwxr-x. 2 areinsvo areinsvo 4.0K Sep 17 07:35 areinsvo
drwxr-xr-x. 12 matevz matevz 4.0K Oct 11 17:05 matevz
drwxrwxr-x. 3 kmcdermo kmcdermo 4.0K Oct 11 21:45 kmcdermo
drwxrwxr-x. 3 mmasciov mmasciov 4.0K Oct 12 05:50 mmasciovtc

I have created a new repository (mmasciovtc), and now things appear to be working.
I don't know who the owner of the mmasciov dir is (31031), but it seems to be messed up.

@kmcdermo
Copy link
Collaborator

Ah , yes after the update to phi1 , usernames in the data directories got all screw up. @osschar, can you fix Mario's directories on phi1 and phi2?

@srlantz
Copy link
Collaborator

srlantz commented Oct 12, 2018 via email

@kmcdermo
Copy link
Collaborator

thanks Steve!

@kmcdermo kmcdermo merged commit a535d3a into trackreco:devel Oct 12, 2018
@kmcdermo
Copy link
Collaborator

As discussed in today's meeting, we decided that this is a definite need, so we merged it. However, two points remain:

  1. What to do about PR Candidate ranking à la CMSSW, modified to maximize efficiency and minimize fake rate #173: is there a better way to decouple the score of two candidates based on pT / eta? @mmasciov said he will look into some checks.
  2. @osschar mentioned that the current sorting is quite heavy -- done once per candidate: would be better to store the score per candidate, then sort on that (with some extra indices). No volunteers as of yet.

@cerati
Copy link
Collaborator

cerati commented Oct 12, 2018 via email

@mmasciov
Copy link
Collaborator Author

@cerati Yes, but first I'll need to find a good "compromise" for PR #173 , as currently each candidate in PR #173 can have multiple scores, depending on the "other" candidate.
So, I'd first proceed "fixing" PR #173 , then use @osschar 's suggestion on top.

@cerati
Copy link
Collaborator

cerati commented Oct 12, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants