
Candidate ranking à la CMSSW, modified to maximize efficiency and minimize fake rate #173

Closed


@mmasciov (Collaborator):

This PR is parallel to PR #167.
In fact, it adopts the CMSSW ranking, but modifies the parameters depending on the candidates' pT and pseudo-rapidity, to maximize efficiency and minimize fake rate.
This comes at the price of some time performance, which may perhaps be recovered with better parallelization/vectorization of the code.

Original ranking: https://mmasciov.web.cern.ch/mmasciov/benchmarks_originalranking_250evts/
CMSSW ranking ( PR #167 ): https://mmasciov.web.cern.ch/mmasciov/benchmarks_cmsswranking_250evts/
Modified CMSSW ranking (this PR): https://mmasciov.web.cern.ch/mmasciov/benchmarks_cmsswranking_mod_250evts/
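
For orientation, here is a minimal sketch of the kind of scoring described above, assuming a CMSSW-style hit-bonus/penalty score whose valid-hit bonus is doubled for high-pT, central candidate pairs; all names and constants are illustrative, not this PR's actual code (the diff fragment under review follows below).

// Illustrative sketch only: the bonus is modified based on the pT/eta
// averaged over the two candidates being compared, as in the diff below.
#include <cmath>

float modifiedScore(int nFoundHits, int nMissingHits, float chi2,
                    float avgPt, float avgAbsEta)
{
  float validHitBonus     = 4.0f;   // illustrative baseline weights
  float missingHitPenalty = 8.0f;
  if (avgPt > 2.0f && avgAbsEta < 1.5f)   // high-pT, central pair
    validHitBonus *= 2.0f;                // double the valid-hit bonus
  return validHitBonus * nFoundHits - missingHitPenalty * nMissingHits - chi2;
}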

Track.h (outdated diff):
float score[2] = {0.f, 0.f};
for (int c = 0; c < 2; ++c) {
  // For high pT central tracks: double valid hit bonus
  if ((pt[0] + pt[1]) / 2.0f > 2.0f && (eta[0] + eta[1]) / 2.0f < 1.5f) {
Collaborator:

is there a good reason for the score of one candidate to depend on the other candidate?

Collaborator:

Despite the nice improvement in efficiency favoring this over PR #167, I agree with Slava. This is a bit strange: why not just define a function for this section, calling it once for cand1 and again for cand2? That way you can evaluate the score bonuses based on pT/eta for each candidate independently.
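
A minimal sketch of this suggestion, assuming a standalone helper called once per candidate; the struct, names, and constants are illustrative, not the mkFit API:

// Illustrative per-candidate score: the pT/eta-dependent bonus is evaluated
// from the candidate's own kinematics, so its score does not depend on which
// other candidate it is being compared against.
#include <cmath>

struct CandInfo {                 // minimal stand-in for a track candidate
  float pt, eta, chi2;
  int   nFoundHits;
};

float candidateScore(const CandInfo& c)
{
  float validHitBonus = 4.0f;                    // illustrative baseline
  if (c.pt > 2.0f && std::fabs(c.eta) < 1.5f)    // high-pT, central track
    validHitBonus *= 2.0f;                       // per-candidate bonus
  return validHitBonus * c.nFoundHits - c.chi2;
}

// Ranking then reduces to two independent calls:
//   bool firstWins = candidateScore(cand1) > candidateScore(cand2);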

Collaborator:

With this coupled dependency in the score definition, I'm trying to think whether it is obvious that, for candidates A, B, C,

  • if score_A(A,B) > score_B(A,B) and score_B(B,C) > score_C(B,C),
  • then score_A(A,C) > score_C(A,C)

(a toy sketch below makes the concern concrete).
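
The following toy counterexample (illustrative numbers only, not the PR's actual parameters) shows that a pair-dependent score need not be transitive: A beats B and B beats C, yet C beats A.

// Toy scoring where the hit bonus doubles when the *pair-averaged* pT
// exceeds 2; with the candidates below, the pairwise ranking is cyclic.
#include <cstdio>

struct Cand { float pt, chi2; int nHits; };

// Score of candidate c when compared against candidate other.
float score(const Cand& c, const Cand& other)
{
  const bool  highPtPair = 0.5f * (c.pt + other.pt) > 2.0f;
  const float hitBonus   = highPtPair ? 4.0f : 2.0f;
  return hitBonus * c.nHits - c.chi2;
}

int main()
{
  const Cand A{10.0f, 4.5f, 5}, B{0.5f, 1.0f, 4}, C{0.5f, 4.0f, 5};
  std::printf("A beats B: %d\n", (int)(score(A, B) > score(B, A))); // 1: 15.5 > 15
  std::printf("B beats C: %d\n", (int)(score(B, C) > score(C, B))); // 1:  7.0 >  6
  std::printf("A beats C: %d\n", (int)(score(A, C) > score(C, A))); // 0: 15.5 < 16
  return 0;
}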

@mmasciov (Author):

I agree that this is not obvious.
However, my choice was based on observation, and made in order to avoid "threshold" effects that may favor one candidate over another.
If I do it the simple way (which was my first attempt), I generally get lower efficiencies:
[Attached plots: efficiency (average-vs-single scoring) and single/average ratios, for pT > 0, pT > 0.9, and pT > 2.0]

@mmasciov (Author):

So, although, as Slava points out, it is not obvious that if A wins over B and B wins over C then A wins over C, I simply went for the best option in terms of efficiency.

@kmcdermo (Collaborator):

@mmasciov: this is great (although I have to agree with Slava that the coupling between candidates in the scoring is a bit weird)!

Since this is parallel to #167, though, we would merge one and not the other, correct? It looks like the commits have diverged, so we could not put one on top of the other.

Or perhaps we just add both scoring functions, with different names, and decide to pick one or the other (and pass around eta for the first scoring proposal).
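
A possible sketch of this "keep both, pick one" option, assuming a single configurable scoring hook; all names and function bodies are illustrative, not the actual mkFit code:

// Both rankings are compiled in; the configuration selects which one the
// candidate-cleaning/building code calls through a single hook.
#include <cmath>
#include <functional>

struct ScoreInputs { float pt, eta, chi2; int nFoundHits; };

// PR #167-style ranking (illustrative body).
inline float scoreCandCMSSW(const ScoreInputs& s)
{
  return 4.0f * s.nFoundHits - s.chi2;
}

// This PR's modified ranking (illustrative body: pT/eta-dependent bonus).
inline float scoreCandCMSSWMod(const ScoreInputs& s)
{
  const float bonus = (s.pt > 2.0f && std::fabs(s.eta) < 1.5f) ? 8.0f : 4.0f;
  return bonus * s.nFoundHits - s.chi2;
}

// Single hook used by the builder; switchable in the configuration, e.g.
//   if (useCmsswRanking) g_candScore = scoreCandCMSSW;
std::function<float(const ScoreInputs&)> g_candScore = scoreCandCMSSWMod;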

Anyway, so that it is all in one place, here's a direct comparison of the simval:

where #167 does a great job of recovering the transition region, and #173 does a better job of picking up the endcaps. Usually I am in favor of trading some time performance for efficiency (as it means we are running longer reconstructing tracks), but these are pretty big hits:

[Plot: SKL-SP time vs. nTH]

although, if the #173 benchmarks were made while @osschar was running his tests, they could have been interfering with each other.

@mmasciov (Author):

Coming back to this PR.
I have implemented a seed-based ranking, as we agreed I would attempt.
This solution is actually working fine, and I get results consistent with my previous implementation.
Time performance is actually slightly better than in the previous instance of this PR.
Here's the updated benchmark:
https://mmasciov.web.cern.ch/mmasciov/benchmarks_cmsswranking_mod_seedbased_250evts/
The benchmark results from the previous instance of this PR can be still found at:
https://mmasciov.web.cern.ch/mmasciov/benchmarks_cmsswranking_mod_250evts/
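
For reference, a minimal sketch of what such a seed-based ranking could look like, assuming "seed-based" means the bonus parameters are chosen from the candidate's own seed pT rather than from the pair-averaged candidate pT/eta; the types, names, and constants are illustrative only:

// Because the bonus depends only on the candidate's seed, the score of one
// candidate no longer depends on which other candidate it is compared against.
struct SeedInfo     { float pt; };
struct CandWithSeed { float chi2; int nFoundHits; SeedInfo seed; };

float seedBasedScore(const CandWithSeed& c)
{
  const float validHitBonus = (c.seed.pt > 2.0f) ? 8.0f : 4.0f;  // illustrative
  return validHitBonus * c.nFoundHits - c.chi2;
}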

Note: previously, I was making use of the candidates' eta for the ranking.
Now, this is not needed anymore.
So, I could remove eta from the ranking-related code everywhere, or I can keep it there if people think it may be useful in the future for any reason (if one wants to make use of it in the ranking at some point).

If there are no comments, I'll remove eta and then resolve the conflicts.
