
Candidate ranking à la CMSSW, modified to maximize efficiency and minimize fake rate #173

Closed


@mmasciov (Collaborator):

This PR is parallel to PR #167.
In fact, it adopts the CMSSW ranking, but modifies the parameters depending on the candidates' pT and pseudo-rapidity, to maximize efficiency and minimize fake rate.
This comes at the price of some time performance, which may perhaps be recovered with better parallelization/vectorization of the code.

Original ranking: https://mmasciov.web.cern.ch/mmasciov/benchmarks_originalranking_250evts/
CMSSW ranking ( PR #167 ): https://mmasciov.web.cern.ch/mmasciov/benchmarks_cmsswranking_250evts/
Modified CMSSW ranking (this PR): https://mmasciov.web.cern.ch/mmasciov/benchmarks_cmsswranking_mod_250evts/
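
For orientation, here is a minimal sketch of the kind of scoring described above, assuming a CMSSW-style hit-bonus/penalty score whose valid-hit bonus is doubled for high-pT, central candidate pairs; all names and constants are illustrative, not this PR's actual code (the diff fragment under review follows below).

// Illustrative sketch only: the bonus is modified based on the pT/eta
// averaged over the two candidates being compared, as in the diff below.
#include <cmath>

float modifiedScore(int nFoundHits, int nMissingHits, float chi2,
                    float avgPt, float avgAbsEta)
{
  float validHitBonus     = 4.0f;   // illustrative baseline weights
  float missingHitPenalty = 8.0f;
  if (avgPt > 2.0f && avgAbsEta < 1.5f)   // high-pT, central pair
    validHitBonus *= 2.0f;                // double the valid-hit bonus
  return validHitBonus * nFoundHits - missingHitPenalty * nMissingHits - chi2;
}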

Track.h (outdated diff):
float score[2] = {0.f, 0.f};
for (int c = 0; c < 2; ++c) {
  // For high pT central tracks: double valid hit bonus
  if ((pt[0] + pt[1]) / 2.0f > 2.0f && (eta[0] + eta[1]) / 2.0f < 1.5f) {
Collaborator:

is there a good reason for the score of one candidate to depend on the other candidate?

Collaborator:

Despite the nice improvement in efficiency favoring this over PR #167, I agree with Slava. This is a bit strange: why not just define a function for this section, calling it once for cand1 and again for cand2? That way you can evaluate the score bonuses based on pT/eta for each candidate independently.
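
A minimal sketch of this suggestion, assuming a standalone helper called once per candidate; the struct, names, and constants are illustrative, not the mkFit API:

// Illustrative per-candidate score: the pT/eta-dependent bonus is evaluated
// from the candidate's own kinematics, so its score does not depend on which
// other candidate it is being compared against.
#include <cmath>

struct CandInfo {                 // minimal stand-in for a track candidate
  float pt, eta, chi2;
  int   nFoundHits;
};

float candidateScore(const CandInfo& c)
{
  float validHitBonus = 4.0f;                    // illustrative baseline
  if (c.pt > 2.0f && std::fabs(c.eta) < 1.5f)    // high-pT, central track
    validHitBonus *= 2.0f;                       // per-candidate bonus
  return validHitBonus * c.nFoundHits - c.chi2;
}

// Ranking then reduces to two independent calls:
//   bool firstWins = candidateScore(cand1) > candidateScore(cand2);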

Collaborator:

With this coupled dependency in the score definition, I'm trying to think whether it is obvious that, for candidates A, B, C,

  • if score_A(A,B) > score_B(A,B) and score_B(B,C) > score_C(B,C),
  • then score_A(A,C) > score_C(A,C)

(a toy sketch below makes the concern concrete).
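
The following toy counterexample (illustrative numbers only, not the PR's actual parameters) shows that a pair-dependent score need not be transitive: A beats B and B beats C, yet C beats A.

// Toy scoring where the hit bonus doubles when the *pair-averaged* pT
// exceeds 2; with the candidates below, the pairwise ranking is cyclic.
#include <cstdio>

struct Cand { float pt, chi2; int nHits; };

// Score of candidate c when compared against candidate other.
float score(const Cand& c, const Cand& other)
{
  const bool  highPtPair = 0.5f * (c.pt + other.pt) > 2.0f;
  const float hitBonus   = highPtPair ? 4.0f : 2.0f;
  return hitBonus * c.nHits - c.chi2;
}

int main()
{
  const Cand A{10.0f, 4.5f, 5}, B{0.5f, 1.0f, 4}, C{0.5f, 4.0f, 5};
  std::printf("A beats B: %d\n", (int)(score(A, B) > score(B, A))); // 1: 15.5 > 15
  std::printf("B beats C: %d\n", (int)(score(B, C) > score(C, B))); // 1:  7.0 >  6
  std::printf("A beats C: %d\n", (int)(score(A, C) > score(C, A))); // 0: 15.5 < 16
  return 0;
}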

@mmasciov (Author):

I agree that this is not obvious.
However, my choice was based on observation, and made in order to avoid "threshold" effects that may favor one candidate over another.
If I do it the simple way (which was my first attempt), I generally get lower efficiencies:
[Attached plots: efficiency (average-vs-single scoring) and single/average ratios, for pT > 0, pT > 0.9, and pT > 2.0]

@mmasciov (Author):

So, although, as Slava points out, it is not obvious that if A wins over B and B wins over C then A wins over C, I simply went for the best option in terms of efficiency.

@kmcdermo (Collaborator):

@mmasciov: this is great (although I have to agree with Slava that the coupling between candidates in the scoring is a bit weird)!

Since this is parallel to #167, though, we would merge one and not the other, correct? It looks like the commits have diverged, so we could not put one on top of the other.

Or perhaps we just add both scoring functions, with different names, and decide to pick one or the other (and pass around eta for the first scoring proposal).
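
A possible sketch of this "keep both, pick one" option, assuming a single configurable scoring hook; all names and function bodies are illustrative, not the actual mkFit code:

// Both rankings are compiled in; the configuration selects which one the
// candidate-cleaning/building code calls through a single hook.
#include <cmath>
#include <functional>

struct ScoreInputs { float pt, eta, chi2; int nFoundHits; };

// PR #167-style ranking (illustrative body).
inline float scoreCandCMSSW(const ScoreInputs& s)
{
  return 4.0f * s.nFoundHits - s.chi2;
}

// This PR's modified ranking (illustrative body: pT/eta-dependent bonus).
inline float scoreCandCMSSWMod(const ScoreInputs& s)
{
  const float bonus = (s.pt > 2.0f && std::fabs(s.eta) < 1.5f) ? 8.0f : 4.0f;
  return bonus * s.nFoundHits - s.chi2;
}

// Single hook used by the builder; switchable in the configuration, e.g.
//   if (useCmsswRanking) g_candScore = scoreCandCMSSW;
std::function<float(const ScoreInputs&)> g_candScore = scoreCandCMSSWMod;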

Anyway, so that it is all in one place, here's a direct comparison of the simval:

where #167 does a great job of recovering the transition region, and #173 does a better job of picking up the endcaps. Usually I am in favor of trading some time performance for efficiency (as it means we are running longer reconstructing tracks), but these are pretty big hits:

[Plot: SKL-SP time vs. nTH]

although, if the #173 benchmarks were made while @osschar was running his tests, they could have been interfering with each other.

@mmasciov (Author):

Coming back to this PR.
I have implemented a seed-based ranking, as we agreed I would attempt.
This solution is actually working fine, and I get results consistent with my previous implementation.
Time performance is actually slightly better than in the previous instance of this PR.
Here's the updated benchmark:
https://mmasciov.web.cern.ch/mmasciov/benchmarks_cmsswranking_mod_seedbased_250evts/
The benchmark results from the previous instance of this PR can be still found at:
https://mmasciov.web.cern.ch/mmasciov/benchmarks_cmsswranking_mod_250evts/
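
For reference, a minimal sketch of what such a seed-based ranking could look like, assuming "seed-based" means the bonus parameters are chosen from the candidate's own seed pT rather than from the pair-averaged candidate pT/eta; the types, names, and constants are illustrative only:

// Because the bonus depends only on the candidate's seed, the score of one
// candidate no longer depends on which other candidate it is compared against.
struct SeedInfo     { float pt; };
struct CandWithSeed { float chi2; int nFoundHits; SeedInfo seed; };

float seedBasedScore(const CandWithSeed& c)
{
  const float validHitBonus = (c.seed.pt > 2.0f) ? 8.0f : 4.0f;  // illustrative
  return validHitBonus * c.nFoundHits - c.chi2;
}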

Note: previously, I was making use of the candidates' eta for the ranking.
Now, this is not needed anymore.
So, I could remove eta from the ranking-related code everywhere, or I can keep it there if people think it may be useful in the future for any reason (if one wants to make use of it in the ranking at some point).

If there are no comments, I'll remove eta and then resolve the conflicts.
