remove normalisation by area in seeding #468

@echapon echapon commented Apr 12, 2021

PR description:

Investigation of a modified seeding for clusters: removing normalisation by area in HGCalHistoSeedingImpl. First discussed in this HGCal TPG meeting

[WORK IN PROGRESS]

PR validation:

Work in progress

echapon commented Jun 15, 2021

@jbsauvan here are a few plots (all including offset correction and JEC on reconstructed jets) further investigating the proposed changes, depending on the new threshold for seed selection:

  • number of seeds (also vs eta and energy)
  • number of 3D clusters (also vs eta and pt)
  • jet efficiency
  • "turn-on" curves
  • rates vs pt (nu gun PU200)

The modified seeding seems to bring a higher efficiency for jets (except for the highest threshold), flatter as a function of eta, and a smaller rate, at the cost of a larger number of seeds and 3D clusters. Before further tuning the threshold, it may be useful to check the rough budget on the number of seeds or 3D clusters.

echapon commented Jul 7, 2021

Studies presented at the HGCAL TPG meeting on June 23 and July 7.

@jbsauvan PR updated (ran scram b code-checks and scram b code-format; default changed to the new modified seeding with threshold=20). I think you can start reviewing.

@jbsauvan jbsauvan left a comment

Thanks @echapon
In addition to the comments, could you rebase this PR on top of hgc-tpg-devel-CMSSW_12_0_0_pre3?

Comment on lines 161 to 163
int bin1_10pct_;
float R1_10pct_;
float R2_10pct_;

These three variables are only used in the constructor, so could be local variables instead of class members

@@ -64,6 +65,12 @@ HGCalHistoSeedingImpl::HGCalHistoSeedingImpl(const edm::ParameterSet& conf)
<< "Inconsistent size of neighbour weights vector in HGCalMulticlustering ( " << neighbour_weights_.size()
<< " ). Should be " << neighbour_weights_size_ << "\n";
}

// compute quantities for non-normalised-by-area histoMax
bin1_10pct_ = (int)0.1 * nBins1_;

Is there a specific reason to choose a reference bin area 10% away from the minimal r/z? Or is this arbitrary? It just changes the scale of the threshold, so that's not critical, but it may be useful to add a comment in the code.
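
For readers following the thread, here is a minimal Python sketch of what a "reference bin area 10% away from the minimal r/z" could look like. It is not the code in this PR: the r/z range and bin counts are illustrative assumptions, bin_area is a hypothetical helper, and the bin is modelled as an annulus slice split evenly in phi. One detail worth noting: in C++ a cast written as (int)0.1 * nBins1_ applies to 0.1 before the multiplication and truncates it to 0, so the sketch truncates after multiplying.

```python
import math

# Illustrative values only (assumptions, not taken from this PR).
R_OVER_Z_MIN = 0.076
R_OVER_Z_MAX = 0.58
N_BINS_R = 42      # stands in for nBins1_
N_BINS_PHI = 216   # stands in for the number of phi bins


def bin_area(bin_r):
    """Area of one (r/z, phi) bin: an annulus slice split evenly in phi."""
    width = (R_OVER_Z_MAX - R_OVER_Z_MIN) / N_BINS_R
    r1 = R_OVER_Z_MIN + bin_r * width
    r2 = r1 + width
    return math.pi * (r2 ** 2 - r1 ** 2) / N_BINS_PHI


# Multiply before truncating: int(0.1) * N_BINS_R would always give 0.
reference_bin = int(0.1 * N_BINS_R)
reference_area = bin_area(reference_bin)

# Bin area relative to the reference, at the inner edge, the reference bin
# and the outer edge of the r/z range.
for b in (0, reference_bin, N_BINS_R - 1):
    print(b, bin_area(b) / reference_area)
```

Because the reference area is a single constant, choosing a different reference bin would only rescale every histogram value by the same factor, which is why it only shifts the scale of the threshold.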

@@ -87,6 +87,7 @@
seeding_space=cms.string("RPhi"),# RPhi, XY
seed_smoothing_ecal=seed_smoothing_ecal,
seed_smoothing_hcal=seed_smoothing_hcal,
seeds_norm_by_area=cms.bool(False)

Should keep the current normalization for the moment, until the new normalization can be checked by other people, in particular on electrons.

@@ -105,7 +106,7 @@
# (see https://indico.cern.ch/event/806845/contributions/3359859/attachments/1815187/2966402/19-03-20_EGPerf_HGCBE.pdf
# for more details)
phase2_hgcalV10.toModify(histoMax_C3d_seeding_params,
-threshold_histo_multicluster=8.5, # MipT
+threshold_histo_multicluster=20, # arb. units (for seeds_norm_by_area=False)

Should keep the current normalization for the moment, until the new normalization can be checked by other people, in particular on electrons.

):
parameters_seeding_c3d.nBins_X1_histo_multicluster = nBins_X1
parameters_seeding_c3d.nBins_X2_histo_multicluster = nBins_X2
parameters_seeding_c3d.binSumsHisto = binSumsHisto
parameters_seeding_c3d.threshold_histo_multicluster = seed_threshold
parameters_seeding_c3d.seeds_norm_by_area = seeds_norm_by_area

Here and in all this file: May be better for the moment to write a specific custom function dedicated to switching the normalization type and keep the existing customs untouched. In particular some of the existing customs are not concerned by this normalization.

@echapon echapon force-pushed the hgcal_seeding_from_11_3_0_pre5 branch from 3e5ad19 to 12e9e54 Compare July 19, 2021 14:23

echapon commented Jul 19, 2021

Hi @jbsauvan, I am confused about how to proceed with this PR given the rebase. Should I create a fresh PR to hgc-tpg-devel-CMSSW_12_0_0_pre3? I get conflicts here with this PR to hgcal_seeding_from_11_3_0_pre5 after rebase, which is probably not unexpected.

@jbsauvan jbsauvan changed the base branch from hgc-tpg-devel-CMSSW_11_3_0_pre5 to hgc-tpg-devel-CMSSW_12_0_0_pre3 July 19, 2021 19:39
@jbsauvan

Hi @echapon
I changed the base branch.
Could you remove the two commits 0978f7d and e7df946, as they are not part of this PR?

@echapon echapon force-pushed the hgcal_seeding_from_11_3_0_pre5 branch from 12e9e54 to 1bf0432 Compare July 20, 2021 12:28

echapon commented Jul 20, 2021

@jbsauvan done

@jbsauvan jbsauvan left a comment

Thanks @echapon
In addition to the inline comments, could you apply code-format?

@@ -13,7 +13,7 @@ def set_histomax_seeding_params(parameters_seeding_c3d,
nBins_X1,
nBins_X2,
binSumsHisto,
-seed_threshold,
+seed_threshold

Here and below in this file: if possible it would be better to keep the formatting of the existing code unchanged.

):
parameters_c3d = histoMaxXYVariableDR_C3d_params.clone()
set_histomax_seeding_params(parameters_c3d, nBins_X1, nBins_X2,
histoMaxXYVariableDR_C3d_params.binSumsHisto,seed_threshold)
process.hgcalBackEndLayer2Producer.ProcessorParameters.C3d_parameters.histoMax_C3d_seeding_parameters = parameters_c3d
return process

def custom_3dclustering_seedNoArea(process,
seeds_norm_by_area=cms.bool(False),

Since the function is meant to remove the norm by area, this shouldn't be a parameter of the function. Otherwise we could call custom_3dclustering_seedNoArea(process, seeds_norm_by_area=True) which would be a bit inconsistent.

Comment on lines 103 to 105
parameters_c3d = histoMax_C3d_seeding_params.clone()
parameters_c3d.seeds_norm_by_area = seeds_norm_by_area
parameters_c3d.threshold_histo_multicluster = seed_threshold

Better to set the parameters within the clone():

parameters_c3d = histoMax_C3d_seeding_params.clone(seeds_norm_by_area=False,
                                                    threshold_histo_multicluster=seed_threshold)

):
parameters_c3d = histoMaxXYVariableDR_C3d_params.clone()
set_histomax_seeding_params(parameters_c3d, nBins_X1, nBins_X2,
histoMaxXYVariableDR_C3d_params.binSumsHisto,seed_threshold)
process.hgcalBackEndLayer2Producer.ProcessorParameters.C3d_parameters.histoMax_C3d_seeding_parameters = parameters_c3d
return process

def custom_3dclustering_seedNoArea(process,

The function should return process
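
Taken together, the review comments on this function (no seeds_norm_by_area argument, parameters set inside clone(), return process) could look roughly like the sketch below. This is an illustration, not the code that was pushed: ParamSet and the nested SimpleNamespace objects are stand-ins invented here so the example runs without CMSSW, and the parameter values are placeholders. Only the function name, the seed_threshold keyword, the clone() call and the attribute path are taken from the thread.

```python
from types import SimpleNamespace


class ParamSet(SimpleNamespace):
    """Stand-in for a cms.PSet: only mimics clone(**overrides)."""

    def clone(self, **overrides):
        copy = ParamSet(**vars(self))
        for name, value in overrides.items():
            setattr(copy, name, value)
        return copy


# Stand-ins for the real configuration objects (values are illustrative).
histoMax_C3d_seeding_params = ParamSet(
    seeds_norm_by_area=True,
    threshold_histo_multicluster=8.5,
)
process = SimpleNamespace(
    hgcalBackEndLayer2Producer=SimpleNamespace(
        ProcessorParameters=SimpleNamespace(
            C3d_parameters=SimpleNamespace(
                histoMax_C3d_seeding_parameters=histoMax_C3d_seeding_params
            )
        )
    )
)


def custom_3dclustering_seedNoArea(process, seed_threshold):
    # The switch is fixed to False here; it is not a function argument.
    parameters_c3d = histoMax_C3d_seeding_params.clone(
        seeds_norm_by_area=False,
        threshold_histo_multicluster=seed_threshold,
    )
    c3d = process.hgcalBackEndLayer2Producer.ProcessorParameters.C3d_parameters
    c3d.histoMax_C3d_seeding_parameters = parameters_c3d
    # Return the process so the custom can be chained like the existing ones.
    return process


process = custom_3dclustering_seedNoArea(process, seed_threshold=20)
```

Cloning with keyword overrides leaves the shared default parameter set untouched, so other customs that read histoMax_C3d_seeding_params still see the area-normalised defaults.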

echapon commented Jul 23, 2021

Thanks @jbsauvan, just pushed a new commit accounting for these comments.

@jbsauvan

Hi @echapon
When running the validation (before the last two commits from today), I see strong discrepancies compared to the devel branch in the cluster distributions using the default config (so the distributions should be unchanged). I'm not sure where this can come from.

e.g. here for the cluster eta (red is the reference and blue is this PR):

image

Could you check whether you see similar things?

echapon commented Jul 23, 2021

oops... indeed something's wrong. I thought I had checked but let me have another look with the latest state of the code.

echapon commented Jul 23, 2021

When I run testHGCalL1T_RelValV11_cfg.py out of the box with this PR (all commits included) I get something that looks very much like the red.
However, I seem to be able to reproduce the blue if I add custom_3dclustering_seedNoArea(process,seed_threshold=8.5) where 8.5 is the default threshold with the default settings (i.e. with normalisation by area).
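
A toy illustration of why the same numerical threshold need not select the same seeds in the two modes. Everything here is an assumption made for the sketch: the binning values, the bin_area helper, the uniform input energy, and in particular the idea that the non-normalised mode divides every bin by one fixed reference area while the default mode divides by the bin's own area.

```python
import math

# Illustrative binning (assumed values, not taken from this PR).
R_OVER_Z_MIN, R_OVER_Z_MAX = 0.076, 0.58
N_BINS_R, N_BINS_PHI = 42, 216


def bin_area(bin_r):
    """Area of one (r/z, phi) histogram bin."""
    width = (R_OVER_Z_MAX - R_OVER_Z_MIN) / N_BINS_R
    r1 = R_OVER_Z_MIN + bin_r * width
    return math.pi * ((r1 + width) ** 2 - r1 ** 2) / N_BINS_PHI


reference_area = bin_area(int(0.1 * N_BINS_R))
threshold = 8.5

# Hypothetical input: the same summed energy placed in every r/z bin,
# sized so that it reads 10 in the reference bin.
energy = 10.0 * reference_area

# Assumed behaviour of the two modes (not confirmed by the PR text).
by_area = [energy / bin_area(b) for b in range(N_BINS_R)]
fixed_scale = [energy / reference_area for _ in range(N_BINS_R)]

seeds_by_area = [b for b, v in enumerate(by_area) if v >= threshold]
seeds_fixed = [b for b, v in enumerate(fixed_scale) if v >= threshold]

print("seeds with area normalisation:", seeds_by_area)
print("seeds with fixed reference area:", seeds_fixed)
```

In this toy the area-normalised values fall with increasing r/z while the fixed-scale values stay flat, so reusing 8.5 in the other mode changes which bins pass. That would be consistent with the cluster eta distribution changing when the threshold is carried over unchanged.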

@jbsauvan

Thanks @echapon for the cross-check
I reran the validation, this time including the two latest commits from today, and it gave me the same distributions as the reference. Not sure what went wrong before.

image
