Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Addition of SoftDrop from the RecursiveTools Contrib #235

Merged
merged 75 commits into from
Aug 2, 2023

Conversation

cmoore24-24
Copy link
Contributor

This PR includes the addition of the SoftDrop grooming tool from the RecursiveTools section of fastjet-contrib. The defaults are set to those used in CMS and/or the defaults recommended by the authors of SoftDrop. That said, the options for both the SymmetryCut and RecursionChoice fields have been included. At the moment, the soft drop grooming outputs a record array of the jet constituents, as well as a few values of interest of the jet itself.

As a small change, EnergyCorrelator has been slightly adjusted, so that the default invocation of the generic ECF class will return the normalized values. The unnormalized values can be returned with a "noramlized=False" argument.

@jpivarski
Copy link
Member

Just a suggestion: instead of returning a set of filtered jets, you could return an array of booleans. On the Python side, you can use the booleans to slice the original jets and get the filtered jets, but you can also use those booleans for other things. You could, for instance, apply logical operators on groomings with different parameters, to see which jets would be groomed with one set of parameters and not another. That would be harder if you had to identify jets in the output set that are "the same jet."

Also if you do input jets → grooming algorithm → boolean array → slice → filtered jets, the filtered jets share memory with the input jets. It's a view, rather than a copy.

@lgray
Copy link
Collaborator

lgray commented Jul 10, 2023

This is a very good suggestion - especial for environments like HL-LHC. @cmoore24-24 do you see how to do this?

@cmoore24-24
Copy link
Contributor Author

cmoore24-24 commented Jul 10, 2023

Is there room for both ideas, potentially? I can definitely see how returning booleans rather than the filtered jets can be helpful when memory is at a premium, but having fastjet do some of the calculation legwork w/r/t the jets is also pretty useful. So maybe keep this implementation as-is, and then down the line have something like exclusive_jets_softdrop_grooming_slimmed or some such that returns the booleans?

As for how to do it, no I'm not really sure off the top of my head. When I have the time to take another look, I can try to figure it out, though.

@lgray
Copy link
Collaborator

lgray commented Jul 10, 2023

You could have one additional kwarg that's "return_as_mask" or have two functions exclusive_jets_softdrop_grooming and exclusive_jets_softdrop_grooming_mask depending on what people need.

@cmoore24-24
Copy link
Contributor Author

cmoore24-24 commented Jul 16, 2023

I've added a way to return a boolean mask that can be applied to a PF candidates array to return filtered jets. As for the implementation; I may be wrong, but I don't believe there's an easy way to make softdrop preserve anything like candidate ID or index (though if there is and I missed it, then feel free to let me know). So the only way I could think to do this was explicit 4 vector matching via nested loops.

for (unsigned int i = 0; i < css.size(); i++){  // iterate through events
          auto jets = css[i]->exclusive_jets(n_jets);
          for (unsigned int j = 0; j < jets.size(); j++){
            auto soft = sd->result(jets[j]);
            nconstituents.push_back(jets[j].constituents().size());
            for (unsigned int k = 0; k < jets[j].constituents().size(); k++){
              for (unsigned int l = 0; l < soft.constituents().size(); l++){
                if (jets[j].constituents()[k].px() == soft.constituents()[l].px() &&
                    jets[j].constituents()[k].py() == soft.constituents()[l].py() &&
                    jets[j].constituents()[k].pz() == soft.constituents()[l].pz() &&
                    jets[j].constituents()[k].E() == soft.constituents()[l].E()){
                  bool_mask.push_back(true);
                  break;
                }
                else if (l == soft.constituents().size()-1) {
                  bool_mask.push_back(false);
                }
              }
            }
          }
        }

As a result, it isn't a terribly quick process, but it does work and produce the desired boolean array.

@cmoore24-24 cmoore24-24 requested a review from lgray July 18, 2023 15:08
@lgray
Copy link
Collaborator

lgray commented Jul 21, 2023

We need to do better than quadruple nested loop, this will die in bad ways for HL-LHC analyses. I'll think about a way forward. Would not merge as is.

@cmoore24-24
Copy link
Contributor Author

We need to do better than quadruple nested loop, this will die in bad ways for HL-LHC analyses. I'll think about a way forward. Would not merge as is.

At this point would it be worthwhile reverting this merge to the pre-mask state and making the addition of the mask a separate PR?

@cmoore24-24
Copy link
Contributor Author

cmoore24-24 commented Jul 28, 2023

Hello,

I have removed the attempt at adding a mask option for softdrop. That work can serve as a starting point for a separate pull request that adds a mask to softdrop at a later date. The PR is now in its initial state; it returns filtered jet constituents, as well as some attributes of the filtered jets themselves.

@jmduarte, do you have any comments on the PR as it is now? Can it be merged as is, or are there changes that need to be made?

Best,
Connor

Copy link
Collaborator

@rkansal47 rkansal47 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I think this is a really useful PR, thanks! I agree the masking can be left for a future PR. I can merge this week unless there are any further comments @jpivarski @jmduarte @lgray.

tests/test_002-exclusive_jets.py Outdated Show resolved Hide resolved
tests/test_002-exclusive_jets.py Outdated Show resolved Hide resolved
@rkansal47 rkansal47 merged commit e4e7635 into scikit-hep:main Aug 2, 2023
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants