Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize MP for CMSSW CPU resources #351

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Optimize MP for CMSSW CPU resources #351

wants to merge 4 commits into from

Conversation

bryates
Copy link
Contributor

@bryates bryates commented Dec 9, 2024

This PR addresses an issue where CMSSW wastes CPU cycles for emptyUnit and nearFull3Unit. They are not loaded in static classes, and should only be initialized once.

@bryates bryates requested review from tomalin, aehart and aryd December 9, 2024 20:07
@@ -28,9 +28,28 @@ static bool nearFullUnitBool(ap_uint<kNBitsBuffer> rptr, ap_uint<kNBitsBuffer> w
return wptr1==rptr || wptr2==rptr;
}

template<int kNBitsBuffer>
class nearFull3Class {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CMS coding rules state that class names should start with a capital letter, whereas variable and function names should start with a lower case letter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't get to these sooner. I've changed the names. Let me know if I missed anything.

@@ -80,6 +99,22 @@ static bool nearFull4UnitBool(ap_uint<kNBitsBuffer> rptr, ap_uint<kNBitsBuffer>
return wptr1==rptr || wptr2==rptr || wptr3==rptr || wptr4==rptr;
}

template<int kNBitsBuffer>
class emptyUnitClass {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CMS coding rules state that class names should start with a capital letter, whereas variable and function names should start with a lower case letter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -282,7 +282,9 @@ inline void step(const VMStub<VMSType> stubmem[4][1<<(kNbitsrzbinMP+kNbitsphibin
#endif

inline void set_empty() {
empty_ = emptyUnit<MatchEngineUnitBase<VMProjType>::kNBitsBuffer>()[(readindex_,writeindex_)];
static const emptyUnitClass<MatchEngineUnitBase<VMProjType>::kNBitsBuffer> EmptyUnitClass;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CMS coding rules state that class names should start with a capital letter, whereas variable and function names should start with a lower case letter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -291,7 +293,9 @@ inline void step(const VMStub<VMSType> stubmem[4][1<<(kNbitsrzbinMP+kNbitsphibin
}

inline bool nearFull() {
return nearFull3Unit<MatchEngineUnitBase<VMProjType>::kNBitsBuffer>()[(readindex_,writeindex_)];
static const nearFull3Class<MatchEngineUnitBase<VMProjType>::kNBitsBuffer> NearFull3Class;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CMS coding rules state that class names should start with a capital letter, whereas variable and function names should start with a lower case letter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@tomalin
Copy link
Contributor

tomalin commented Dec 18, 2024

This looks good to me, aside from the minor style comments above. We've not been able to test the CPU performance, as recent changes to the HLS L1Trk repo broke the Future SW again, but logically it should fix the problem.

@aehart
Copy link
Contributor

aehart commented Jan 14, 2025

This looks good to me, aside from the minor style comments above. We've not been able to test the CPU performance, as recent changes to the HLS L1Trk repo broke the Future SW again, but logically it should fix the problem.

This seems like an improvement, but is there a reason for not simply declaring the existing LUTs as static, instead of introducing these wrapper classes? For example, this PR does what I'm suggesting for one of the existing LUTs (not sure why this is only done for one of them):

static ap_uint<(1 << (2 * kNBitsBuffer))> lut;

The only thing missing would be logic to prevent reinitializing the LUT after the first call.

As it stands, multiple objects of these new wrapper classes can be declared, each of which will initialize the exact same LUT. In fact, I guess this happens, since objects of these classes are declared in MatchEngineUnit, of which there are multiple objects declared in MatchProcessor. Declaring the existing LUTs as static would prevent this.

@bryates
Copy link
Contributor Author

bryates commented Jan 14, 2025

This looks good to me, aside from the minor style comments above. We've not been able to test the CPU performance, as recent changes to the HLS L1Trk repo broke the Future SW again, but logically it should fix the problem.

This seems like an improvement, but is there a reason for not simply declaring the existing LUTs as static, instead of introducing these wrapper classes? For example, this PR does what I'm suggesting for one of the existing LUTs (not sure why this is only done for one of them):

static ap_uint<(1 << (2 * kNBitsBuffer))> lut;

The only thing missing would be logic to prevent reinitializing the LUT after the first call.
As it stands, multiple objects of these new wrapper classes can be declared, each of which will initialize the exact same LUT. In fact, I guess this happens, since objects of these classes are declared in MatchEngineUnit, of which there are multiple objects declared in MatchProcessor. Declaring the existing LUTs as static would prevent this.

@tomalin suggested the wrapper class to prevent CMSSW from recomputing the LUTs for every event. Things were set to static before and that didn't seem to be enough.

@aehart
Copy link
Contributor

aehart commented Jan 14, 2025

This looks good to me, aside from the minor style comments above. We've not been able to test the CPU performance, as recent changes to the HLS L1Trk repo broke the Future SW again, but logically it should fix the problem.

This seems like an improvement, but is there a reason for not simply declaring the existing LUTs as static, instead of introducing these wrapper classes? For example, this PR does what I'm suggesting for one of the existing LUTs (not sure why this is only done for one of them):

static ap_uint<(1 << (2 * kNBitsBuffer))> lut;

The only thing missing would be logic to prevent reinitializing the LUT after the first call.
As it stands, multiple objects of these new wrapper classes can be declared, each of which will initialize the exact same LUT. In fact, I guess this happens, since objects of these classes are declared in MatchEngineUnit, of which there are multiple objects declared in MatchProcessor. Declaring the existing LUTs as static would prevent this.

@tomalin suggested the wrapper class to prevent CMSSW from recomputing the LUTs for every event. Things were set to static before and that didn't seem to be enough.

I see that the existing LUT functions have static return types. However, my understanding is that in C++, this is meaningless, more or less (kind of like having a const return type). You have to declare the actual object within the function as static in order for this keyword to have an effect, as you do in the line I referenced above.

@bryates
Copy link
Contributor Author

bryates commented Jan 14, 2025

Good point. I had tried so many variations that didn't compile. Let me double check if that works.

@bryates bryates mentioned this pull request Jan 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants