-
Notifications
You must be signed in to change notification settings - Fork 12.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC][LV] VPlan-based cost model #67647
Conversation
Directly compute the cost of a VPlan after construction and track it together with a plan. This allows moving selecting the best VF to the planner. This seems to be a good fit anyways, and removes code from the cost-model that is not directly related to assigning costs to a specific plan/VF. Later this can be swapped out with computing the cost for a plan directly. This may help to simplify D142015. Differential Revision: https://reviews.llvm.org/D143938
This patch follows D89322 to add an initial skeleton of vplan-based cost model. This difference is that instead of incorporating a cost() interface to VPRecipes, all cost implementations are put together in VPlanCostModel. This allows VPlanCostModel to concentrate on assigning costs to vplan, thus seprating the cost model code from the vplan IR, similar to LLVM IR cost modeling. During the transition, it will still use the legacy model to obtain cost until all cost calculation for recipes are implemented. Please let me know if you agree with the main idea of this patch. If there is a general consensus, I'll proceed to implement the cost for the other recipes for review. Differential Revision: https://reviews.llvm.org/D158716 - Address comments - Move VPCM object outside of the loop - Add getElementType() and getReturnElementType()
1ba8548
to
edc764b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General idea seems sound, but this needs tests to convince me that LV behavior doesn't change with VPlan-based costing.
@@ -0,0 +1,71 @@ | |||
//===- SiFive_VPlanCostModel.cpp - Vectorizer Cost Model ------------------===// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/SiFive_//
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still wrong headers
Indeed. Similar comment on the original Phab thread. Needs existing tests with flag to not change behaviour based on new cost model. |
Thanks for sharing this as PR, it is great to see people interested in pushing in that direction! Some high-level thoughts:
I tried to sketch an alternative structure in #67934 which addresses some of those points. It consists of a type-inference for VPValues as build block. It only implements costs for Before filling out all the cost implementations, I think we should converge on the overall structure. This PR already implements costs for several recipes, which is great as hopefully we will be able to fill in the gaps soon! |
I agree that using the legacy cost model during the transition period is a prudent approach to ensure stability, About computing cost in the recipe itself, I see your point. Do you want to get your PR reviewed and merged first? Here's hoping we can get those implementations filled in soon! |
I think it would be good to make sure we get the right foundation in first. I created a separate PR for the type analysis only to start with: #69013
I think it makes sense to consider those additional cases once we get to it. |
I agree, let's focus on the migration for now. |
@fhahn class VPlanCostModel {
virtual int64_t getCost(VPlanPtr);
virtual int64_t getCost(VPInstruction *VPInst);
...
}; downstream compiler, if it wants to have its own implementation may override some of these functions: class VPlanCostModelDownstream : public VPlanCostModel {
int64_t getCost(VPInstruction *VPInst) override;
...
}; with minor change when VPCM is constructed. btw, do you know why |
@fhahn ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General idea seems sound, but this needs tests to convince me that LV behavior doesn't change with VPlan-based costing.
Indeed. Similar comment on the original Phab thread. Needs existing tests with flag to not change behaviour based on new cost model.
The cost model has moved inside the VPlan infrastructure and is no longer called from the same place in the same way. Please provide evidence that, even without the vplan-based cost model flag, the behaviour of the vectorizer hasn't changed.
Performance numbers on some benchmarks, traces on the test-suite showing what gets vectorized and what doesn't, etc.
If this is just about the migration, then it should behave in the same way as before without the optional flag.
@@ -0,0 +1,71 @@ | |||
//===- SiFive_VPlanCostModel.cpp - Vectorizer Cost Model ------------------===// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still wrong headers
@@ -370,6 +388,25 @@ class LoopVectorizationPlanner { | |||
void adjustRecipesForReductions(VPBasicBlock *LatchVPBB, VPlanPtr &Plan, | |||
VPRecipeBuilder &RecipeBuilder, | |||
ElementCount MinVF); | |||
|
|||
/// Returns true when Factor A is more profitable than Factor B. | |||
bool isMoreProfitable(const VectorizationFactor &A, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have guessed this is part of the cost model, not the planner.
assert(ExpectedCost.isValid() && "Unexpected invalid cost for scalar loop"); | ||
assert(VFCandidates.count(ElementCount::getFixed(1)) && | ||
"Expected Scalar VF to be a candidate"); | ||
bool LoopVectorizationPlanner::isMoreProfitable( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since working with VPlan is cheaper and we'll be doing a lot more cost model checks and we'll be comparing the same VPlan against other VPlans, I suggest we cache the cost of VPlans and only compute the ones that haven't been done yet.
Otherwise we'll be re-running the cost checks across the same VPlans over and over.
At the moment, computing costs is framed in terms of the IR instructions we generate (i.e. what In general, I think
I think the original motivation was to have codegen for each recipe in the recipes directly, as it is strongly tied to the semantics of the recipe and ensure that it is easy to see what the implementation of the recipe looks like in terms of codegen.
Finally had time to get #67934 updated to a version that has the VPlan-based and legacy cost-models agree on the VF for large code-bases (llvm-test-suite, SPEC2017, Clang bootstrap) and multiple configurations (AArch64 with/wo SVE, with/wo |
Is there a list of benchmarks we should care about for the community ? Spec2k6, tsvc ? |
After my recent update to #67934, I think we should be able to gradually migrate to the VPlan-based cost-model on a per-recipe basis, starting with As I mentioned earlier, with #67934 , the chosen VFs agree with the legacy cost model for a large test set (llvm-test-suite (includes tsvc and many others) , SPEC2017, clang bootstrap, some large internal projects on multiple platforms and configs, more details in my previous comment #67647 (comment)). There's no pre-defined set of test suites I think, but the ones I mentioned should cover a lot of ground. |
Not sure I follow TTI reason.
Similar goes to
Yeah, I can understand why "downstream has to deal with it" is beneficial for upstream. My biggest worry that with a variety of different hws, that might become a big concern for everyone in a long run.
Agree, that sounds like a reasonable view. At the same time recipe itself has a well-defined semantics and there could be few ways to generate code for it. |
Indeed, this isn't about performance versus each other, but making sure the cost model choices aren't different by looking at the debug traces while running over those programs. They are, so we're good. |
I'm inclined to proceed with your approach outlined in #67934. However, I believe it's important to address the ongoing discussion in this thread regarding the potential separation of the cost model from VPlan. @nikolaypanchenko's observation on TTI being caller-agnostic highlights the need for callers to estimate additional context-related information. |
With regards to TTI, recipes are defined in terms of IR instructions they generate, so defining the cost of the recipe next to the place where code is generated seems naturally (and I think this also matches the original intended design). We need some state, but that should be only limited to TTI and some analysis like type inference eventually
I also mention this in the response below, but I think the first step doesn't mean that everything needs to fit the initial structure in the long run. Once there's a concrete use case that benefits from a different structure we can adjust, while making sure that there's a clear improvement to the current state that's testable.
The initial VPlan-based cost model should IMO be solely focused on evaluating the cost per recipe, as this directly maps to the generated IR for which we can query cost. One of the design goals is to model concepts explicitly in VPlan, to avoid having the look across recipes, regions or blocks to compute an accurate cost for any given recipe (e.g. modeling interleave groups explicitly). At the moment, the only concrete example I know of where that is not possible at the moment is arithmetic vector instructions that also can do a sign/zero-extend of some of the operands. Currently TTI tries to solve this via looking at users of a given instructions to determine if a cast is free or not. This won’t work with VPlan, as TTI cannot traverse VPlan’s def-use chains (nor should it IMO). To address this, this could be modeled by a transform that replaces zext/add recipes with a widening add recipe for targets that have that capability. There will be other analysis that are not interested in the cost of a single recipe, but other metrics, e.g. resource usage or register pressure when deciding on the interleave count. But those seem to be completely separate from assigning a cost per instruction and could be done completely separately. Switching to a VPlan-based cost-model is likely to cause some disruption if we aren’t careful, so I think to start with we should focus on moving forward with targeted steps, with everything we bring up being used and tested by default. If in the future there are concrete cases where we could benefit from a different structure in terms of where cost is computed and how, we should re-evaluate once we are at this point. |
This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's #67647 and #67934 which is an earlier version of the current PR. PR: #92555
Isn't it more about code placement ?
I still not sure I understand what's a big no-no of a dedicated class to evaluate the cost ? What's a big yes-yes for your proposed approach ?
There're multiple cases where "look across recipes" are needed. The most important one, which already exists in some form, is estimation of register pressure.
I'd say it's fine, but since your plan is to introduce
I do think something like that needs to have a broader discussion, not only between couple of people. Tuesday meeting seems to be the right one. |
Sure, in the end the whole discussion boils down to code placement.
I don't think there's a big yes-yes or no-no for either. I think the most important bit is getting started on transitioning the cost computations without regressions; using overloading is probably slightly more compact than having a separate cost model class with some kind of type switch initially, but I don't feel strongly about it. I mostly have been trying to better understand the benefit/desire to have it separate.
Is this the case in the current cost-model for instruction costs? We try to estimate the register pressure when picking the interleave factor, but that's completely separate from computing instruction costs at least at the moment?
I'm not sure why that would be the case? It only contains global info needed for TTI queries + state that's used temporary during the transition from the legacy CM which unfortunately seems needed during the initial bringup. |
Understood. Having a proper separation of concerns, I mentioned earlier is a big reason to me (otherwise it's not clear what should be a part of VPlan/Recipe and what should not be: should transforms and verification be a part of VPlan ?)
That's exact the reason internally we added vplan-based cost model, since current regpressure heuristic is only used to trim down VF/IF candidates, but is not used later during loop estimation. Without it loop with multiple live registers and zexts was assumed profitable.
If we do consider basic estimation only, entire debate from my side is "tomato-tomato": in both cases object needs to be constructed:
vs
but when it's become essential to know context around instruction, I do see that proper separation is essential. |
This reverts commit 46080ab. Extra tests have been added in 52d29eb. Original message: This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's #67647 and #67934 which is an earlier version of the current PR. PR: #92555
Would you be able to share an IR example for the ZExt issue? It would be interesting to understand if it is similar to the AArch64 issue and in particular with Ext/other opcode combinations are involved. (I recommitted 90fd99c to get a additional signal into cases where it diverges so I can fix those in parallel to wrapping up the discussion re how to exactly structure the code) |
Sure: https://gist.github.com/nikolaypanchenko/41279c3330d6744cec22eda571fe9a4d |
This reverts commit 6f538f6. Extra tests for crashes discovered when building Chromium have been added in fb86cb7, 3be7312. Original message: This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's #67647 and #67934 which is an earlier version of the current PR. PR: #92555
This reverts commit 6f538f6. Extra tests for crashes discovered when building Chromium have been added in fb86cb7, 3be7312. Original message: This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's llvm#67647 and llvm#67934 which is an earlier version of the current PR. PR: llvm#92555
This reverts commit 6f538f6. A number of crashes have been fixed by separate fixes, including ttps://github.com//pull/96622. This version of the PR also pre-computes the costs for branches (except the latch) instead of computing their costs as part of costing of replicate regions, as there may not be a direct correspondence between original branches and number of replicate regions. Original message: This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's #67647 and #67934 which is an earlier version of the current PR. PR: #92555
This reverts commit 6f538f6. A number of crashes have been fixed by separate fixes, including ttps://github.com/llvm/pull/96622. This version of the PR also pre-computes the costs for branches (except the latch) instead of computing their costs as part of costing of replicate regions, as there may not be a direct correspondence between original branches and number of replicate regions. Original message: This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's llvm#67647 and llvm#67934 which is an earlier version of the current PR. PR: llvm#92555
This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's llvm#67647 and llvm#67934 which is an earlier version of the current PR. PR: llvm#92555
As #92555 is merged, this can be closed. |
This patch follows D89322 to add an initial skeleton of vplan-based cost model.
This difference is that instead of incorporating a cost() interface to VPRecipes,
all cost implementations are put together in VPlanCostModel.
This allows VPlanCostModel to concentrate on assigning costs to vplan,
thus seprating the cost model code from the vplan IR, similar to LLVM IR cost
modeling.
Please let me know if you agree with the main idea of this patch.
If there is a general consensus, I'll proceed to implement the cost for the
other recipes for review.
Differential Revision: https://reviews.llvm.org/D158716