-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: split up some parts of flowgraph.cpp #47072
JIT: split up some parts of flowgraph.cpp #47072
Conversation
cc @dotnet/jit-contrib ... I could take this further. |
Great to get 25% code size reduction! |
@AndyAyersMS do you have any insights if any of these files can be transformed into separate classes easily?
I would appreciate this. contributes to #13359 |
I suspect the ones that are phase-like (EH opts) should not be too hard to transform, but if they still densely interact with the compiler instance then I'm not sure much is gained. We should think more about what we'd hope to achieve by creating these kinds of classes, beyond having them in a separate namespace of sorts. |
Here's one attempt at logically grouping the remainder of the methods. I suspect we may want to combine some of these. I don't know how many LOC are in each group yet; will try to add that subsequently. analysisfgDominate optimizationfgRemoveUnreachableBlocks fgOptimizeBranchToEmptyUnconditional fgUpdateFlowGraph statementsfgInsertStmtAtBeg fgNewStmtFromTree fgRemoveStmt maintenancefgEnsureFirstBBisScratch fgChangeSwitchBlock fgSplitBlockAtEnd fgUnlinkBlock fgConnectFallThrough fgRenumberBlocks fgEhAllowsMoveBlock fgRelocateEHRange fgNewBBbefore fgFindInsertPoint fgNewBBinRegion pred listsfgGetPredForBlock fgComputeCheapPreds gc pollsblockNeedsGCPoll constructionfgInit importingfgImport inliningclass FgStack hot/cold, funcletsfgInDifferentRegions fgInsertFuncletPrologBlock fgDetermineFirstColdBlock late exception flowacdHelper miscfgCanSwitchToOptimized fgBlockContainsStatementBounded fgNSuccsOfFinallyRet fgIsForwardBranch fgMightHaveLoop fgIsBetterFallThrough fgUseThrowHelperBlocks ehfgClearFinallyTargetBit fgExtendEHRegionBefore fgCheckEHCanInsertAfterBlock edge weightssetEdgeWeightMinChecked fgPrintEdgeWeights (should go in fgdump) sync methods, pinvokefgGetCritSectOfStaticMethod phasesfgExpandRarelyRunBlocks return mergingfgMoreThanOneReturnBlock loopsfgLoopCallTest things that should be gentreefgIsThrow fgChkLocAllocCB fgCheckCallArgUpdate things that should be in lclvarsfgLclFldAssign things that should be importerfgGetStaticsCCtorHelper fgOptimizeDelegateConstructor (?) |
For easier reviewing, could you confirm if you have just moved the code into new files or were there any edits in them? |
I like the idea of breaking up flowgraph.cpp further if it can be logically done (the initial effort years ago to remove fgMorph* from flowgraph was a simple start). However, I think we need to do it "all at once" (or nearly so), and soon (very early in the product cycle). It breaks function history, which is pretty annoying, but probably worth it if the result it easier to work with. It should be very simple to find something you're looking for, so breaking things up "too much" can also be counter-productive. |
With these classes, when they are not added as compiler friends, it is easier to control their individual responsibilities and avoid situations that happened with morph that is called from all different places without any control. |
I like a lot of the classification you list above. It makes sense to me to break into separate files:
|
btw, I like the work done so far in this PR. The only question is whether to do more function extraction from flowgraph.cpp, or leave it as currently proposed. |
The notes over in #13359 suggest we should try to keep source files under 400K of text, so perhaps further splitting makes sense (flowgraph.cpp went from ~980K -> 750K with the above). Maybe 2-4 more spinoff files would be needed to accomplish this? So some sort of grouping of the above. I'll create a tool to estimate and split up the code and come back with an updated proposal. |
Ok, tool is more or less working. Detailed analysis here: https://gist.github.com/AndyAyersMS/5aee6165808e123f1c97f19f30858837 Summary
|
Here's one take on how to split the above. It would create 7 new files, none bigger than 5K lines or so. I would not move the gentree-related stuff since gentree is itself already 19K lines and likely should also be split up. We can iterate on this; I'll use the tool to do the actual splitting. Will post an updated PR with tool generated splits shortly. Feel free to suggest alternate groupings, better names, etc.
|
Hmm, looks like there is a bit more coupling around the |
f01089f
to
da9c1e8
Compare
File sizes with second iteration:
|
Also note all the methods are in the same order in their respective files as they were before; I could (with some extra work) group methods by category, but would have to update how my rewriting tool tracks ifdef state. Gist updated with new classification, and a copy of the splitting program. |
Should it be "if (text[j].StartsWith("#if"))" to track both "if defined" and "ifdef"? In general, I like the current splitting. |
Could be, yes ... at this point the automated split is close enough that I'm doing manual repairs. |
Linker test failures are quite likely unrelated. Think this is more or less ready for review. cc @dotnet/jit-contrib |
Misc. comments:
It's a little hard to review something like this. I really like the split. We'll probably realize more improvements that could be made after living with it. |
For file names, how about: - fgBasic.cpp
- fgCheckDump.cpp
- fgEhOpt.cpp
- fgInfo.cpp
- fgInline.cpp
- fgOpt.cpp
- fgProfile.cpp
+ fgbasic.cpp
+ fgdiagnostic.cpp
+ fgehopt.cpp
+ fginfo.cpp
+ fginline.cpp
+ fgopt.cpp
+ fgprofile.cpp
I think some sort of split here is a good idea; otherwise the merged file is ~300K bytes. If there are specific "info" methods you don't think belong, feel free to point them out. |
@dotnet/jit-contrib ping |
What is your "mental model" for what goes in basic vs. info? By the name, I would assume basic includes functions that construct and modify the flow graph, including preds. But "info" contains a bunch of fgInsert* and fgNew* functions, creation of cheap preds and normal preds. Some things that seem like "info" would be fgGetNestingLevel, fgIsThrow, fgIsIndirOfAddrOfLocal, fgAddrCouldBeNull (seems like this should be on GenTree), fgIsBetterFallThrough. There aren't many "pure" functions, which is what I was thinking "info" would contain. |
Currently info contains these categories:
I see your point; arguably only Perhaps there's some other more fitting name for this group? |
I would put Predecessors/Successors/Queries in basic, and leave Statements/Gentree in flowgraph.cpp. Or, put Predecessors/Successors in a new "fgflow.cpp", Queries in basic.cpp, and Statements/Gentree in flowgraph.cpp. (Maybe that's just punting the flowgraph.cpp because it's hard to think of a sufficiently large subset with tight cohesion?). I suppose there could be fgstmt.cpp -- if ~500 lines is "big enough" to warrant its own file. I think there's a risk of "too much" splitting unless there is strong cohesion to the split pieces. Comments? |
I don't mind splitting more, which is why I orginally had the finer-grained categories. So how about:
|
That looks good to me. |
That would give us
|
Had to move helper method |
Need to fix an ifdef... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ship it!
@dotnet/jit-contrib here's one last chance to weigh in on this ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Move code into files organized by purpose:
This reduces the size of flowgraph.cpp by about 25%, but it is still over 20K lines.
More splitting is potentially warranted, eg splitting out analyses, optimizations,
inlining support, etc.