Description
I suspect everything I suggest here isn't particularly new or groundbreaking, and might in fact be what's already intended as the direction. I know IC/Packed AST and revisiting code will naturally encourage parts of it. I'm merely writing it down as I presently don't see this type of refactor discussed.
I'll start by presenting the initial idea; if in principle it sounds like it's close, I'll start with very narrow changes to see what they actually look like, then revise this as more is learned and plans change.
Initial Idea
I've read #5, but this is a broader change to the sem layer. I'll use semexprs.semExpr's handling of nkCall, nkInfix, nkPrefix, nkPostfix, nkCommand, nkCallStrLit as a motivating example. It's used in a number of areas and for surprising things, and it's broken in subtle and difficult-to-debug ways.
One such surprising thing: the change in nim-lang/Nim#16840 causes js/t11166.nim to fail, as the experimental dot operator macro ends up recursing in a cycle trying to apply the dot operator to broken code. The dot operator's return type is made up of undefined symbols; when it goes to resolve it, semTypeExpr delegates to semExpr, which delegates directly or transitively to qualifiedLookup, semDirectOp, builtInFieldAccess, etc. Each of these establishes some ground truth, then calls a delegate, throwing away much of that analysis. There is no single source of truth, and the same issue needs to be handled all over the place, but carefully, as the existing context is unclear.
That pattern, where some proc in sem learns a lot internally and then calls another proc that either doesn't take that information into account (omission bug) or regenerates it (inconsistent/multiple sources of truth), makes informal reasoning rather difficult.
Back to the example: semTypeNode is called by semProcAux, which knows the expression should ultimately provide a return type, but that knowledge is lost, especially by the time we hit semExpr and friends. Since they're all rather general, it's now difficult to know whether any change is safe, as one needs to reason about far more cases than before, and it's unclear what will be impacted.
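To make the information loss concrete, here's a toy sketch (this is not compiler code; ToyNode is a stand-in for PNode, and the proc bodies are invented purely to illustrate the shape of the problem):

```nim
# Toy illustration of the current pattern: the caller's knowledge
# cannot travel through the callee's signature.
type ToyNode = ref object
  name: string

proc semExpr(n: ToyNode): string =
  # General-purpose entry point: nothing tells it the caller needed a
  # type here, so it must rediscover (or guess) the intent from scratch.
  "re-analyzed(" & n.name & ")"

proc semProcAux(retNode: ToyNode): string =
  # The caller *knows* retNode sits in return-type position...
  # ...but semExpr's signature has no way to carry that fact.
  semExpr(retNode)
```

Every level of the real delegation chain (semTypeExpr, semDirectOp, and so on) repeats this hand-off, which is where the redundant re-derivation comes from.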
Starting Guiding Principles
I think code should be refactored to reduce the sources of truth, starting with:

- semExpr should do as little analysis as possible, and instead stick to dispatch based on node and context information
- reduce the amount of redundant analysis and push more of the results into the AST and Context; semProcAux, for example, should flag the node in the return-type position with that expectation, and in turn that information should be used downstream
- where redundant querying is required, extract it into shared procs as today, but have caching controlled by the requester (this is already the case, though it was worth emphasizing)
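As a rough, self-contained sketch of the second principle (all names here — ExpectedKind, expected, ToyNode — are hypothetical, not actual compiler fields), semProcAux could record its expectation on the node before delegating, and semExpr could then dispatch on it rather than re-deriving intent:

```nim
# Hypothetical sketch: push the caller's expectation into the AST,
# so downstream procs dispatch instead of re-analyzing.
type
  ExpectedKind = enum expNone, expType, expValue
  ToyNode = ref object
    expected: ExpectedKind   # imagine PNode gained a field/flag like this
    name: string

proc semTypeNode(n: ToyNode): string = "type(" & n.name & ")"
proc semValueExpr(n: ToyNode): string = "value(" & n.name & ")"

proc semExpr(n: ToyNode): string =
  # Dispatch only: the expectation was established upstream.
  case n.expected
  of expType: semTypeNode(n)
  else: semValueExpr(n)

proc semProcAux(retNode: ToyNode): string =
  retNode.expected = expType   # flag the return-type position
  semExpr(retNode)

echo semProcAux(ToyNode(name: "T"))
```

The point of the sketch is the division of labor: the proc that has the ground truth writes it down once, and everything downstream reads it instead of reconstructing it.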
Impact
This will lead to a small set of top-level sem procs covering early analysis, much as today, but they'll fan out to a wider set of specialized sem procs and/or ones that read more linearly with "flatter" branching logic. The AST and context will likely grow, as more metadata and pushed-down analysis will live there.
Undoubtedly there will be some "fun" challenges. Presently, the redundant analysis allows callers of those procs to ignore certain lifetime concerns, such as the fidelity of the information available. This will likely lead to a number of annoying bugs as code shifts and the correct pre/post-conditions are re-established.
Long term, better informal reasoning and reduced interdependency between procs should not only help existing compiler devs, but also lower the barrier to entry for part-time compiler devs.