-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring sem code to reduce sources of truth #16
Comments
For those curious about the motivation I've been poking at sem for a over a month, even with the debugger informal reasoning remains a challenge. The information loss and continual reconstitution has meant that not only is it unclear where a problem should be solved but many solutions are partial. Additionally, at any time information loss occurs vast amounts of the language feature space needs to be considered, made more difficult with the various combinations of experimental flags. Anyway, I'd like to improve things not just for myself but for others presently working on the compiler and those yet to come. |
I don't agree too much with your analysis but it's worth a try. IMO information that is passed around via side channels (aka not part of PNode etc) just makes it harder and harder to specify what a "sem"ed Nim AST should be, which is important for the evolution of Nim's macro system. Where we can simplify and fix bugs is the conflation of type checking with symbol lookups. With |
Also, there is little reason why addr/deref computations needs to be done at all, the backends could do these. Or even if they are required why it's not done in sempass2 which is generally cleaner. |
Rereading what I originally wrote I didn't emphasize the AST being the preferred choice for this information, but even then I would revise it as I learn more of these things. I think much of what I'm describing is basically a crappy version of attribute grammars, I suppose I need to read some lecture notes or book on the subject and figure it out a little more. It does jive with keeping as much as possible in the AST, but it pushes a lot more information than is currently being pushed into the node. Of course, data in the node could be materialized or a computation on the data available in that node. |
I suspect everything I suggest here isn't particular new or ground breaking and might in fact be what's already intended as the direction. I know IC/Packed AST and revisiting code will naturally encourage parts of it. I'm merely writing it down as presently I don't see this type of refactor discussed.
I'll start by presenting the initial idea, if on principle it sounds like it's close, then start with very narrow changes to see what they actually look like. Then revise this as more is learned and plans change.
Initial Idea
I've read #5 but this is a broader change to the sem layer. I'll use
semexprs.semExpr
code's handlingof nkCall, nkInfix, nkPrefix, nkPostfix, nkCommand, nkCallStrLit:
as a motivating example. It's used in a number of areas and for surprising things, it's broken in subtle and difficult to debug ways.One such surprising thing is the change in nim-lang/Nim#16840 causes
js/t11166.nim
to fail as the experimental dot operator macro ends up recursing in a cycle trying to apply the dot operator to broken code. The dot operator return type is made up of undefined symbols, when it goes to resolve itsemTypeExpr
delegates tosemExpr
, which delegates to directly or transitively toqualifiedLookup
,semDirectOp
,builtInFieldAccess
, etc... each of these keep establishing some ground turth, then call a delegate throwing away much of that analysis. There is no source of truth and the same issue needs to be handled all over the place, but carefully as existing context is unclear.That pattern of some proc in sem learning a lot internally, and then calling another proc that either doesn't take into account (ommission bug) or regenerates that information (inconsistent/multiple source of truth) makes informal reasoning rather difficult.
Back to the example, semTypeNode is called by the
semProcAux
which knows that expression should provide a return type ultimately, but that gets lost especially by the time we hitsemExpr
and friends. Since they're all rather general it's now difficult to know if any changes are safe as one needs to reason about far more cases than before and it's unclear what will be impacted.Starting Guiding Principles
I think code should be refactored to reduce the sources of truth, starting with:
semExpr
should do as little analysis as possible, instead stick to dispatch based on node and context informationsemProcAux
for example should flag the node in the return type position with that expectation and in turn that information should be used downstreamImpact
This will lead to a small set of top level semProcs covering early analysis similar to as today, but they'll fan out to a wider set of specialized sem procs and/or ones that will read more linearly with "flatter" branching logic. The AST and context will likely grow as more metatdata and pushdown analysis will live there.
Undoubtedly there will be some "fun" challenges that come up, presently the redundant analysis that's done allows callers of those procs to ignore certain lifetime concerns, such as a the fidelity of information available. This will likely lead to a number of annoying bugs as code shifts and the correct pre/post-conditions are re-established.
Long term better informal reasoning and reduced interdependency of procs should not only help existing compiler devs, but also reduce the barrier to entry for part-time compiler devs.
The text was updated successfully, but these errors were encountered: