Skip to content

Separate syntactic and semantic nodes? #5159

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
ilevkivskyi opened this issue Jun 6, 2018 · 2 comments
Open

Separate syntactic and semantic nodes? #5159

ilevkivskyi opened this issue Jun 6, 2018 · 2 comments

Comments

@ilevkivskyi
Copy link
Member

There is something that I was thinking about for a year or so that became more apparent while working on type alias refactoring (and planning the module refactoring). We have two different things mixed together in mypy code: syntactic nodes and "semantic" nodes. This is most obvious in fixup.py where we use NodeVisitor interface (intended to traverse syntactic tree) to patch symbol tables (something that carries purely semantic info). As a result, there are various weird things like visit_var, visit_type_info, etc. The idea is to have clear separation between syntactic and semantic nodes and have separate visitors for them. We currently already have this partially, we have TypeInfo vs ClassDef, and Var vs AssignmentStmt.

Here is a short (and approximate) summary of the proposal:

  • There are following semantic nodes: Var, Function, Class (non-leaf, has symbol table), Module (non-leaf, ditto), TypeVar, TypeAlias, ConditionalNode (see below).
  • The above nodes inherit from SymbolNode, while syntactic nodes inherit from Node, these two both inherit from Context.
  • Only the above nodes (and types) are serialized in cache and deserialized in incremental runs.
  • Semantic nodes can have attributes that point to the relevant (defining) syntactic nodes (for example variable can have defining assignment, or function statement if property), but we should limit this to minimize cache size.
  • The above nodes will have their separate visitor, so that in total we have three: for AST, for symbol tables, and for types.
  • ConditionalNode exists to avoid having .nodes instead of .node in SymbolTableNodes (the latter are thin wrappers around semantic nodes), while still supporting certain conditional definitions like conditional imports. This node will have an attribute that is a list of other semantic nodes.

This is a very large refactoring, but IMO this will add robustness and clarity, and will simplify addition of new features (e.g. conditional imports). This is probably a low priority (long term) thing. We just discussed this with @JukkaL, he likes the idea but is concerned about the size of this refactoring. A possible way to go forward with this is to split this in several separate PRs.

@elazarg
Copy link
Contributor

elazarg commented Jun 11, 2018

I tried to do this several times without success - the changes were unmanageable for me. I really hope this will be done.

@ilevkivskyi
Copy link
Member Author

Some ideas to split this into smaller parts:

  • Kill Decorator, it isn't present in native Python AST, and currently this node appears in both syntactic and semantic trees (intersection that we want to get rid of).
  • Avoid serialization of syntactic nodes, that have matching semantic ones, currently TypeInfo.defn (which is ClassDef) is also serialized, but only partially. Removing this (we can keep context info for better errors, like TypeInfo.defn_context: Context) will not only disentangle semantic/syntactic parts, but might likely make serialization less error-prone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants