-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decouple parser from PNode with DOD approach #423
Labels
refactor
Implementation refactor
Comments
@haxscramper took a bit but to a first crack at it. Wrote it quickly on my phone during lunch so hopefully it's not too bad. |
I think "Decouple the parser from PNode", "Improve parser data types" and follow-ups can be implemented as three separate PRs - the first one is rather easy, so I can start working on it right away, and then we can look closer at the DOD rewrite. |
haxscramper
added a commit
to haxscramper/nimskull
that referenced
this issue
Sep 2, 2022
The main change - introduce a `ParsedNode` type which replaces `PNode` in the parser. This change allows for further work on decoupling `sem` from other parts of the compiler, making it easier to implement improvements in a way that would not rip through the whole codebase and test suite. Right now introduced type closely mimics the `PNode` counterpart, but this is just a temporary measure for the transition period. This commit is a part of multi-step series - full list can be seen in the related issue nim-works#423 - Add missing documentation for changes in the earlier commit, add more how-tos to the debugging section (I haven't coded in a while, so was especially important to write down explanations for anything I had trouble with) nim-works@602367b - Cleanup the `passes.nim` implementation a bit - despite common (at least seemingly shared by many of the previous author of the codebase) misconception longer variable names actually *do* increase readability. Also infamous recommendations for the "structured programming" also do not really mesh with proliferation of `break` statements in the code. -
haxscramper
added a commit
to haxscramper/nimskull
that referenced
this issue
Sep 2, 2022
The main change - introduce a `ParsedNode` type which replaces `PNode` in the parser. This change allows for further work on decoupling `sem` from other parts of the compiler, making it easier to implement improvements in a way that would not rip through the whole codebase and test suite. Right now introduced type closely mimics the `PNode` counterpart, but this is just a temporary measure for the transition period. This commit is a part of multi-step series - full list can be seen in the related issue nim-works#423 - Add missing documentation for changes in the earlier commit, add more how-tos to the debugging section (I haven't coded in a while, so was especially important to write down explanations for anything I had trouble with) nim-works@602367b - Cleanup the `passes.nim` implementation a bit - despite common (at least seemingly shared by many of the previous authors of the codebase) misconception longer variable names actually *do* increase readability. Also infamous recommendations for the "structured programming" also do not really mesh with proliferation of `break` statements in the code. -
haxscramper
added a commit
to haxscramper/nimskull
that referenced
this issue
Sep 3, 2022
The main change - introduce a `ParsedNode` type which replaces `PNode` in the parser. This change allows for further work on decoupling `sem` from other parts of the compiler, making it easier to implement improvements in a way that would not rip through the whole codebase and test suite. Right now introduced type closely mimics the `PNode` counterpart, but this is just a temporary measure for the transition period. This commit is a part of multi-step series - full list can be seen in the related issue nim-works#423 - Add missing documentation for changes in the earlier commit, add more how-tos to the debugging section (I haven't coded in a while, so was especially important to write down explanations for anything I had trouble with) nim-works@602367b - Cleanup the `passes.nim` implementation a bit - despite common (at least seemingly shared by many of the previous authors of the codebase) misconception longer variable names actually *do* increase readability. Also infamous recommendations for the "structured programming" also do not really mesh with proliferation of `break` statements in the code. Add todo/bug comment for the main processing loop bug related to the phase ordering in `compiler/sem/passes.nim:234`
haxscramper
added a commit
to haxscramper/nimskull
that referenced
this issue
Sep 3, 2022
The main change - introduce a `ParsedNode` type which replaces `PNode` in the parser. This change allows for further work on decoupling `sem` from other parts of the compiler, making it easier to implement improvements in a way that would not rip through the whole codebase and test suite. Right now introduced type closely mimics the `PNode` counterpart, but this is just a temporary measure for the transition period. This commit is a part of multi-step series - full list can be seen in the related issue nim-works#423 - Add missing documentation for changes in the earlier commit, add more how-tos to the debugging section (I haven't coded in a while, so was especially important to write down explanations for anything I had trouble with) nim-works@602367b - Cleanup the `passes.nim` implementation a bit - despite common (at least seemingly shared by many of the previous authors of the codebase) misconception longer variable names actually *do* increase readability. Also infamous recommendations for the "structured programming" also do not really mesh with proliferation of `break` statements in the code. Add todo/bug comment for the main processing loop bug related to the phase ordering in `compiler/sem/passes.nim:234` - Implement `astrepr.nim` support for the `ParsedNode` - `debug` and `treeRepr` procedures. Further work - compiler/ast/parser.nim:744 :: introduce two tokens in order to handle custom literals. There is no real need to mash together everything in a single chunk of text that would have to be split apart down the line.
haxscramper
added a commit
to haxscramper/nimskull
that referenced
this issue
Sep 3, 2022
The main change - introduce a `ParsedNode` type which replaces `PNode` in the parser. This change allows for further work on decoupling `sem` from other parts of the compiler, making it easier to implement improvements in a way that would not rip through the whole codebase and test suite. Right now introduced type closely mimics the `PNode` counterpart, but this is just a temporary measure for the transition period. This commit is a part of multi-step series - full list can be seen in the related issue nim-works#423 - Add missing documentation for changes in the earlier commit, add more how-tos to the debugging section (I haven't coded in a while, so was especially important to write down explanations for anything I had trouble with) nim-works@602367b - Cleanup the `passes.nim` implementation a bit - despite common (at least seemingly shared by many of the previous authors of the codebase) misconception longer variable names actually *do* increase readability. Also infamous recommendations for the "structured programming" also do not really mesh with proliferation of `break` statements in the code. Add todo/bug comment for the main processing loop bug related to the phase ordering in `compiler/sem/passes.nim:234` - Implement `astrepr.nim` support for the `ParsedNode` - `debug` and `treeRepr` procedures. Further work - compiler/ast/parser.nim:744 :: introduce two tokens in order to handle custom literals. There is no real need to mash together everything in a single chunk of text that would have to be split apart down the line.
haxscramper
added a commit
to haxscramper/nimskull
that referenced
this issue
Sep 3, 2022
The main change - introduce a `ParsedNode` type which replaces `PNode` in the parser. This change allows for further work on decoupling `sem` from other parts of the compiler, making it easier to implement improvements in a way that would not rip through the whole codebase and test suite. Right now introduced type closely mimics the `PNode` counterpart, but this is just a temporary measure for the transition period. This commit is a part of multi-step series - full list can be seen in the related issue nim-works#423 - Add missing documentation for changes in the earlier commit, add more how-tos to the debugging section (I haven't coded in a while, so was especially important to write down explanations for anything I had trouble with) nim-works@602367b - Cleanup the `passes.nim` implementation a bit - despite common (at least seemingly shared by many of the previous authors of the codebase) misconception longer variable names actually *do* increase readability. Also infamous recommendations for the "structured programming" also do not really mesh with proliferation of `break` statements in the code. Add todo/bug comment for the main processing loop bug related to the phase ordering in `compiler/sem/passes.nim:234` - Implement `astrepr.nim` support for the `ParsedNode` - `debug` and `treeRepr` procedures. - Allow skipping repeated symbol in the `(open|closed)SymChoice` node kinds. Further work - compiler/ast/parser.nim:744 :: introduce two tokens in order to handle custom literals. There is no real need to mash together everything in a single chunk of text that would have to be split apart down the line.
haxscramper
added a commit
to haxscramper/nimskull
that referenced
this issue
Sep 3, 2022
The main change - introduce a `ParsedNode` type which replaces `PNode` in the parser. This change allows for further work on decoupling `sem` from other parts of the compiler, making it easier to implement improvements in a way that would not rip through the whole codebase and test suite. Right now introduced type closely mimics the `PNode` counterpart, but this is just a temporary measure for the transition period. This commit is a part of multi-step series - full list can be seen in the related issue nim-works#423 - Add missing documentation for changes in the earlier commit, add more how-tos to the debugging section (I haven't coded in a while, so was especially important to write down explanations for anything I had trouble with) nim-works@602367b - Cleanup the `passes.nim` implementation a bit - despite common (at least seemingly shared by many of the previous authors of the codebase) misconception longer variable names actually *do* increase readability. Also infamous recommendations for the "structured programming" also do not really mesh with proliferation of `break` statements in the code. Add todo/bug comment for the main processing loop bug related to the phase ordering in `compiler/sem/passes.nim:234` - Implement `astrepr.nim` support for the `ParsedNode` - `debug` and `treeRepr` procedures. - Allow skipping repeated symbol in the `(open|closed)SymChoice` node kinds. - compiler/front/options.nim:693 :: Unconditionally output debugging traces if they are requested, regardless of the surrounding hooks and filters. Introduce the `bypassWriteHookForTrace` flag in the debugging hack controller which makes it possible to bypass the `writeln` hook. Further work - compiler/ast/parser.nim:744 :: introduce two tokens in order to handle custom literals. There is no real need to mash together everything in a single chunk of text that would have to be split apart down the line.
haxscramper
added a commit
to haxscramper/nimskull
that referenced
this issue
Sep 3, 2022
The main change - introduce a `ParsedNode` type which replaces `PNode` in the parser. This change allows for further work on decoupling `sem` from other parts of the compiler, making it easier to implement improvements in a way that would not rip through the whole codebase and test suite. Right now introduced type closely mimics the `PNode` counterpart, but this is just a temporary measure for the transition period. This commit is a part of multi-step series - full list can be seen in the related issue nim-works#423 - Add missing documentation for changes in the earlier commit, add more how-tos to the debugging section (I haven't coded in a while, so was especially important to write down explanations for anything I had trouble with) nim-works@602367b - Cleanup the `passes.nim` implementation a bit - despite common (at least seemingly shared by many of the previous authors of the codebase) misconception longer variable names actually *do* increase readability. Also infamous recommendations for the "structured programming" also do not really mesh with proliferation of `break` statements in the code. Add todo/bug comment for the main processing loop bug related to the phase ordering in `compiler/sem/passes.nim:234` - Implement `astrepr.nim` support for the `ParsedNode` - `debug` and `treeRepr` procedures. - Allow skipping repeated symbol in the `(open|closed)SymChoice` node kinds in the `astrepr` - Restructure imports of the `astepr` and move it closer to the 'primitive' modules - type definitions and trivial data queries. The most important change is removal of the `ast.nim` and `renderer.nim` imports, which opens these modules for debugging as well. - Consider possibility of a nil `owner` in the symbol owner chain representation calculations in `astrepr` - compiler/front/options.nim:693 :: Unconditionally output debugging traces if they are requested, regardless of the surrounding hooks and filters. Introduce the `bypassWriteHookForTrace` flag in the debugging hack controller which makes it possible to bypass the `writeln` hook. Further work - compiler/ast/parser.nim:744 :: introduce two tokens in order to handle custom literals. There is no real need to mash together everything in a single chunk of text that would have to be split apart down the line.
haxscramper
added a commit
to haxscramper/nimskull
that referenced
this issue
Sep 3, 2022
The main change - introduce a `ParsedNode` type which replaces `PNode` in the parser. This change allows for further work on decoupling `sem` from other parts of the compiler, making it easier to implement improvements in a way that would not rip through the whole codebase and test suite. Right now introduced type closely mimics the `PNode` counterpart, but this is just a temporary measure for the transition period. This commit is a part of multi-step series - full list can be seen in the related issue nim-works#423 - Add missing documentation for changes in the earlier commit, add more how-tos to the debugging section (I haven't coded in a while, so was especially important to write down explanations for anything I had trouble with) nim-works@602367b - Cleanup the `passes.nim` implementation a bit - despite common (at least seemingly shared by many of the previous authors of the codebase) misconception longer variable names actually *do* increase readability. Also infamous recommendations for the "structured programming" also do not really mesh with proliferation of `break` statements in the code. Add todo/bug comment for the main processing loop bug related to the phase ordering in `compiler/sem/passes.nim:234` - Implement `astrepr.nim` support for the `ParsedNode` - `debug` and `treeRepr` procedures. - Allow skipping repeated symbol in the `(open|closed)SymChoice` node kinds in the `astrepr` - Restructure imports of the `astepr` and move it closer to the 'primitive' modules - type definitions and trivial data queries. The most important change is removal of the `ast.nim` and `renderer.nim` imports, which opens these modules for debugging as well. - Consider possibility of a nil `owner` in the symbol owner chain representation calculations in `astrepr` - Semantic tracer debug output file rotation now uses location of the first `.define(` call as a file name base instead of integer-based ones. Added basic logging information about created files - now a developer can see what is going on and what gets written. For example, running with `--define=nimCompilerDebugTraceDir=/tmp` and seveal `define(...)` sections produces the following output: ``` comparisons.nim(269, 8): opening /tmp/comparisons_nim_0 trace comparisons.nim(274, 7): closing trace, wrote 44 records comparisons.nim(276, 8): opening /tmp/comparisons_nim_1 trace comparisons.nim(285, 7): closing trace, wrote 329 records ``` - compiler/front/options.nim:693 :: Unconditionally output debugging traces if they are requested, regardless of the surrounding hooks and filters. Introduce the `bypassWriteHookForTrace` flag in the debugging hack controller which makes it possible to bypass the `writeln` hook. Further work - compiler/ast/parser.nim:744 :: introduce two tokens in order to handle custom literals. There is no real need to mash together everything in a single chunk of text that would have to be split apart down the line.
haxscramper
added a commit
to haxscramper/nimskull
that referenced
this issue
Sep 3, 2022
The main change - introduce a `ParsedNode` type which replaces `PNode` in the parser. This change allows for further work on decoupling `sem` from other parts of the compiler, making it easier to implement improvements in a way that would not rip through the whole codebase and test suite. Right now introduced type closely mimics the `PNode` counterpart, but this is just a temporary measure for the transition period. This commit is a part of multi-step series - full list can be seen in the related issue nim-works#423 * Documentation changes - Add missing documentation for changes in the earlier commit, add more how-tos to the debugging section (I haven't coded in a while, so was especially important to write down explanations for anything I had trouble with) nim-works@602367b * Tangentially related refactoring work - Cleanup the `passes.nim` implementation a bit - despite common (at least seemingly shared by many of the previous authors of the codebase) misconception longer variable names actually *do* increase readability. Also infamous recommendations for the "structured programming" also do not really mesh with proliferation of `break` statements in the code. Add todo/bug comment for the main processing loop bug related to the phase ordering in `compiler/sem/passes.nim:234` * Debugging tools improvements - Implement `astrepr.nim` support for the `ParsedNode` and `PIdent` - `debug` and `treeRepr` procedures. - Allow skipping repeated symbol in the `(open|closed)SymChoice` node kinds in the `astrepr` - Restructure imports of the `astepr` and move it closer to the 'primitive' modules - type definitions and trivial data queries. The most important change is removal of the `ast.nim` and `renderer.nim` imports, which opens these modules for debugging as well. - Consider possibility of a nil `owner` in the symbol owner chain representation calculations in `astrepr` - Semantic tracer debug output file rotation now uses location of the first `.define(` call as a file name base instead of integer-based ones. Added basic logging information about created files - now a developer can see what is going on and what gets written. For example, running with `--define=nimCompilerDebugTraceDir=/tmp` and seveal `define(...)` sections produces the following output: ``` comparisons.nim(269, 8): opening /tmp/comparisons_nim_0 trace comparisons.nim(274, 7): closing trace, wrote 44 records comparisons.nim(276, 8): opening /tmp/comparisons_nim_1 trace comparisons.nim(285, 7): closing trace, wrote 329 records ``` - Simplify implementation of the `reportInst` handling in the debug utils tracer - now each toplevel tracer template must submit the location by itself - this solution avoids unintuitive and fragile `instLoc(-5)` call which might break with more templates introduced. Also updated documentation on the `reportInst` and `reportFrom` in the reports file. - compiler/front/options.nim:693 :: Unconditionally output debugging traces if they are requested, regardless of the surrounding hooks and filters. Introduce the `bypassWriteHookForTrace` flag in the debugging hack controller which makes it possible to bypass the `writeln` hook. * Further work - compiler/ast/parser.nim:744 :: introduce two tokens in order to handle custom literals. There is no real need to mash together everything in a single chunk of text that would have to be split apart down the line.
haxscramper
added a commit
to haxscramper/nimskull
that referenced
this issue
Sep 3, 2022
The main change - introduce a `ParsedNode` type which replaces `PNode` in the parser. This change allows for further work on decoupling `sem` from other parts of the compiler, making it easier to implement improvements in a way that would not rip through the whole codebase and test suite. Right now introduced type closely mimics the `PNode` counterpart, but this is just a temporary measure for the transition period. This commit is a part of multi-step series - full list can be seen in the related issue nim-works#423 * Documentation changes - Add missing documentation for changes in the earlier commit, add more how-tos to the debugging section (I haven't coded in a while, so was especially important to write down explanations for anything I had trouble with) nim-works@602367b * Tangentially related refactoring work - Cleanup the `passes.nim` implementation a bit - despite common (at least seemingly shared by many of the previous authors of the codebase) misconception longer variable names actually *do* increase readability. Also infamous recommendations for the "structured programming" also do not really mesh with proliferation of `break` statements in the code. Add todo/bug comment for the main processing loop bug related to the phase ordering in `compiler/sem/passes.nim:234` * Debugging tools improvements - Implement `astrepr.nim` support for the `ParsedNode` and `PIdent` - `debug` and `treeRepr` procedures. - Allow skipping repeated symbol in the `(open|closed)SymChoice` node kinds in the `astrepr` - Restructure imports of the `astepr` and move it closer to the 'primitive' modules - type definitions and trivial data queries. The most important change is removal of the `ast.nim` and `renderer.nim` imports, which opens these modules for debugging as well. - Consider possibility of a nil `owner` in the symbol owner chain representation calculations in `astrepr` - Semantic tracer debug output file rotation now uses location of the first `.define(` call as a file name base instead of integer-based ones. Added basic logging information about created files - now a developer can see what is going on and what gets written. For example, running with `--define=nimCompilerDebugTraceDir=/tmp` and seveal `define(...)` sections produces the following output: ``` comparisons.nim(269, 8): opening /tmp/comparisons_nim_0 trace comparisons.nim(274, 7): closing trace, wrote 44 records comparisons.nim(276, 8): opening /tmp/comparisons_nim_1 trace comparisons.nim(285, 7): closing trace, wrote 329 records ``` - Simplify implementation of the `reportInst` handling in the debug utils tracer - now each toplevel tracer template must submit the location by itself - this solution avoids unintuitive and fragile `instLoc(-5)` call which might break with more templates introduced. Also updated documentation on the `reportInst` and `reportFrom` in the reports file. - compiler/front/options.nim:693 :: Unconditionally output debugging traces if they are requested, regardless of the surrounding hooks and filters. Introduce the `bypassWriteHookForTrace` flag in the debugging hack controller which makes it possible to bypass the `writeln` hook. * Further work - compiler/ast/parser.nim:744 :: introduce two tokens in order to handle custom literals. There is no real need to mash together everything in a single chunk of text that would have to be split apart down the line.
haxscramper
added a commit
to haxscramper/nimskull
that referenced
this issue
Sep 3, 2022
The main change - introduce a `ParsedNode` type which replaces `PNode` in the parser. This change allows for further work on decoupling `sem` from other parts of the compiler, making it easier to implement improvements in a way that would not rip through the whole codebase and test suite. Right now introduced type closely mimics the `PNode` counterpart, but this is just a temporary measure for the transition period. This commit is a part of multi-step series - full list can be seen in the related issue nim-works#423 * Documentation changes - Add missing documentation for changes in the earlier commit, add more how-tos to the debugging section (I haven't coded in a while, so was especially important to write down explanations for anything I had trouble with) nim-works@602367b * Tangentially related refactoring work - Cleanup the `passes.nim` implementation a bit - despite common (at least seemingly shared by many of the previous authors of the codebase) misconception longer variable names actually *do* increase readability. Also infamous recommendations for the "structured programming" also do not really mesh with proliferation of `break` statements in the code. Add todo/bug comment for the main processing loop bug related to the phase ordering in `compiler/sem/passes.nim:234` * Debugging tools improvements - Implement `astrepr.nim` support for the `ParsedNode` and `PIdent` - `debug` and `treeRepr` procedures. - Allow skipping repeated symbol in the `(open|closed)SymChoice` node kinds in the `astrepr` - Restructure imports of the `astepr` and move it closer to the 'primitive' modules - type definitions and trivial data queries. The most important change is removal of the `ast.nim` and `renderer.nim` imports, which opens these modules for debugging as well. - Consider possibility of a nil `owner` in the symbol owner chain representation calculations in `astrepr` - Semantic tracer debug output file rotation now uses location of the first `.define(` call as a file name base instead of integer-based ones. Added basic logging information about created files - now a developer can see what is going on and what gets written. For example, running with `--define=nimCompilerDebugTraceDir=/tmp` and seveal `define(...)` sections produces the following output: ``` comparisons.nim(269, 8): opening /tmp/comparisons_nim_0 trace comparisons.nim(274, 7): closing trace, wrote 44 records comparisons.nim(276, 8): opening /tmp/comparisons_nim_1 trace comparisons.nim(285, 7): closing trace, wrote 329 records ``` - Simplify implementation of the `reportInst` handling in the debug utils tracer - now each toplevel tracer template must submit the location by itself - this solution avoids unintuitive and fragile `instLoc(-5)` call which might break with more templates introduced. Also updated documentation on the `reportInst` and `reportFrom` in the reports file. - compiler/front/options.nim:693 :: Unconditionally output debugging traces if they are requested, regardless of the surrounding hooks and filters. Introduce the `bypassWriteHookForTrace` flag in the debugging hack controller which makes it possible to bypass the `writeln` hook. * Further work - compiler/ast/parser.nim:744 :: introduce two tokens in order to handle custom literals. There is no real need to mash together everything in a single chunk of text that would have to be split apart down the line.
todo items from the PR
|
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Upgraded parser (and possibly lexer/tokenizer) that conforms to a more data oriented design approach as taken in the VM and new Backend (in progress). The semantic analysis layer won't be able to take advantage of it yet, but other tools like filters and pretty might.
The approach at the out set:
PNode
NB: like any plan revise and improve the approach as we learn more. At the end we should have a faster parser which is nicer to memory, with more tests/validation against the grammar, fewer oddities in the grammar, clearer node kinds, decoupling from semantic analysis, and a path for tools like source filters, formatters, etc, easily making effective use of the new parser.
Deets
Decouple the parser from
PNode
The parser is coupled to
PNode
right now which takes a sea of nodes on the heap approach. First step make aParsedNode
(or whatever) that's basically a very slimPNode
, add a proc to convert toPNode
, and get everything working as before.Improve parser data types
Start revising the parser produced data types with a more DOD style. Start with a
ParsedFragment
that is a bit more than a sequence of parsed nodes (minimal object) and some metadata. Internally it should be a DOD layout and be produced byparseAll
,parseTopLevelStmt
, etc. A fragment could be some arbitrary string of AST or a while file/module, statement, etc. The idea is this is the parser's production for a single parse action. If a whole file vs fragment distinction proves to be useful then do that.First cut data type will be easy and then getting the parser code to use it will be where it gets interesting focus on getting it working and keeping the data model as clean as possible. Mark all the weirdness so we know what to come back to next.
Smooth out rough edges
The above will get it working, now we clean things up so we're learning from the design smells showing up in the data model.
Final remarks on approach
This doesn't have to be a big bang. Quick wins regardless of this work should be PR'd separately ahead of time. If the first part of the approach doesn't hurt performance too much then that can come in on its own. If the second is clean enough then ditto. Other intermediate milestones might be discovered and get pulled in early.
todo steps added by @haxscramper, collected from the discussion in #425
FileIndex -> seq[Token]
mapping that is generated by the lexer. Each token includes a line, a column and a tag information. And an extent range.ParsedNode
includes(FileIndex, TokenIndex)
pair that is used to get the token back from the main storageTNodeKind
, but without sem-related data - whileParsedNode
andPNode
are structurally similar, the latter one has a lot more states.ParsedNode
to use a DOD structure as wellThe text was updated successfully, but these errors were encountered: