beginning incremental compilation for casual review #14009

disruptek · 2020-04-17T21:34:01Z

No description provided.

cooldome · 2020-04-18T09:54:18Z

Please don't combine renaming and refactoring and new functionality in one PR.
It makes review process a pain and PR review will likely face delays and unnecessary discussions.

Keep these two separate please.
+20 on incremental compilation

disruptek · 2020-04-18T13:41:00Z

I'm not sure on what you're referring to, but, of course you are correct. It just wasn't feasible here. The review is going to be ridiculous; we talked about it and there's no point in splitting hairs.

disruptek · 2020-04-18T14:14:31Z

Sorry, I should have added that this will likely get split into multiple PRs by hand as I go through and minimize it, etc. It has just become unwieldy to keep branching on every little rethink; hence the branch name -- ic2. 😁

compiler/ast.nim

compiler/astalgo.nim

compiler/ccgexprs.nim

compiler/ast.nim

timotheecour · 2020-04-25T00:22:22Z

introduction of setRope creates a massive diff by itself (and makes reviewing or rebasing other PRs hard), why not instead add:
proc `r=`*(a: var TLoc; roap: Rope) {.tags: [TreeWrite].} =...
which avoids changing all foo.r = expr

compiler/cgen.nim

compiler/ic/audit.nim

compiler/nim.cfg

compiler/passes.nim

testes

timotheecour · 2020-04-25T01:04:45Z

the more you can isolate (while keeping code DRY) inside compiler/ic/ the better, right now there's a massive surface area with existing code that makes reviewing hard, increases chances of regressions and rebase conflicts for your PR (or other PR if they rebase to yours); for example instead of inlining IC specific code inside existing procs, you could instead refactor to call a helper proc inside compiler/ic/, thus reducing surface with existing code.
I feel like a compiler server approach is an alternative approach worth exploring for incremental compilation. Better performance, much easier (and shorter) implementation, no need for serialization/deserialization, everything is in memory.

disruptek · 2020-04-25T01:53:45Z

You're not wrong; that's why I moved everything into ic yesterday.

A lot of the surface will be reduced as more of the AST mutation is eliminated; the goal is for the frontend to be immutable. So, all the location stuff will likely go away. I wanna aim for like 3-5 calls in the API, but that's probably my naivety talking.

The plan is to make an intermediate AST format that is used specifically by the backend. It will basically constitute a format trivially translated to backend modules; it can be geared to a much simpler set of semantics. We will rehydrate the frontend AST for macro passes (if necessary) but otherwise leave it cached. The backend AST could be cached, too, but it seems unlikely that it will ever be expensive to evaluate, and less complexity equals fewer bugs.

I think a compiler server is pretty easy once we get the AST locked down, but it's more likely that we'll want less in memory and not more. As the abstractions grow, we want to be planning for less memory consumption so we continue to work on embedded platforms.

I'm interested in pursuing a couple avenues; the easiest is probably to have new objects enter the environment via the cache, but a proper network repl would be cool, too.

First, though, I have to make it actually compile stuff. I'm still learning how it fits together, and I'm slow. Then, I have to make it all much simpler.

Thanks for taking a look.

timotheecour · 2020-04-25T03:24:28Z

> The plan is to make an intermediate AST format that is used specifically by the backend

So basically introducing an IR after semantic phase. This could be a good idea; it'd simplify backend code (js, c, cpp, objc), ease integration with https://github.com/arnetheduck/nlvm, reduce bugs where abstract Ast nodes wrongly end up in cgen (causing codegen errors), allow a future wasm backend without emscripten etc. It'd be a simplified AST. Maybe a cheap way to achieve similar goals could be to define TNodeBackendKind, a subset of TNodeKind.

That being said, I'm not sure this would help with IC, which needs to handle non-backend AST.

> I think a compiler server is pretty easy once we get the AST locked down, but it's more likely that we'll want less in memory and not more. As the abstractions grow, we want to be planning for less memory consumption so we continue to work on embedded platforms.

I'm not sure I buy that. My use case for IC is on a development machine (lots of RAM although I doubt it should consume much more than what nim compiler already does; or later, even hosted on cloud). It can still target embedded platforms, that's orthogonal (eg like developping for ios from a mac).

I'm actually super interested in the "compiler server" approach, and this could potentially drive down compile times (for partial recompilations) to interactive speed, enable true REPL and such. But I don't want to derail your PR on this topic.

disruptek · 2020-04-25T03:42:51Z

I'll let @Araq comment on his plans for a new IR, but I understand they are orthogonal to this effort. I have a feeling that he wants more of the super-efficient packed AST we talked about -- that could model more of the frontend. Maybe we'd continue to use the backend AST in his new design; it'll be a good place to experiment with some very simple "raw memory" exploits.

Lots of memory is great, but we already hit limits with the compiler, so it's no longer debatable. I think memory mapping the cache is a simple middle ground that should lift all boats. The new AST will give us an opportunity to decouple more of the backend code and perform compilation in smaller chunks, so I'm hopeful that it's win/win there, too.

Early days -- everyone's an optimist!

Araq · 2020-04-26T13:01:36Z

I'm actually super interested in the "compiler server" approach, and this could potentially drive down compile times (for partial recompilations) to interactive speed, enable true REPL and such. But I don't want to derail your PR on this topic.

That's effectively what I already tried with nimsuggest and nimsuggest can leak memory in a way we have never been able to track down. The current mutability soup is not sustainable in the long run and we need to experiment with alternative IRs, I see no way around it. The existing AST / compiler architecture is over ten years old and I think I learned some things in the meantime.

Clyybber · 2020-05-17T22:30:16Z

Continues in a different branch now :)

Araq · 2020-05-18T08:13:26Z

Ping @disruptek, is this true?

disruptek · 2020-05-19T14:28:48Z

Yeah, I squashed and rebased recently.

timotheecour · 2020-05-21T03:58:42Z

@disruptek well you can simply git push --force which avoids having to open a new PR and loose all the comments

disruptek force-pushed the ic2 branch 2 times, most recently from c950a2c to d7c2b17 Compare April 18, 2020 02:29