
Investigate parallelisation of 3 optimisation phases (release-only) #28

Open
safesparrow opened this issue Nov 20, 2022 · 3 comments · May be fixed by #29

Comments

@safesparrow
Owner

Not type-checking related, but I'm doing a short investigation into the following:

In Release mode, there are 3 main optimisation phases/rounds.
Round 1 uses only the round-1 results of previous files, round 2 uses only the round-2 results of previous files, and so on.

This means that we can easily increase parallelisation.
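That dependency structure — phase p of file i needs phase p of file i−1 and the same file's output from phase p−1 — means all (file, phase) nodes on the same anti-diagonal are independent. A minimal sketch of that wavefront scheduling in Python (not FCS code; `optimise` is a placeholder for running one phase on one file):

```python
from concurrent.futures import ThreadPoolExecutor

NUM_PHASES = 3

def optimise(node):
    # Stand-in for running one optimisation phase on one file.
    file_idx, phase = node
    return f"file{file_idx}/phase{phase}"

def run_wavefront(num_files, num_phases=NUM_PHASES):
    # Phase p of file i depends on (i-1, p) and (i, p-1), both of which
    # sit on anti-diagonal d-1 where d = i + p. So each diagonal only
    # depends on the previous one and its nodes can run in parallel.
    results = {}
    with ThreadPoolExecutor() as pool:
        for d in range(num_files + num_phases - 1):
            wave = [(i, d - i) for i in range(num_files)
                    if 0 <= d - i < num_phases]
            for node, r in zip(wave, pool.map(optimise, wave)):
                results[node] = r
    return results
```

With 10 files and 3 phases this runs 12 waves instead of 30 sequential steps, which bounds the achievable speedup — hence "easily increase", not "fully parallelise".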

@safesparrow
Owner Author

I did a PoC, but it seems to break on FCS. I'm not sure why - there is either a dependency between phases I'm not aware of, or there is a bug in scheduling the items/propagating state.
The complaining node is file idx 7, phase 3:

[screenshots of the error omitted]

@safesparrow
Owner Author

safesparrow commented Nov 20, 2022

Update: I had a few bugs, since fixed. FCS compilation now works 🎉

Very first snapshot:

[profiler snapshot omitted]

Worth noting the ratio of time spent in phases 1, 2 and 3:

[screenshot omitted]

Timings before & after:

# Sequential:
Real: 32.8 Realdelta: 15.8 Cpu: 18.4 Cpudelta: 10.9 Mem: 1445 G0: 1245 G1: 242 G2:  1 [Optimizations]

# Parallel:
Real: 29.3 Realdelta: 12.5 Cpu: 13.2 Cpudelta:  4.2 Mem: 1641 G0: 1250 G1: 258 G2:  1 [Optimizations]

So less speedup than I'd have expected.

EDIT:
Actually I forgot to enable Server GC in the latest test.
Timings with Server GC:

# Sequential:
Real: 22.9 Realdelta: 12.6 Cpu: 44.5 Cpudelta: 12.8 Mem: 4054 G0:   7 G1:  3 G2:  1 [Optimizations]
# Parallel 1:
Real: 18.5 Realdelta:  8.0 Cpu: 45.8 Cpudelta: 13.7 Mem: 4423 G0:   4 G1:  2 G2:  1 [Optimizations]
# Parallel 2:
Real: 17.4 Realdelta:  7.3 Cpu: 47.8 Cpudelta: 15.6 Mem: 4355 G0:   4 G1:  2 G2:  1 [Optimizations]
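For reference, the numbers above work out to a noticeably better ratio with Server GC. A quick sketch of the arithmetic (not part of the original benchmark output):

```python
def speedup(before_s, after_s):
    # Ratio of wall-clock times; > 1 means the parallel run is faster.
    return round(before_s / after_s, 2)

# Optimisation-phase wall time ("Realdelta"), sequential vs "Parallel 2":
#   Workstation GC: 15.8s -> 12.5s  => ~1.26x
#   Server GC:      12.6s ->  7.3s  => ~1.73x
# Whole-compilation wall time ("Real") with Server GC: 22.9s -> 17.4s => ~1.32x
print(speedup(12.6, 7.3))   # -> 1.73
print(speedup(22.9, 17.4))  # -> 1.32
```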

@safesparrow
Owner Author

safesparrow commented Nov 20, 2022

Bonus quest:
The above has limited potential for parallelisation.

What could speed it up further is, yet again, a graph-based approach.

For wider context, there are a few different ways the optimisation of a file might behave. It might:

  1. not depend on any other files (I think I already found proof that this isn't the case, i.e. cross-file inlines and other constant values do get evaluated),
  2. depend only on the files which contain symbols/types it references,
  3. depend on all the files above,
  4. depend on all the files above and on all previous phases (for all files in the project).

Case 4 can be ruled out because the code doesn't allow for this kind of dependency.

If:

  1. is true => we could simply parallelise everything, the way parsing is parallelised.
  2. is true => we could use the dependency graph generated by type-checking and optimise using the graph-based approach.
  3. is true => we cannot do more than the above PoC already does.

For case 2, the main issue would be making the optimisation methods additive, i.e. having the equivalent of AddResultsToTcState for optimisation.

EDIT:
I had a quick look, and I'm cautiously optimistic that it will be fairly trivial to make file optimization return a delta that can then be combined with others.
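To illustrate what "returning a delta" could look like, here is a hypothetical Python sketch — the type and function names (`OptEnv`, `OptDelta`, `optimise_file`, `add_delta`) are made up, not the FCS API. The idea mirrors AddResultsToTcState: each file's optimisation returns only what it contributes, and deltas are folded into the shared environment in dependency order:

```python
from dataclasses import dataclass, field

@dataclass
class OptEnv:
    # Hypothetical stand-in for the optimiser's cross-file state,
    # e.g. a map from symbol name to its inlinable definition.
    inlines: dict = field(default_factory=dict)

@dataclass
class OptDelta:
    new_inlines: dict

def optimise_file(env, file_defs):
    # Optimise one file against `env`, but return ONLY this file's
    # contribution rather than a whole new environment, so deltas from
    # independently-optimised files can be combined afterwards.
    return OptDelta(new_inlines=dict(file_defs))

def add_delta(env, delta):
    # Combine a file's delta into the environment (the optimisation
    # analogue of AddResultsToTcState).
    return OptEnv(inlines={**env.inlines, **delta.new_inlines})
```

With this shape, files whose dependency sets are disjoint could be optimised concurrently and their deltas merged in file order afterwards.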
