
Investigate parallelisation of 3 optimisation phases (release-only) #28

Open
safesparrow opened this issue Nov 20, 2022 · 3 comments · May be fixed by #29

Comments

@safesparrow
Owner

Not type-checking related, but I'm doing a short investigation into the following:

In Release mode, there are 3 main optimisation phases/rounds.
Round 1 uses only the round-1 results of previous files, round 2 uses only the round-2 results of previous files, and so on.

This means that we can easily increase parallelisation.
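That dependency structure — phase p of file i needs phase p of file i−1 and the same file's output from phase p−1 — means all (file, phase) nodes on the same anti-diagonal are independent. A minimal sketch of that wavefront scheduling in Python (not FCS code; `optimise` is a placeholder for running one phase on one file):

```python
from concurrent.futures import ThreadPoolExecutor

NUM_PHASES = 3

def optimise(node):
    # Stand-in for running one optimisation phase on one file.
    file_idx, phase = node
    return f"file{file_idx}/phase{phase}"

def run_wavefront(num_files, num_phases=NUM_PHASES):
    # Phase p of file i depends on (i-1, p) and (i, p-1), both of which
    # sit on anti-diagonal d-1 where d = i + p. So each diagonal only
    # depends on the previous one and its nodes can run in parallel.
    results = {}
    with ThreadPoolExecutor() as pool:
        for d in range(num_files + num_phases - 1):
            wave = [(i, d - i) for i in range(num_files)
                    if 0 <= d - i < num_phases]
            for node, r in zip(wave, pool.map(optimise, wave)):
                results[node] = r
    return results
```

With 10 files and 3 phases this runs 12 waves instead of 30 sequential steps, which bounds the achievable speedup — hence "easily increase", not "fully parallelise".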

@safesparrow
Owner Author

I did a PoC, but it seems to break on FCS. I'm not sure why - there is either a dependency between phases I'm not aware of, or there is a bug in scheduling the items/propagating state.
The complaining node is file idx 7, phase 3:

[screenshots of the error omitted]

@safesparrow
Owner Author

safesparrow commented Nov 20, 2022

Update: I had a few bugs, since fixed. FCS compilation now works 🎉

Very first snapshot:

[profiler snapshot omitted]

Worth noting the ratio of time spent in phases 1, 2 and 3:

[screenshot omitted]

Timings before & after:

# Sequential:
Real: 32.8 Realdelta: 15.8 Cpu: 18.4 Cpudelta: 10.9 Mem: 1445 G0: 1245 G1: 242 G2:  1 [Optimizations]

# Parallel:
Real: 29.3 Realdelta: 12.5 Cpu: 13.2 Cpudelta:  4.2 Mem: 1641 G0: 1250 G1: 258 G2:  1 [Optimizations]

So less speedup than I'd have expected.

EDIT:
Actually I forgot to enable Server GC in the latest test.
Timings with Server GC:

# Sequential:
Real: 22.9 Realdelta: 12.6 Cpu: 44.5 Cpudelta: 12.8 Mem: 4054 G0:   7 G1:  3 G2:  1 [Optimizations]
# Parallel 1:
Real: 18.5 Realdelta:  8.0 Cpu: 45.8 Cpudelta: 13.7 Mem: 4423 G0:   4 G1:  2 G2:  1 [Optimizations]
# Parallel 2:
Real: 17.4 Realdelta:  7.3 Cpu: 47.8 Cpudelta: 15.6 Mem: 4355 G0:   4 G1:  2 G2:  1 [Optimizations]
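For reference, the numbers above work out to a noticeably better ratio with Server GC. A quick sketch of the arithmetic (not part of the original benchmark output):

```python
def speedup(before_s, after_s):
    # Ratio of wall-clock times; > 1 means the parallel run is faster.
    return round(before_s / after_s, 2)

# Optimisation-phase wall time ("Realdelta"), sequential vs "Parallel 2":
#   Workstation GC: 15.8s -> 12.5s  => ~1.26x
#   Server GC:      12.6s ->  7.3s  => ~1.73x
# Whole-compilation wall time ("Real") with Server GC: 22.9s -> 17.4s => ~1.32x
print(speedup(12.6, 7.3))   # -> 1.73
print(speedup(22.9, 17.4))  # -> 1.32
```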

@safesparrow
Owner Author

safesparrow commented Nov 20, 2022

Bonus quest:
The above has limited potential for parallelisation.

What could speed it up further is, yet again, a graph-based approach.

For wider context, there are a few different ways the optimisation of a file might behave. It might:

  1. not depend on any other files (I think I already found proof that this isn't the case, i.e. cross-file inlines and other constant values do get evaluated),
  2. depend only on the files which contain symbols/types it references,
  3. depend on all the files above,
  4. depend on all the files above and on all previous phases (for all files in the project).

Case 4 can be ruled out because the code doesn't allow for this kind of dependency.

If:

  1. is true => we could simply parallelise everything, the way parsing is parallelised.
  2. is true => we could use the dependency graph generated by type-checking and optimise using the graph-based approach.
  3. is true => we cannot do more than the above PoC already does.

For case 2, the main issue would be making the optimisation methods additive, i.e. having the equivalent of AddResultsToTcState for optimisation.

EDIT:
I had a quick look, and I'm cautiously optimistic that it will be fairly trivial to make file optimization return a delta that can then be combined with others.
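To illustrate what "returning a delta" could look like, here is a hypothetical Python sketch — the type and function names (`OptEnv`, `OptDelta`, `optimise_file`, `add_delta`) are made up, not the FCS API. The idea mirrors AddResultsToTcState: each file's optimisation returns only what it contributes, and deltas are folded into the shared environment in dependency order:

```python
from dataclasses import dataclass, field

@dataclass
class OptEnv:
    # Hypothetical stand-in for the optimiser's cross-file state,
    # e.g. a map from symbol name to its inlinable definition.
    inlines: dict = field(default_factory=dict)

@dataclass
class OptDelta:
    new_inlines: dict

def optimise_file(env, file_defs):
    # Optimise one file against `env`, but return ONLY this file's
    # contribution rather than a whole new environment, so deltas from
    # independently-optimised files can be combined afterwards.
    return OptDelta(new_inlines=dict(file_defs))

def add_delta(env, delta):
    # Combine a file's delta into the environment (the optimisation
    # analogue of AddResultsToTcState).
    return OptEnv(inlines={**env.inlines, **delta.new_inlines})
```

With this shape, files whose dependency sets are disjoint could be optimised concurrently and their deltas merged in file order afterwards.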
