Missing reference in Transparent Compiler #1

nojaf · 2023-06-30T12:13:40Z

Transparent Compiler, a morning's play,
Parsing, checking, a code-fixing fray.
Editors ponder and wonder to type-check all files?
A conversation sparked, and implications riled.

Hi all,

I played around this morning with the 'Transparent Compiler' and I really like where this is going!

Praise aside, I found an interesting scenario I wish to discuss.
When checker.ParseAndCheckFileInProject is called, the Transparent Compiler will type-check the file and all the dependencies. This works out nicely but gets into a grey area when you start to edit the code where you are no longer in a valid state.

Please first read tests/FSharp.Compiler.Service.Tests/TransparentTests.fs!

My question is about how the different editors will deal with this kind of code fix:

As mentioned in the unit test, the editor might rely on the fact that all files are type-checked (as is the case in the present-day situation), while currently, in the Transparent Compiler (at the time of writing), this is not the case.

Some questions to ponder over:

When checker.ParseAndCheckFileInProject is called, should each file up to the requested file be type-checked or are we happy with the more performant-only type-check the dependent files?
If we only check the dependent files, should we expose the untyped information from all the other files?

Internally in DependencyResolution.mkGraph we construct a Trie with more or less the information we are after. type X in the unit test can easily be pinpointed to Project.A after we processed all the AST. We don't exactly have this today, but I can see it is quite feasible to capture.

Tagging a bunch of people: @auduchinok, @DedSec256, @0101, @psfinaki, @baronfel, @TheAngryByrd, @safesparrow, @dawedawe. Please tag others if I missed anyone.

Let us know what the implications would be right now for the editor you work on.

PS: There are no intentions to merge this PR. It only exists to kick off the conversation.

safesparrow · 2023-06-30T12:20:26Z

I do think that the IDE should still require all preceding files to be typechecked for most purposes. It should avoid lots of issues like the one described in this PR.

nojaf · 2023-06-30T12:28:51Z

It should avoid lots of issues like the one described in this PR.

You are most likely correct, although I would like to hear some specifics to move beyond gut feelings.

I want to keep an open mind as dotnet#15179 currently produces a green build. So I guess, either VS tests don't cover this, or it either just works for them by checking only the dependent files.

0101 · 2023-06-30T13:02:07Z

This is a great point. We should see if there are other issues like this.

If it was just this, then I can imagine even giving up on this feature if it makes everything faster for a particular project (maybe it would be configurable).

Also what might be a way is to deliver the typechecking results for current file as fast as we can and then start checking everything else in the background. So you could already work with a type checked file and the code fix would appear once available. Which I think we want to do anyway, so you can see diagnostics for the entire solution, which can change as you edit (though probably this needs to be configurable as well for if someone wants to save their laptop battery). And this way we wouldn't need to come up with any extra magic from untyped tree.

I want to keep an open mind as dotnet#15179 currently produces a green build. So I guess, either VS tests don't cover this, or it either just works for them by checking only the dependent files.

Most likely there aren't any tests for this. Or if there are they're not using Transparent Compiler - that PR doesn't replace current BackgroundCompiler with it, it just adds it as an opt-in feature. I'll need to go through the tests and plug it in where possible.

TheAngryByrd · 2023-06-30T13:13:58Z

I don't know how the different editors deal with this today, but there might be a chance that they rely on all the files being type-checked.

This is definitely the case in AdaptiveLSP of FSAC. When we call to typecheck a file, we rely on the caching in the compiler and calls to IFileSystem to get last write times and read in the file again and check if needed. We do have some tests but it's not really extensive.

0101 · 2023-06-30T13:45:36Z

CC @vzarytovskii @T-Gro

auduchinok · 2023-06-30T17:43:55Z

@nojaf Thanks for finding this case! It shows quite a serious/interesting issue for the tooling to think about.

It should avoid lots of issues like the one described in this PR.

You are most likely correct, although I would like to hear some specifics to move beyond gut feelings.

Yes, it's correct, features do indeed rely on type checking of the previous files. We can redesign some features or use heuristics, but the main issue may be editing code in general and unexpected slow downs in features.

Consider this code:

module Module

let f x = x

Typing Module.f in another file may trigger slow type checking for many intermediate files, even though type checking was fast for that file just before it. For example, we want to have all the tooltips and inferred types for Module and f in the code completion popup, and it would require checking the intermediate files.

From the top of my head, what we could try to do for the compiler service is something this: require the intermediates file type checking the first time, and allow using old results when changes wouldn't normally be seen by the analyzed file:

File1.fs
File2.fs // uses symbols from File1.fs
File3.fs // does not use symbols from File1.fs
File4.fs // uses symbols from File1.fs

First time File4.fs is analyzed, all previous files check results would be required. If there's a change to File1.fs, we can skip File3.fs analysis when analyzing File4.fs again and use its previous check results.

I'm not sure how feasible this is going to be, but it could be an interesting compromise that would in theory still allow skipping many files and would allow providing the same features quality.

Another approach we could try is adding a flag to ParseAndCheckFileInProject that would specify whether checking of the intermediate files is required. It would allow much faster highlighting, navigation, and some other features due to skipped files, and features like code completion and quick fixes could require checking the previously skipped files. This sounds both easier to do and probably more important for now.

auduchinok · 2023-06-30T17:52:24Z

I think there's another unrelated but important issue here: cross-project analysis.

One of the important scenarios I have in mind is editing a library code then jumping to a place that uses this library in another project (e.g. tests or app code). We would really want to skip the intermediate files here too, but (unless something is changed in this branch already) it would first check all the files in the library project, serializing the F# metadata along the way, then deserialize it all back when importing into the other project. Due to how the projects are isolated, we just can't use graph at all, even if me make it work for the editor features. What we could also try to do while replacing BackgroundCompiler in IDEs is try to skip the metadata step and allow referencing TAST nodes between projects directly.

safesparrow · 2023-06-30T22:11:13Z

First time File4.fs is analyzed, all previous files check results would be required. If there's a change to File1.fs, we can skip File3.fs analysis when analyzing File4.fs again and use its previous check results.
(...)
I'm not sure how feasible this is going to be, but it could be an interesting compromise that would in theory still allow skipping many files and would allow providing the same features quality.

It wouldn't provide the same features quality, right? It wouldn't work for code completion and quick fixes, as you explained later.

I do want to stress that "slow type checking" is not necessarily slow.
Assuming that:

the user has inifinite parallel processing power, and
the file's dependency tree isn't smaller than that of its predecessors,

graph checking of all the files should be as fast as just checking X's dependency graph.
I would argue that 2. is often the case, and 1. is true enough. Even as little as 4 processing cores give a maximum of 4x speedup (-75%) vs sequential type-checking, and graphs that allow more than 4 independent branches are rare.

Of course, avoiding unnecessary work is always better.
But I would be cautious about the strategy choice here:
Checking just the file's dependency graph might slightly reduce wall-clock time of type-checking that file,
but at the same time it will increase wall-clock time of type-checking anything that will require those extra results later on.

Somewhat-related: Personally I think the following IDE feature would be useful:
Given the active file A and open files B1...Bn, any time IDE performs a type-check of A after a code change,
it should trigger a type-check of all open files.
There should be a control that dictates how many threads are being used for this background type-checking, to let the user decide the right tradeoff between extra work for their machine and reduced wallclock times of multi-file changes.
I'm sure users with powerful CPUs will be happy to sacrifice a few cores and let FCS do more work.
The above feature would require better control over the prioritisation and parallel scheduling of multiple multithreaded analysis requests.

I do like @auduchinok 's suggestion to avoid unnecessary (de)serialization of TC information.
And I also like cross-project graph type-checking.
The latter would require building a file dependency graph of the entire solution. This might require some tweaks in the dependency evaluation logic, which currently has the luxury of assuming that all files live in the same project.
I will note though that this:

Due to how the projects are isolated, we just can't use graph at all, even if me make it work for the editor features
is slightly misleading. Using graph-based checking to check all the files in a project is faster than doing so sequentially - that is why graph-based checking in compilation is faster, despite still checking all the files.
The editor will still benefit from that speedup. But it could benefit more.

Essentially, the current project isolation can be thought of as adding virtual "project" nodes in the dependency graph.
Those nodes depend on all files in the project, and the first file in each project referencing another project depends on that "project" node. This creates bottlenecks that when removed, increase parallelism level. But that parallelism level is >1 to start with.

auduchinok · 2023-07-03T10:52:41Z

It wouldn't provide the same features quality, right? It wouldn't work for code completion and quick fixes, as you explained later.

For some features we only need faster symbols lookup in unopened files, and it'd be OK if intermediate files aren't checked. It seems it could make a significant difference in some cases.

T-Gro · 2023-07-03T13:00:49Z

@0101 :

How would it play with the "unopened documents" live analysis we want to have anyway?

What if the file in API argument and its direct dependencies would be type-checked as you have in your WIP PR right now, but the rest - instead of ignoring it - would be processed afterwards in a background operation if they got stale ?

The obvious cost is increased complexity for cancellation and scheduling between active and background requests.

0101 · 2023-07-03T13:32:18Z

What if the file in API argument and its direct dependencies would be type-checked as you have in your WIP PR right now, but the rest - instead of ignoring it - would be processed afterwards in a background operation if they got stale ?

The obvious cost is increased complexity for cancellation and scheduling between active and background requests.

That was my suggestion in a post above.

I don't think there would be much extra complexity if the caching system works properly. Maybe just detecting which checking requests should be canceled - but we will probably need to do that anyway if the IDEs don't handle it.

However, @safesparrow has a good point that this will deliver the results you might potentially need slower than forcing to check all files above the requested file, because much of it (if not all of it) could be done in parallel to the required dependencies.

Maybe we could have a more sophisticated graph processing algorithm with high priority nodes (requested file and its dependencies) and low priority nodes (other files above, maybe even below?), where low priority ones would be started as soon as they're unblocked, but we wouldn't wait for them to return the result. Not sure if it's feasible, or even worth it for how much better experience it would deliver. Would be nice to have more insight into common F# project dependency graphs.

safesparrow · 2023-07-04T22:51:58Z

Maybe we could have a more sophisticated graph processing algorithm with high priority nodes (requested file and its dependencies) and low priority nodes (other files above, maybe even below?), where low priority ones would be started as soon as they're unblocked, but we wouldn't wait for them to return the result. Not sure if it's feasible, or even worth it for how much better experience it would deliver. Would be nice to have more insight into common F# project dependency graphs.

I think this is very desirable. I often wish FCS would at the same time:

do more analysis in the background, in case I need it,
and not block the work I need due to any other things I don't necessarily need [anymore]

AFAIK there is no system currently that allows for this. Such a system should allow the user to dedicate a certain amount of resources (no. of threads) to certain tasks (main/background).
Work items used for scheduling would need to be fine-grained - each file's type-checking/parsing should be a standalone item, which can have multiple user requests associated with it - I think this is key to make the whole system straightforward and flexible.

Once such a system is in place, the perfect solution would be as you describe - type-check file dependencies immediately, and schedule type-checking of remaining files in the background.

auduchinok · 2023-07-11T12:50:25Z

Another thing to consider when talking about cross-project references is dotnet#15180. If we are able to work with TAST symbols directly, then there's another possible way to fix this issue. Which would be good, as it breaks the cross-project analysis considerably.

Proof of concept UnitTest

d9c4052

0101 force-pushed the experiments branch from faf9f4d to 7612843 Compare September 4, 2023 15:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Missing reference in Transparent Compiler #1

Missing reference in Transparent Compiler #1

nojaf commented Jun 30, 2023

safesparrow commented Jun 30, 2023

nojaf commented Jun 30, 2023

0101 commented Jun 30, 2023

TheAngryByrd commented Jun 30, 2023

0101 commented Jun 30, 2023

auduchinok commented Jun 30, 2023 •

edited

Loading

auduchinok commented Jun 30, 2023

safesparrow commented Jun 30, 2023 •

edited

Loading

auduchinok commented Jul 3, 2023

T-Gro commented Jul 3, 2023

0101 commented Jul 3, 2023

safesparrow commented Jul 4, 2023 •

edited

Loading

auduchinok commented Jul 11, 2023

Missing reference in Transparent Compiler #1

Are you sure you want to change the base?

Missing reference in Transparent Compiler #1

Conversation

nojaf commented Jun 30, 2023

safesparrow commented Jun 30, 2023

nojaf commented Jun 30, 2023

0101 commented Jun 30, 2023

TheAngryByrd commented Jun 30, 2023

0101 commented Jun 30, 2023

auduchinok commented Jun 30, 2023 • edited Loading

auduchinok commented Jun 30, 2023

safesparrow commented Jun 30, 2023 • edited Loading

auduchinok commented Jul 3, 2023

T-Gro commented Jul 3, 2023

0101 commented Jul 3, 2023

safesparrow commented Jul 4, 2023 • edited Loading

auduchinok commented Jul 11, 2023

auduchinok commented Jun 30, 2023 •

edited

Loading

safesparrow commented Jun 30, 2023 •

edited

Loading

safesparrow commented Jul 4, 2023 •

edited

Loading