From 6b0b77189dff97bdd2987da099223e9d8b530ee6 Mon Sep 17 00:00:00 2001 From: Nick Cameron Date: Thu, 24 Sep 2015 13:01:04 +1200 Subject: [PATCH 1/6] Changes to the compiler to support IDEs --- text/0000-ide.md | 530 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 530 insertions(+) create mode 100644 text/0000-ide.md diff --git a/text/0000-ide.md b/text/0000-ide.md new file mode 100644 index 00000000000..8a0ab3eaf92 --- /dev/null +++ b/text/0000-ide.md @@ -0,0 +1,530 @@ +- Feature Name: n/a +- Start Date: 2015-10-13 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +# Summary + +This RFC describes how we intend to modify the compiler to support IDEs. The +intention is that support will be as generic as possible. A follow-up internals +post will describe how we intend to focus our energies and deploy Rust support +in actual IDEs. + +There are two sets of technical changes proposed in this RFC: changes to how we +compile, and the creation of an 'oracle' tool (name of tool TBC). + +This RFC is fairly detailed, it is intended as a straw-man plan to guide early +implementation, rather than as a strict blueprint. + + +## Compilation model + +An IDE will perform two kinds of compilation - an incremental check as the user +types (used to provide error and code completion information) and a full build. +The full build is explicitly signaled by the user (it could also happen +implicitly, for example when the user saves a file). A full build is basically +just a `cargo build` command, as would be done from the command line. It will +take advantage of any future improvements to regular compilation (such as +incremental compilation), but there is essentially no change from a compile +today. It is not very interesting and won't be discussed further. + +The incremental check follows a new model of compilation. This check must be as +fast as possible but does not need to generate machine code. We'll describe it +in more detail below. 
We call this kind of compilation a 'quick-check'. + +This RFC also covers making compilation more robust. + + +## The oracle + +The oracle is a long running daemon process. It will keep a database +representation of an entire project's source code and semantic information (as +opposed to the compiler which operates on a crate at a time). It is +incrementally updated by the compiler and provides an IPC API for providing +information about a program - the low-level information an IDE (or similar tool) +needs, e.g., code completion options, location of definitions/declarations, +documentation for items. + +The oracle is a general purpose, low-level tool and should be usable by any IDE +as well as other tools. End users and editors with less project knowledge should +use the oracle via a more friendly interface (such as Racer). + + +## Other shared functionality + +Other functionality, such as refactoring and reformatting will be provided by +separate tools rather than the oracle. These should be sharable between IDE +implementations. They are not covered in this RFC. + + +# Motivation + +An IDE collects together many tools into a single piece of software. Some of +these are entirely separate from the rest of the Rust eco-system (such as editor +functionality), some will reuse existing tools in pretty much the same way they +are already used (e.g., formatting code, which should straightforwardly use +Rustfmt), and some will have totally new ways of using the compiler or other +tools (e.g., code completion). + +Modern IDEs are large and complex pieces of software; creating a new one from +scratch for Rust would be impractical. Therefore we need to work with existing +IDEs (such as Eclipse, IntelliJ, and Visual Studio) to provide functionality. +These IDEs provide excellent editor and project management support out of the +box, but know nothing about the Rust language. + +An important aspect of IDE support is that response times must be extremely +short. 
Users expect information as they type. Running normal compilation of an +entire project is far too slow. Furthermore, as the user is typing, the program +will not be a valid, complete Rust program. + +We expect that an IDE may have its own lexer and parser. This is necessary for +the IDE to quickly give parse errors as the user types. Editors are free to rely +on the compiler's parsing if they prefer (the compiler will do its own parsing +in any case). Further information (name resolution, type information, etc.) will +be provided by the compiler via the oracle. + + +# Detailed design + +## Quick-check compilation + +(See also open questions, below). + +We run the quick-check compiler on a single crate. At some point after quick +checking, dependent crates must be rebuilt. This is the responsibility of an +external tool to manage (see below). Quick-check is driven by an IDE (or +possibly by the oracle), not by Cargo. + + +### Incremental and lazy compilation + +Incremental compilation is where, rather than re-compiling an entire crate, only +code which is changed and its dependencies are re-compiled. See +[RFC #1298](https://github.com/rust-lang/rfcs/pull/1298). + +Lazy compilation is where, rather than compiling an entire crate, we start by +compiling a single function (or possibly some other unit of code), and re- +compiling code which is depended on until we are done. Not all of a crate will +be compiled in this fashion. + +These two compilation strategies are faster than the current compilation model +(compile everything, every time). They are somewhat orthogonal - compilation can +be either lazy or incremental without implying the other. The [current +proposal](https://github.com/rust-lang/rfcs/pull/1298) for supporting +incremental compilation involves some lazy compilation as an implementation +detail. + +For quick-checking, compilation should be both incremental and lazy. 
The input +to the compiler is not just the crate being re-compiled, but also the span of +code changed (normal incremental compilation computes this span for itself, but +the IDE already has this information, so it would be wasteful to recompute it). +As a further optimisation, if the IDE can refer to items by an id (such as a +path), then this could be fed to the compiler rather than a code span to save +the compiler the effort of finding an AST node from a code span. + +We begin by computing which code is invalidated by the change (that is, any code +which depends on the changed code). We then re-compile the changed code. +Information which is depended upon is looked up in the saved metadata used for +incremental compilation. When we have re-compiled the changed code, then we +output the result (see below). If there are no fatal errors, then we continue to +compile the rest of the invalidated code. + + +### Compilation output + +The output of compilation is either success or a set of errors (as with today's +compiler, but see below for more detail on error message format). However, since +compilation can continue after returning an initial result, we might produce +further errors (I presume that IDEs provide a mechanism for the compiler to +communicate these asynchronously to the IDE plugin). + +In addition we must produce data to update the oracle, this should be done +directly, without involving the IDE plugin. + +Quick-check does not generate executable code or crate metadata. However, it +should (probably) update the metadata used for incremental compilation. + + +### Multiple crates + +Quick check only applies to a single crate, however, after some changes we might +need to re-compile dependent crates. This is the IDE's responsibility. In the +short term we can just trigger a full re-build (via Cargo) when the user starts +editing a file belonging to a different crate (there will obviously be some lag +there). 
The compiler must also generate crate metadata for the modified crate. + +Long term, the IDE might keep track of the dependency graph between crates +(provided by Cargo). The quick-check should signal when a crate's public +interface changes due to re-compilation. In that case the IDE can trigger +background re-compilation of dependent crates (possibly with some +delay/batching). + + +## The Oracle + +The oracle is a long-running tool which takes input from both full builds and +quick-checks, and responds to queries about a Rust program. Of particular note +is that it knows about a whole project, not just a single crate. In fact, other +than as a kind of module, it doesn't much care about the notion of a crate at +all. + +We require a data format for getting metadata from the compiler to the oracle. +Unfortunately none of the existing ones are quite right. Crate metadata is not +complete enough (it mostly only contains data about interfaces, not function +bodies), save-analysis data has been processed too far (basically into strings) +which loses some of the structure that would be useful, debuginfo is not Rust- +centric enough (i.e., does not contain Rust type information) and is based on +expanded source code. Furthermore, serialising any of the compiler's IRs is not +good enough: the AST and HIR do not contain any type or name information, the +HIR and MIR are post-expansion. + +The best option seems to be the save-analysis information. This is in a poor +format, but is the 'right' data (it can be based on an early AST and includes +type and name information). It can be evolved to be more efficient form over the +long run (it has been a low priority task for a while to support different +formats for the information). + +Full builds will generate a dump of save-analysis data for a whole crate. Quick +checks will generate data for the changed code. In both cases the oracle must +incrementally update its knowledge of the source code. 
How exactly to do this +when neither names nor ids are stable is an interesting question, but too much +detail for this RFC (especially as the implementation of ids in the compiler is +evolving). + +For crates which are not built from source (for example the standard library), +authors can choose to distribute the oracle's metadata to allow users to get a +good IDE experience with these crates. In this case, we only need metadata for +interfaces, not the bodies of functions or private items. The oracle should +handle such reduced metadata. It should be possible to generate the oracle's +metadata from the crate metadata, but this is not a short-term goal. (Note this +will require some knowledge in the IDE too - if there is no corresponding source +code, the IDE cannot 'jump to definition', for example). + +The oracle's data is platform-dependent. We must be careful when working with a +cross-compiled project to generate metadata for the target machine. This +shouldn't be a problem for normal compilation, but it means that quick-check +compilation must be configured for the same target, and care should be taken +with downloaded metadata. + +As well as metadata based on types and names, the oracle should keep track of +warnings. Since code with warnings but no errors is not re-compiled, a tool +outside the compiler must track them for display in the IDE. This will be done +by the oracle. + + +### Details + +#### API + +The oracle's API is a set of IPC calls. How exactly these should be implemented +is not clear. The most promising options are sending JSON over TCP, using +[thrift](https://thrift.apache.org/), or using Cap'n Proto (I'm unclear about +exactly what the transport layer looks like using Cap'n Proto, there is no Cap'n +Proto RCP implementation for Java, but I believe there is an alternative using +shared, memory mapped files as a buffer; I'm not familiar enough with the +library to work out what is needed). 
+ +I've detailed the API I believe we'll need to start with. This is slightly more +than a minimal set. I expect it will expand as time goes by. At some point we +will want to stabilise parts of the API to allow for third party implementations +of the oracle and compiler. + +All API calls can return success or error results. Many calls involve a *span*; +for the oracle's API, this is defined as two byte offsets from the start of the +file (oracle spans must always be contained in a single file). + +There are some alternative span definitions: we could use file and column indices +rather than byte offsets (this has some edge case difficulties with the +definition of a newline - do unicode newlines count? It also requires some extra +computation), we could use character offsets (again involves some more +computation, but might be more robust). + +A problem is that Visual Studio uses UTF16 while Rust uses UTF8, there is (I +understand) no efficient way to convert between byte counts in these systems. +I'm not sure how to address this. It might require the oracle to be able to +operate in UTF16 mode. + +Where no return value is specified, the call returns success or failure (with a +reason). + +The philosophy of the API is that most functions should only take a single call, +as opposed to making each function as minimal and orthogonal as possible. This +is because IPC can be slow and response time is important for IDEs. + + +**Projects** + +Note that the oracle stores no metadata about a project. + +*init project* + +Takes a project name, returns an id string (something close to the project's name). + +*delete project* + +Takes a project id. + +*list projects* + +Takes nothing, returns a list of project ids. + +Each of the remaining calls takes a project identifier. + + +**Update** + +See section on input data format below. 
+ +*update* + +Takes input data (actual source code rather than spans since we cannot assume +the user has saved the file) and a list of spans to invalidate. Where there are +no invalidated spans, the update call adds data (which will cause an error if +there are conflicts). Where there is no input data, update just invalidates. + +We might want to allow some shortcuts to invalidate an entire file or +recursively invalidate a directory. + + +**Description** + +*get definition* + +Takes a span, returns all 'definitions and declarations' for the identifier +covered by the span. Can return an error if the span does not cover exactly one +identifier or the oracle has no data for an identifier. + +The returned data is a list of 'defintion' data. That data includes the span for +the item, any documentation for the item, a code snippet for the item, +optionally a type for the item, and one or more kinds of definition (e.g., +'variable definition', 'field definition', 'function declaration'). + +*get references* + +Takes a span, returns a list of reference data (or an error). Each datum +consists of the span of the reference and a code snippet. + +*get docs* + +Takes a span, returns the same data as *get definition* but limited to doc strings. + +*get type* + +Takes a span, returns the same data as *get definition* but limited to type information. + +Question: are these useful/necessary? Or should users just call *get definition*? + +*search for identifier* + +Takes a search string or an id, and a struct of search parameters including case +sensitivity, and the kind of items to search (e.g., functions, traits, all +items). Returns a list of spans and code snippets. + + +**Code completion** + +*get suggestions* + +Takes a span (note that this span could be empty, e.g, for `foo.` we would use +the empty span which starts after the `.`; for `foo.b` we would use the span for +`b`), and returns a list of suggestions (is this useful? 
Is there any difference +from just using the caret position?). Each suggestion consists of the text for +completion plus the same information as returned for the *get definition* call. + + +#### Input data format + +The precise serialisation format of the oracle's input data will likely change +over time. At first, I propose we use csv, since that is what save-analysis +currently supports, and there is good decoding support for Rust. Longer term we +should use a binary format for more efficient serialisation and deserialisation. + +Each datum consists of an identifier, a kind, a span, and a set of fields (the +exact fields are dependent on the kind of data). + +If the datum is for a definition (of a trait, struct, etc.), then the identifier +is an absolute path (including the crate) to that definition. Question: how to +identify impls - do we need to distinguish multiple impls for the same trait and +data type? + +For statements and expressions, the identifier is a path to the expression's +function (or static/const) and a function relative id. Note that this means we +have to invalidate an entire function at a time (or at least all of the function +after the edited portion). It would be nice if we could avoid this and be more +fine-grained about invalidation, any ideas? + +I propose that we follow the save-analysis data format to start with (in terms +of the kinds of data available and the fields for each). However, we should use +identifiers rather than DefIds and distinguish fields from variables. + + +### Racer + +The oracle fulfills a similar role to +[Racer](https://github.com/phildawes/racer). Indeed, forking Racer may be a good +way to start development of the oracle. The oracle should provide more +information and should be more accurate by being more closely integrated with +the compiler. + +Racer could be refactored to be a client of the oracle, thus taking advantage of +more accurate data and a simpler implementation, whilst maintaining its +interface. 
This would be a nice way to make the oracle's data available to less +sophisticated editors. Alternatively, Racer could make use of the oracle's +metadata but do its own processing of that data to provide an alternate +implementation of an oracle. + + +### DXR and Rustdoc + +Both DXR and Rustdoc could be rewritten to talk to the oracle and run in a live +mode, rather than maintaining their own pre-processed data. This would have some +benefit in keeping these resources up to date as programs are edited (and +reducing the number of ways for doing essentially the same thing). However, this +does not seem like enough motivation to actually do the work. Could be an +interesting student project or something. + + +## Robust compilation + +The goal here is that when the user is typing, we should be able to run the +early stages of the quick-check compiler and still come up with sensible code +completion suggestions. The IDE and compiler can collaborate to some extent +here. + +As long as we can compile as far as type checking, then the compiler should +still generate metadata for the oracle. If we fail later (e.g., in borrow +checking) then we should return errors *and* metadata for the oracle. If we fail +to type check, then we cannot generate meaningful data for the oracle (or if we +succeed at type checking, but use some error recovery). + +THE IDE should instruct the oracle to invalidate some of its data. I believe that +this does not require deep knowledge about the program (i.e., we know a span has +changed and compilation has failed, we can instruct the oracle to invalidate all +data associated with that span. With luck, we can leverage the dependency +information the compiler has for incremental compilation here). + +In some cases a program would fail to parse or pass name resolution, but we +would like to try to type check. For example, + +```rust +fn main() { + let x = foo.bar. +``` + +will not parse, but we would like to suggest code completion options. 
+ +```rust +fn main() { + let foo = foo(); + let x = fo; +} +``` + +will parse, but fail name resolution, but again we would like to suggest code +completion options. + +There are two issues: dealing with incomplete or incorrect names (e.g., `fo` in +the second example), and dealing with unfinished AST nodes (e.g., in the first +example we need an identifier to finish the `.` expression, a `;` to terminate +the let statement, and a `}` to terminate the `main` function). + +A solution to the first problem is replacing invalid names with some magic +identifier, and ignoring errors involving that identifier. @sanxiyn implemented +something like the second feature in a [PR](https://github.com/rust- +lang/rust/pull/21323). His approach was to take a command line argument for +where to 'complete at' and to treat that as the magic identifier. An alternate +approach would be to use a keyword or distinguished identifier which the IDE +could insert (based on the caret position), or to fallback to the magic +identifier whenever there is a name resolution error. + +Similarly during type checking, if we find a mismatched or unknown type, we +should try to continue type checking with the information available so as to +still be able to provide code completion information. We already do this to some +extent with `TyErr`, but we should do better. + +For the second issue, the problem is where to start parsing again and how many +'open' items should be terminated. This is closely related to error recovery in +parsers, which is a well-developed are of research with a long history, and +which I won't attempt to summarise here. 
As far as I can see, there are two +major differences since we are doing this in the IDE context: we know the extent +of edited code (the span of changes we are passing to the quick-check compiler) +and the previous state of the edited code, and we can likely assume that even in +new code, braces and parentheses are likely to be paired (since an IDE will +insert closing braces, etc.). Assuming that we keep the state of the code the +last time it parsed completely, we can expand the edited span to cover an entire +expression (or other item) and thus we know exactly where to start re-parsing. +In the case where we are writing new code, we can just close all 'open' items. + +Being able to generate more errors before stopping would be an advantage for the +compiler in any case. However, we probably do not want to use these mechanisms +under normal compilation, only when performing a quick-check from the IDE. + + + +## Error format + +Currently the compiler generates text error messages. I propose that we add a +mechanism to the compiler to support different formats for error messages. We +already structure our error messages to some extent (separating the span +information, the message, and the error code). Rather than turning these +components into text in a fairly ad hoc manner, we should preserve that +structure, and some central error handler should convert into a chosen format. +We should support the current text format, JSON (or some other structured +format) for tools to use, and HTML for rich error messages (this is somewhat +orthogonal to this RFC, but has been discussed in the past as a desirable +feature). + + +# Drawbacks + +It's a lot of work. On the other hand the largest changes are desirable for +general improvements in compilation speed or for other tools. + + +# Alternatives + +The oracle and quick-check compiler could be combined in a single tool. 
This +might be more efficient, but would increase complexity and decrease opportunity +for third party alternatives. + +The oracle could do more - actually perform some of the processing tasks usually +done by IDEs (such as editing source code) or other tools (refactoring, +reformating, etc.). + +Should the oracle hide the quick-check compiler? I.e., the IDE talks only to the +oracle and the oracle requests compilation as needed. This might make things a +bit simpler for the IDE and means less IPC overhead and complexity. Either the +oracle could be responsible for all coordination, or the IDE could remain +responsible for coordinating when crates are handled, and the oracle is +responsible for coordinating calls to the quick check compiler to build a single +crate. + + +# Unresolved questions + +Should the quick-check compilation be provided by a separate tool or a mode of +the compiler? It is fairly different in its operation from the compiler. It +might be better to provide a different 'frontend' rather than adding many more +options to the compiler. (I think the answer is 'yes'). + +Should quick-check be a long running process? It could save some time by not +having to reload metadata, but having to keep metadata for an entire project in +memory would be expensive. We could perhaps compromise by unloading when the +user needs to recompile a different crate. I believe it is probably better in +the long run, but a batch process is OK to start with. + +How and when should we generate crate metadata. It seems sensible to generate +this when we switch to editing/re-compiling a different crate. However, it's not +clear if this must be done from scratch or if it can be produced from the +incremental compilation metadata (see that RFC, I guess). + +What should we call the oracle tool? I don't particularly like "oracle", +although it is descriptive (it comes from the Go tool of the same name). +Alternatives are 'Rider', 'Racer Server', or anything you can think of. 
+ +How do we handle different versions of Rust and interact with multi-rust? +Upgrades to the next stable version of Rust? + +Do we need to standardise error messages for the various parsers to prevent user +confusion (i.e., try to ensure that rustc and the various IDEs give the same +error messages). From 894fc45e48138775d4298ff64b72172101b34b62 Mon Sep 17 00:00:00 2001 From: Nick Cameron Date: Tue, 17 Nov 2015 15:51:29 +1300 Subject: [PATCH 2/6] WIP --- text/0000-ide.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/text/0000-ide.md b/text/0000-ide.md index 8a0ab3eaf92..a6c9662c86e 100644 --- a/text/0000-ide.md +++ b/text/0000-ide.md @@ -294,8 +294,8 @@ Takes a span, returns all 'definitions and declarations' for the identifier covered by the span. Can return an error if the span does not cover exactly one identifier or the oracle has no data for an identifier. -The returned data is a list of 'defintion' data. That data includes the span for -the item, any documentation for the item, a code snippet for the item, +The returned data is a list of 'definition' data. That data includes the span +for the item, any documentation for the item, a code snippet for the item, optionally a type for the item, and one or more kinds of definition (e.g., 'variable definition', 'field definition', 'function declaration'). @@ -317,8 +317,8 @@ Question: are these useful/necessary? Or should users just call *get definition* *search for identifier* Takes a search string or an id, and a struct of search parameters including case -sensitivity, and the kind of items to search (e.g., functions, traits, all -items). Returns a list of spans and code snippets. +sensitivity, the scope of the search, and the kind of items to search (e.g., +functions, traits, all items). Returns a list of spans and code snippets. **Code completion** @@ -430,12 +430,12 @@ the let statement, and a `}` to terminate the `main` function). 
A solution to the first problem is replacing invalid names with some magic identifier, and ignoring errors involving that identifier. @sanxiyn implemented -something like the second feature in a [PR](https://github.com/rust- -lang/rust/pull/21323). His approach was to take a command line argument for -where to 'complete at' and to treat that as the magic identifier. An alternate -approach would be to use a keyword or distinguished identifier which the IDE -could insert (based on the caret position), or to fallback to the magic -identifier whenever there is a name resolution error. +something like the second feature in a +[PR](https://github.com/rust-lang/rust/pull/21323). His approach was to take a +command line argument for where to 'complete at' and to treat that as the magic +identifier. An alternate approach would be to use a keyword or distinguished +identifier which the IDE could insert (based on the caret position), or to +fallback to the magic identifier whenever there is a name resolution error. Similarly during type checking, if we find a mismatched or unknown type, we should try to continue type checking with the information available so as to From 16152360f5906a844d5b4e45e68ded60d2590dc8 Mon Sep 17 00:00:00 2001 From: Nick Cameron Date: Tue, 15 Dec 2015 22:29:06 +1300 Subject: [PATCH 3/6] Remove material on error format and robust compilation This is a somewhat separate topic, most of which doesn't need an RFC. The material here didn't really add much. --- text/0000-ide.md | 93 ++---------------------------------------------- 1 file changed, 2 insertions(+), 91 deletions(-) diff --git a/text/0000-ide.md b/text/0000-ide.md index a6c9662c86e..6f3eb3c015d 100644 --- a/text/0000-ide.md +++ b/text/0000-ide.md @@ -141,6 +141,8 @@ communicate these asynchronously to the IDE plugin). In addition we must produce data to update the oracle, this should be done directly, without involving the IDE plugin. +TODO metadata - really? 
+ Quick-check does not generate executable code or crate metadata. However, it should (probably) update the metadata used for incremental compilation. @@ -384,97 +386,6 @@ does not seem like enough motivation to actually do the work. Could be an interesting student project or something. -## Robust compilation - -The goal here is that when the user is typing, we should be able to run the -early stages of the quick-check compiler and still come up with sensible code -completion suggestions. The IDE and compiler can collaborate to some extent -here. - -As long as we can compile as far as type checking, then the compiler should -still generate metadata for the oracle. If we fail later (e.g., in borrow -checking) then we should return errors *and* metadata for the oracle. If we fail -to type check, then we cannot generate meaningful data for the oracle (or if we -succeed at type checking, but use some error recovery). - -THE IDE should instruct the oracle to invalidate some of its data. I believe that -this does not require deep knowledge about the program (i.e., we know a span has -changed and compilation has failed, we can instruct the oracle to invalidate all -data associated with that span. With luck, we can leverage the dependency -information the compiler has for incremental compilation here). - -In some cases a program would fail to parse or pass name resolution, but we -would like to try to type check. For example, - -```rust -fn main() { - let x = foo.bar. -``` - -will not parse, but we would like to suggest code completion options. - -```rust -fn main() { - let foo = foo(); - let x = fo; -} -``` - -will parse, but fail name resolution, but again we would like to suggest code -completion options. 
- -There are two issues: dealing with incomplete or incorrect names (e.g., `fo` in -the second example), and dealing with unfinished AST nodes (e.g., in the first -example we need an identifier to finish the `.` expression, a `;` to terminate -the let statement, and a `}` to terminate the `main` function). - -A solution to the first problem is replacing invalid names with some magic -identifier, and ignoring errors involving that identifier. @sanxiyn implemented -something like the second feature in a -[PR](https://github.com/rust-lang/rust/pull/21323). His approach was to take a -command line argument for where to 'complete at' and to treat that as the magic -identifier. An alternate approach would be to use a keyword or distinguished -identifier which the IDE could insert (based on the caret position), or to -fallback to the magic identifier whenever there is a name resolution error. - -Similarly during type checking, if we find a mismatched or unknown type, we -should try to continue type checking with the information available so as to -still be able to provide code completion information. We already do this to some -extent with `TyErr`, but we should do better. - -For the second issue, the problem is where to start parsing again and how many -'open' items should be terminated. This is closely related to error recovery in -parsers, which is a well-developed are of research with a long history, and -which I won't attempt to summarise here. As far as I can see, there are two -major differences since we are doing this in the IDE context: we know the extent -of edited code (the span of changes we are passing to the quick-check compiler) -and the previous state of the edited code, and we can likely assume that even in -new code, braces and parentheses are likely to be paired (since an IDE will -insert closing braces, etc.). 
Assuming that we keep the state of the code the -last time it parsed completely, we can expand the edited span to cover an entire -expression (or other item) and thus we know exactly where to start re-parsing. -In the case where we are writing new code, we can just close all 'open' items. - -Being able to generate more errors before stopping would be an advantage for the -compiler in any case. However, we probably do not want to use these mechanisms -under normal compilation, only when performing a quick-check from the IDE. - - - -## Error format - -Currently the compiler generates text error messages. I propose that we add a -mechanism to the compiler to support different formats for error messages. We -already structure our error messages to some extent (separating the span -information, the message, and the error code). Rather than turning these -components into text in a fairly ad hoc manner, we should preserve that -structure, and some central error handler should convert into a chosen format. -We should support the current text format, JSON (or some other structured -format) for tools to use, and HTML for rich error messages (this is somewhat -orthogonal to this RFC, but has been discussed in the past as a desirable -feature). - - # Drawbacks It's a lot of work. On the other hand the largest changes are desirable for From 3dd97de465b2ff6e7adf034f32aa04df48aba9f3 Mon Sep 17 00:00:00 2001 From: Nick Cameron Date: Fri, 8 Jan 2016 09:37:41 +1300 Subject: [PATCH 4/6] Rewrite most of the RFC More focussed on the RLS (formerly oracle). A smaller and mostly simpler design. --- text/0000-ide.md | 522 ++++++++++++++++------------------------------- 1 file changed, 180 insertions(+), 342 deletions(-) diff --git a/text/0000-ide.md b/text/0000-ide.md index 6f3eb3c015d..64a1e8c31f7 100644 --- a/text/0000-ide.md +++ b/text/0000-ide.md @@ -5,75 +5,51 @@ # Summary -This RFC describes how we intend to modify the compiler to support IDEs.
The -intention is that support will be as generic as possible. A follow-up internals -post will describe how we intend to focus our energies and deploy Rust support -in actual IDEs. +This RFC describes the Rust Language Server (RLS). This is a program designed to +service IDEs and other tools. It offers a new access point to compilation and +APIs for getting information about a program. The RLS can be thought of as an +alternate compiler, but internally will use the existing compiler. -There are two sets of technical changes proposed in this RFC: changes to how we -compile, and the creation of an 'oracle' tool (name of tool TBC). +Using the RLS offers very low latency compilation. This allows for an IDE to +present information based on compilation to the user as quickly as possible. -This RFC is fairly detailed, it is intended as a straw-man plan to guide early -implementation, rather than as a strict blueprint. +## Requirements -## Compilation model +To be concrete about the requirements for the RLS, it should enable the +following actions: -An IDE will perform two kinds of compilation - an incremental check as the user -types (used to provide error and code completion information) and a full build. -The full build is explicitly signaled by the user (it could also happen -implicitly, for example when the user saves a file). A full build is basically -just a `cargo build` command, as would be done from the command line. It will -take advantage of any future improvements to regular compilation (such as -incremental compilation), but there is essentially no change from a compile -today. It is not very interesting and won't be discussed further. +* show compilation errors and warnings, updated as the user types, +* code completion as the user types, +* highlight all references to an item, +* find all references to an item, +* jump to definition. -The incremental check follows a new model of compilation. 
This check must be as -fast as possible but does not need to generate machine code. We'll describe it -in more detail below. We call this kind of compilation a 'quick-check'. +These requirements will be covered in more detail in later sections. -This RFC also covers making compilation more robust. +## History note -## The oracle +This RFC started as a more wide-ranging RFC. Some of the details have been +scaled back to allow for more focused and incremental development. -The oracle is a long running daemon process. It will keep a database -representation of an entire project's source code and semantic information (as -opposed to the compiler which operates on a crate at a time). It is -incrementally updated by the compiler and provides an IPC API for providing -information about a program - the low-level information an IDE (or similar tool) -needs, e.g., code completion options, location of definitions/declarations, -documentation for items. +Parts of the RFC dealing with robust compilation have been removed - work here +is ongoing and mostly doesn't require an RFC. -The oracle is a general purpose, low-level tool and should be usable by any IDE -as well as other tools. End users and editors with less project knowledge should -use the oracle via a more friendly interface (such as Racer). - - -## Other shared functionality - -Other functionality, such as refactoring and reformatting will be provided by -separate tools rather than the oracle. These should be sharable between IDE -implementations. They are not covered in this RFC. +The RLS was earlier referred to as the oracle. # Motivation -An IDE collects together many tools into a single piece of software. 
Some of -these are entirely separate from the rest of the Rust eco-system (such as editor -functionality), some will reuse existing tools in pretty much the same way they -are already used (e.g., formatting code, which should straightforwardly use -Rustfmt), and some will have totally new ways of using the compiler or other -tools (e.g., code completion). - Modern IDEs are large and complex pieces of software; creating a new one from scratch for Rust would be impractical. Therefore we need to work with existing IDEs (such as Eclipse, IntelliJ, and Visual Studio) to provide functionality. These IDEs provide excellent editor and project management support out of the -box, but know nothing about the Rust language. +box, but know nothing about the Rust language. This information must come from +the compiler. An important aspect of IDE support is that response times must be extremely -short. Users expect information as they type. Running normal compilation of an +quick. Users expect some feedback as they type. Running normal compilation of an entire project is far too slow. Furthermore, as the user is typing, the program will not be a valid, complete Rust program. @@ -81,361 +57,223 @@ We expect that an IDE may have its own lexer and parser. This is necessary for the IDE to quickly give parse errors as the user types. Editors are free to rely on the compiler's parsing if they prefer (the compiler will do its own parsing in any case). Further information (name resolution, type information, etc.) will -be provided by the compiler via the oracle. - +be provided by the RLS. -# Detailed design - -## Quick-check compilation - -(See also open questions, below). +## Requirements -We run the quick-check compiler on a single crate. At some point after quick -checking, dependent crates must be rebuilt. This is the responsibility of an -external tool to manage (see below). Quick-check is driven by an IDE (or -possibly by the oracle), not by Cargo. 
+We stated some requirements in the summary, here we'll cover more detail and the +workflow between IDE and RLS. +The RLS should be safe to use in the face of concurrent actions. For example, +multiple requests for compilation could occur, with later requests occurring +before earlier requests have finished. There could be multiple clients making +requests to the RLS, some of which may mutate its data. The RLS should provide +reliable and consistent responses. However, it is not expected that clients are +totally isolated, e.g., if client 1 updates the program, then client 2 requests +information about the program, client 2's response will reflect the changes made +by client 1, even if these are not otherwise known to client 2. -### Incremental and lazy compilation -Incremental compilation is where, rather than re-compiling an entire crate, only -code which is changed and its dependencies are re-compiled. See -[RFC #1298](https://github.com/rust-lang/rfcs/pull/1298). +### Show compilation errors and warnings, updated as the user types -Lazy compilation is where, rather than compiling an entire crate, we start by -compiling a single function (or possibly some other unit of code), and re- -compiling code which is depended on until we are done. Not all of a crate will -be compiled in this fashion. +The IDE will request compilation of the in-memory program. The RLS will compile +the program and asynchronously supply the IDE with errors and warnings. -These two compilation strategies are faster than the current compilation model -(compile everything, every time). They are somewhat orthogonal - compilation can -be either lazy or incremental without implying the other. The [current -proposal](https://github.com/rust-lang/rfcs/pull/1298) for supporting -incremental compilation involves some lazy compilation as an implementation -detail. +### Code completion as the user types -For quick-checking, compilation should be both incremental and lazy. 
The input -to the compiler is not just the crate being re-compiled, but also the span of -code changed (normal incremental compilation computes this span for itself, but -the IDE already has this information, so it would be wasteful to recompute it). -As a further optimisation, if the IDE can refer to items by an id (such as a -path), then this could be fed to the compiler rather than a code span to save -the compiler the effort of finding an AST node from a code span. +The IDE will request compilation of the in-memory program and request code- +completion options for the cursor position. The RLS will compile the program. As +soon as it has enough information for code-completion it will return options to +the IDE. -We begin by computing which code is invalidated by the change (that is, any code -which depends on the changed code). We then re-compile the changed code. -Information which is depended upon is looked up in the saved metadata used for -incremental compilation. When we have re-compiled the changed code, then we -output the result (see below). If there are no fatal errors, then we continue to -compile the rest of the invalidated code. - - -### Compilation output - -The output of compilation is either success or a set of errors (as with today's -compiler, but see below for more detail on error message format). However, since -compilation can continue after returning an initial result, we might produce -further errors (I presume that IDEs provide a mechanism for the compiler to -communicate these asynchronously to the IDE plugin). - -In addition we must produce data to update the oracle, this should be done -directly, without involving the IDE plugin. - -TODO metadata - really? - -Quick-check does not generate executable code or crate metadata. However, it -should (probably) update the metadata used for incremental compilation. 
- - -### Multiple crates - -Quick check only applies to a single crate, however, after some changes we might -need to re-compile dependent crates. This is the IDE's responsibility. In the -short term we can just trigger a full re-build (via Cargo) when the user starts -editing a file belonging to a different crate (there will obviously be some lag -there). The compiler must also generate crate metadata for the modified crate. - -Long term, the IDE might keep track of the dependency graph between crates -(provided by Cargo). The quick-check should signal when a crate's public -interface changes due to re-compilation. In that case the IDE can trigger -background re-compilation of dependent crates (possibly with some -delay/batching). - - -## The Oracle - -The oracle is a long-running tool which takes input from both full builds and -quick-checks, and responds to queries about a Rust program. Of particular note -is that it knows about a whole project, not just a single crate. In fact, other -than as a kind of module, it doesn't much care about the notion of a crate at -all. - -We require a data format for getting metadata from the compiler to the oracle. -Unfortunately none of the existing ones are quite right. Crate metadata is not -complete enough (it mostly only contains data about interfaces, not function -bodies), save-analysis data has been processed too far (basically into strings) -which loses some of the structure that would be useful, debuginfo is not Rust- -centric enough (i.e., does not contain Rust type information) and is based on -expanded source code. Furthermore, serialising any of the compiler's IRs is not -good enough: the AST and HIR do not contain any type or name information, the -HIR and MIR are post-expansion. - -The best option seems to be the save-analysis information. This is in a poor -format, but is the 'right' data (it can be based on an early AST and includes -type and name information). 
It can be evolved to be more efficient form over the -long run (it has been a low priority task for a while to support different -formats for the information). - -Full builds will generate a dump of save-analysis data for a whole crate. Quick -checks will generate data for the changed code. In both cases the oracle must -incrementally update its knowledge of the source code. How exactly to do this -when neither names nor ids are stable is an interesting question, but too much -detail for this RFC (especially as the implementation of ids in the compiler is -evolving). - -For crates which are not built from source (for example the standard library), -authors can choose to distribute the oracle's metadata to allow users to get a -good IDE experience with these crates. In this case, we only need metadata for -interfaces, not the bodies of functions or private items. The oracle should -handle such reduced metadata. It should be possible to generate the oracle's -metadata from the crate metadata, but this is not a short-term goal. (Note this -will require some knowledge in the IDE too - if there is no corresponding source -code, the IDE cannot 'jump to definition', for example). - -The oracle's data is platform-dependent. We must be careful when working with a -cross-compiled project to generate metadata for the target machine. This -shouldn't be a problem for normal compilation, but it means that quick-check -compilation must be configured for the same target, and care should be taken -with downloaded metadata. - -As well as metadata based on types and names, the oracle should keep track of -warnings. Since code with warnings but no errors is not re-compiled, a tool -outside the compiler must track them for display in the IDE. This will be done -by the oracle. - - -### Details - -#### API - -The oracle's API is a set of IPC calls. How exactly these should be implemented -is not clear. 
The most promising options are sending JSON over TCP, using -[thrift](https://thrift.apache.org/), or using Cap'n Proto (I'm unclear about -exactly what the transport layer looks like using Cap'n Proto, there is no Cap'n -Proto RCP implementation for Java, but I believe there is an alternative using -shared, memory mapped files as a buffer; I'm not familiar enough with the -library to work out what is needed). - -I've detailed the API I believe we'll need to start with. This is slightly more -than a minimal set. I expect it will expand as time goes by. At some point we -will want to stabilise parts of the API to allow for third party implementations -of the oracle and compiler. - -All API calls can return success or error results. Many calls involve a *span*; -for the oracle's API, this is defined as two byte offsets from the start of the -file (oracle spans must always be contained in a single file). - -There are some alternative span definitions: we could use file and column indices -rather than byte offsets (this has some edge case difficulties with the -definition of a newline - do unicode newlines count? It also requires some extra -computation), we could use character offsets (again involves some more -computation, but might be more robust). - -A problem is that Visual Studio uses UTF16 while Rust uses UTF8, there is (I -understand) no efficient way to convert between byte counts in these systems. -I'm not sure how to address this. It might require the oracle to be able to -operate in UTF16 mode. +* The RLS should return code-completion options asynchronously to the IDE. + Alternatively, the RLS could block the IDE's request for options. +* The RLS should not filter the code-completion options. For example, if the + user types `foo.ba` where `foo` has available fields `bar` and `qux`, it + should return both these fields, not just `bar`. The IDE can perform its own + filtering since it might want to perform spell checking, etc.
Put another way, + the RLS is not a code completion tool, but supplies the low-level data that a + code completion tool uses to provide suggestions. -Where no return value is specified, the call returns success or failure (with a -reason). +### Highlight all references to an item -The philosophy of the API is that most functions should only take a single call, -as opposed to making each function as minimal and orthogonal as possible. This -is because IPC can be slow and response time is important for IDEs. +The IDE requests all references in the same file based on a position in the +file. The RLS returns a list of spans. +### Find all references to an item -**Projects** +The IDE requests all references based on a position in the file. The RLS returns +a list of spans. -Note that the oracle stores no metadata about a project. +### Jump to definition -*init project* +The IDE requests the definition of an item based on a position in a file. The RLS +returns a list of spans (a list is necessary since, for example, a dynamically +dispatched trait method could be defined in multiple places). -Takes a project name, returns an id string (something close to the project's name). -*delete project* - -Takes a project id. - -*list projects* - -Takes nothing, returns a list of project ids. - -Each of the remaining calls takes a project identifier. - - -**Update** - -See section on input data format below. - -*update* - -Takes input data (actual source code rather than spans since we cannot assume -the user has saved the file) and a list of spans to invalidate. Where there are -no invalidated spans, the update call adds data (which will cause an error if -there are conflicts). Where there is no input data, update just invalidates. +# Detailed design -We might want to allow some shortcuts to invalidate an entire file or -recursively invalidate a directory. 
+## Architecture +The basic requirements for the architecture of the RLS are that it should be: -**Description** +* reusable by different clients (IDEs, tools, ...), +* fast (we must provide semantic information about a program as the user types), +* handle multi-crate programs, +* consistent (it should handle multiple, potentially mutating, concurrent requests). -*get definition* +The RLS will be a long running daemon process. Communication between the RLS and +an IDE will be via IPC calls (tools (for example, Racer) will also be able to +use the RLS as an in-process library.). The RLS will include the compiler as a +library. -Takes a span, returns all 'definitions and declarations' for the identifier -covered by the span. Can return an error if the span does not cover exactly one -identifier or the oracle has no data for an identifier. +The RLS has three main components - the compiler, a database, and a work queue. -The returned data is a list of 'definition' data. That data includes the span -for the item, any documentation for the item, a code snippet for the item, -optionally a type for the item, and one or more kinds of definition (e.g., -'variable definition', 'field definition', 'function declaration'). +The RLS accepts two kinds of requests - compilation requests and queries. It +will also push data to registered programs (generally triggered by compilation +completing). Essentially, all communication with the RLS is asynchronous (when +used as an in-process library, the client will be able to use synchronous +function calls too). -*get references* +The work queue is used to sequentialise requests and ensure consistency of +responses. Both compilation requests and queries are stored in the queue. Some +compilation requests can cause earlier compilation requests to be canceled. +Queries blocked on the earlier compilation then become blocked on the new +request. -Takes a span, returns a list of reference data (or an error). 
Each datum -consists of the span of the reference and a code snippet. +In the future, we should move queries ahead of compilation requests where +possible. -*get docs* +When compilation completes, the database is updated (see below for more +details). All queries are answered from the database. The database has data for +the whole project, not just one crate. This also means we don't need to keep the +compiler's data in memory. -Takes a span, returns the same data as *get definition* but limited to doc strings. -*get type* +## Compilation -Takes a span, returns the same data as *get definition* but limited to type information. +The RLS is somewhat parametric in its compilation model. Theoretically, it could +run a full compile on the requested crate, however this would be too slow in +practice. -Question: are these useful/necessary? Or should users just call *get definition*? +The general procedure is that the IDE (or other client) requests that the RLS +compile a crate. It is up to the IDE to interact with Cargo (or some other +build system) in order to produce the correct build command and to ensure that +any dependencies are built. -*search for identifier* +Initially, the RLS will do a standard incremental compile on the specified +crate. See [RFC PR 1298](https://github.com/rust-lang/rfcs/pull/1298) for more +details on incremental compilation. -Takes a search string or an id, and a struct of search parameters including case -sensitivity, the scope of the search, and the kind of items to search (e.g., -functions, traits, all items). Returns a list of spans and code snippets. +I see two ways to improve compilation times: lazy compilation and keeping the +compiler in memory. We might also experiment with having the IDE specify which +parts of the program have changed, rather than having the compiler compute this. +### Lazy compilation -**Code completion** +With lazy compilation the IDE requests that a specific item is compiled, rather +than the whole program. 
The compiler compiles this function compiling other +items only as necessary to compile the requested item. -*get suggestions* +Lazy compilation should also be incremental - an item is only compiled if +required *and* if it has changed. -Takes a span (note that this span could be empty, e.g, for `foo.` we would use -the empty span which starts after the `.`; for `foo.b` we would use the span for -`b`), and returns a list of suggestions (is this useful? Is there any difference -from just using the caret position?). Each suggestion consists of the text for -completion plus the same information as returned for the *get definition* call. +Obviously, we could miss some errors with pure lazy compilation. To address this +the RLS schedules both a lazy and a full (but still incremental) compilation. +The advantage of this approach is that many queries scheduled after compilation +can be performed after the lazy compilation, but before the full compilation. +### Keeping the compiler in memory -#### Input data format +There are still overheads with the incremental compilation approach. We must +startup the compiler initialising its data structures, we must parse the whole +crate, and we must read the incremental compilation data and metadata from disk. -The precise serialisation format of the oracle's input data will likely change -over time. At first, I propose we use csv, since that is what save-analysis -currently supports, and there is good decoding support for Rust. Longer term we -should use a binary format for more efficient serialisation and deserialisation. +If we can keep the compiler in memory, we avoid these costs. -Each datum consists of an identifier, a kind, a span, and a set of fields (the -exact fields are dependent on the kind of data). +However, this would require some significant refactoring of the compiler. There +is currently no way to invalidate data the compiler has already computed. 
It +also becomes difficult to cancel compilation: if we receive two compile requests +in rapid succession, we may wish to cancel the first compilation before it +finishes, since it will be wasted work. This is currently easy - the compilation +process is killed and all data released. However, if we want to keep the +compiler in memory we must invalidate some data and ensure the compiler is in a +consistent state. -If the datum is for a definition (of a trait, struct, etc.), then the identifier -is an absolute path (including the crate) to that definition. Question: how to -identify impls - do we need to distinguish multiple impls for the same trait and -data type? -For statements and expressions, the identifier is a path to the expression's -function (or static/const) and a function relative id. Note that this means we -have to invalidate an entire function at a time (or at least all of the function -after the edited portion). It would be nice if we could avoid this and be more -fine-grained about invalidation, any ideas? +### Compilation output -I propose that we follow the save-analysis data format to start with (in terms -of the kinds of data available and the fields for each). However, we should use -identifiers rather than DefIds and distinguish fields from variables. +Once compilation is finished, the RLS's database must be updated. Errors and +warnings produced by the compiler are stored in the database. Information from +name resolution and type checking is stored in the database (exactly which +information will grow with time). The analysis information will be provided by +the save-analysis API. +The compiler will also provide data on which (old) code has been invalidated. +Any information (including errors) in the database concerning this code is +removed before the new data is inserted. -### Racer -The oracle fulfills a similar role to -[Racer](https://github.com/phildawes/racer). 
Indeed, forking Racer may be a good -way to start development of the oracle. The oracle should provide more -information and should be more accurate by being more closely integrated with -the compiler. +### Multiple crates -Racer could be refactored to be a client of the oracle, thus taking advantage of -more accurate data and a simpler implementation, whilst maintaining its -interface. This would be a nice way to make the oracle's data available to less -sophisticated editors. Alternatively, Racer could make use of the oracle's -metadata but do its own processing of that data to provide an alternate -implementation of an oracle. +The RLS does not track dependencies, nor much crate information. However, it +will be asked to compile many crates and it will keep track of which crate data +belongs to. It will also keep track of which crates belong to a single program +and will not share data between programs, even if the same crate is shared. This +helps avoid versioning issues. -### DXR and Rustdoc +## Versioning -Both DXR and Rustdoc could be rewritten to talk to the oracle and run in a live -mode, rather than maintaining their own pre-processed data. This would have some -benefit in keeping these resources up to date as programs are edited (and -reducing the number of ways for doing essentially the same thing). However, this -does not seem like enough motivation to actually do the work. Could be an -interesting student project or something. +The RLS will be released using the same train model as Rust. A version of the +RLS is pinned to a specific version of Rust. If users want to operate with +multiple versions, they will need multiple versions of the RLS (I hope we can +extend multirust/rustup.rs to handle the RLS as well as Rust). # Drawbacks -It's a lot of work. On the other hand the largest changes are desirable for -general improvements in compilation speed or for other tools. +It's a lot of work. 
But better we do it once than each IDE doing it themselves, +or having sub-standard IDE support. # Alternatives -The oracle and quick-check compiler could be combined in a single tool. This -might be more efficient, but would increase complexity and decrease opportunity -for third party alternatives. - -The oracle could do more - actually perform some of the processing tasks usually +The big design choice here is using a database rather than the compiler's data +structures. The primary motivation for this is the 'find all references' +requirement. References could be in multiple crates, so we would need to reload +incremental compilation data (which must include the serialised MIR, or +something equivalent) for all crates, then search this data for matching +identifiers. Assuming the serialisation format is not too complex, this should +be possible in a reasonable amount of time. Since identifiers might be in +function bodies, we can't rely on metadata. + +This is a reasonable alternative, and may be simpler than the database approach. +However, it is not planned to output this data in the near future (the initial +plan for incremental compilation is to not store information required to re- +check function bodies). This approach might be too slow for very large projects, +we might wish to do searches in the future that cannot be answered without doing +the equivalent of a database join, and the database simplifies questions about +concurrent accesses. + +We could only provide the RLS as a library, rather than providing an API via +IPC. An IPC interface allows a single instance of the RLS to service multiple +programs, is language-agnostic, and allows for easy asynchronous-ness between +the RLS and its clients. It also provides isolation - a panic in the RLS will +not cause the IDE to crash, nor can a long-running operation delay the IDE. Most +of these advantages could be captured using threads.
However, the cost of +implementing an IPC interface is fairly low and means less effort for clients, +so it seems worthwhile to provide. + +The RLS could do more - actually perform some of the processing tasks usually done by IDEs (such as editing source code) or other tools (refactoring, reformating, etc.). -Should the oracle hide the quick-check compiler? I.e., the IDE talks only to the -oracle and the oracle requests compilation as needed. This might make things a -bit simpler for the IDE and means less IPC overhead and complexity. Either the -oracle could be responsible for all coordination, or the IDE could remain -responsible for coordinating when crates are handled, and the oracle is -responsible for coordinating calls to the quick check compiler to build a single -crate. - # Unresolved questions -Should the quick-check compilation be provided by a separate tool or a mode of -the compiler? It is fairly different in its operation from the compiler. It -might be better to provide a different 'frontend' rather than adding many more -options to the compiler. (I think the answer is 'yes'). - -Should quick-check be a long running process? It could save some time by not -having to reload metadata, but having to keep metadata for an entire project in -memory would be expensive. We could perhaps compromise by unloading when the -user needs to recompile a different crate. I believe it is probably better in -the long run, but a batch process is OK to start with. - -How and when should we generate crate metadata. It seems sensible to generate -this when we switch to editing/re-compiling a different crate. However, it's not -clear if this must be done from scratch or if it can be produced from the -incremental compilation metadata (see that RFC, I guess). - -What should we call the oracle tool? I don't particularly like "oracle", -although it is descriptive (it comes from the Go tool of the same name). 
-Alternatives are 'Rider', 'Racer Server', or anything you can think of. - -How do we handle different versions of Rust and interact with multi-rust? -Upgrades to the next stable version of Rust? - -Do we need to standardise error messages for the various parsers to prevent user -confusion (i.e., try to ensure that rustc and the various IDEs give the same -error messages). +A problem is that Visual Studio uses UTF16 while Rust uses UTF8, there is (I +understand) no efficient way to convert between byte counts in these systems. +I'm not sure how to address this. It might require the RLS to be able to operate +in UTF16 mode. From 2f9a2a1887019fb70b0522f191345a8670d47563 Mon Sep 17 00:00:00 2001 From: Nick Cameron Date: Fri, 15 Jan 2016 09:26:32 +1300 Subject: [PATCH 5/6] Add some more text to alternatives and unanswered questions --- text/0000-ide.md | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/text/0000-ide.md b/text/0000-ide.md index 64a1e8c31f7..a8b2bca3caf 100644 --- a/text/0000-ide.md +++ b/text/0000-ide.md @@ -266,6 +266,14 @@ of these advantages could be captured using threads. However, the cost of implementing an IPC interface is fairly low and means less effort for clients, so it seems worthwhile to provide. +Extending this idea, we could do less than the RLS - provide a high-level +library API for the Rust compiler and let other projects do the rest. In +particular, Racer does an excellent job at providing the information the RLS +would provide without much information from the compiler. This is certainly less +work for the compiler team and more flexible for clients. On the other hand, it +means more work for clients and possible fragmentation. Duplicated effort means +that different clients will not benefit from each other's innovations. + The RLS could do more - actually perform some of the processing tasks usually done by IDEs (such as editing source code) or other tools (refactoring, reformating, etc.). 
@@ -276,4 +284,15 @@ reformating, etc.). A problem is that Visual Studio uses UTF16 while Rust uses UTF8, there is (I understand) no efficient way to convert between byte counts in these systems. I'm not sure how to address this. It might require the RLS to be able to operate -in UTF16 mode. +in UTF16 mode. This is only a problem with byte offsets in spans, not with +row/column data (the RLS will supply both). It may be possible for Visual Studio +to just use the row/column data, or convert inefficiently to UTF16. I guess the +question comes down to should this conversion be done in the RLS or the client. +I think we should start assuming the client, and perhaps adjust course later. + +What kind of IPC protocol to use? HTTP is popular and simple to deal with. It's +platform-independent and used in many similar pieces of software. On the other +hand it is heavyweight and requires pulling in large libraries, and requires +some attention to security issues. Alternatives are some kind of custom +protocol, or using a solution like Thrift. My preference is for HTTP, since it +has been proven in similar situations. From 37d72bec6b93958a87e88fa10b861ac4cb955ce2 Mon Sep 17 00:00:00 2001 From: Nick Cameron Date: Fri, 15 Jan 2016 09:29:26 +1300 Subject: [PATCH 6/6] Add text about dirty buffers --- text/0000-ide.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/text/0000-ide.md b/text/0000-ide.md index a8b2bca3caf..5e41fe2163e 100644 --- a/text/0000-ide.md +++ b/text/0000-ide.md @@ -166,6 +166,10 @@ Initially, the RLS will do a standard incremental compile on the specified crate. See [RFC PR 1298](https://github.com/rust-lang/rfcs/pull/1298) for more details on incremental compilation. +The crate being compiled should include any modifications made in the client and +not yet committed to a file (e.g., changes the IDE has in memory). The client +should pass such changes to the RLS along with the compilation request.
+ I see two ways to improve compilation times: lazy compilation and keeping the compiler in memory. We might also experiment with having the IDE specify which parts of the program have changed, rather than having the compiler compute this.
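
To make the UTF8/UTF16 span question from the unresolved questions more concrete, here is a minimal sketch of what the client-side conversion might look like. It is not part of the RLS or of any existing API: the function names `utf8_to_utf16_offset` and `utf16_to_utf8_offset` are hypothetical, and the sketch assumes the client has the text of the file (or line) that the span refers to. Each call is a linear scan over the text, which is the 'inefficient' conversion referred to above.

```rust
/// Hypothetical helper: converts a UTF-8 byte offset into `text` to the
/// equivalent offset counted in UTF-16 code units.
/// Returns `None` if the offset is out of range or falls inside a
/// multi-byte character.
fn utf8_to_utf16_offset(text: &str, byte_offset: usize) -> Option<usize> {
    // `is_char_boundary` is false for offsets past the end of the string
    // and for offsets in the middle of a character.
    if !text.is_char_boundary(byte_offset) {
        return None;
    }
    // Sum the UTF-16 length (1 or 2 code units) of every character that
    // precedes the offset.
    Some(text[..byte_offset].chars().map(char::len_utf16).sum::<usize>())
}

/// Hypothetical helper: converts an offset counted in UTF-16 code units to
/// the equivalent UTF-8 byte offset into `text`.
/// Returns `None` if the offset is out of range or falls between the two
/// halves of a surrogate pair.
fn utf16_to_utf8_offset(text: &str, utf16_offset: usize) -> Option<usize> {
    let mut units = 0;
    for (byte_idx, ch) in text.char_indices() {
        if units == utf16_offset {
            return Some(byte_idx);
        }
        if units > utf16_offset {
            // The requested offset was inside the previous character.
            return None;
        }
        units += ch.len_utf16();
    }
    // The offset may point exactly at the end of the text.
    if units == utf16_offset {
        Some(text.len())
    } else {
        None
    }
}
```

For example, in the string `"aé😀b"` the final `b` starts at byte offset 7 in UTF8 but at offset 4 in UTF16 code units. A client that calls such a function once per span pays a scan per call; a client converting many spans in the same file would probably want to cache per-line offsets instead, which is one reason to leave the choice to the client at first.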