Supporting code generators with source maps and multiple source directories #1573

Closed
302 changes: 302 additions & 0 deletions text/0000-supporting-code-generation.md
@@ -0,0 +1,302 @@
- Feature Name: `source_map`, `include_dir`
- Start Date: 2016-02-29
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

This RFC proposes two changes to the Rust compiler and Cargo in order
to better support code generators:

* Add source mapping support to the compiler, allowing it to
bidirectionally associate tokens in an output Rust file with one or more
input template files. This mapping will then be used to report error messages
at their location in the original file.
* Add support to `rustc` for multiple source directories, and update Cargo
to automatically add its `$OUT_DIR` directory to this search path.

# Motivation
[motivation]: #motivation

[Syntex](https://github.com/serde-rs/syntex) is a convenient tool that enables
libraries like [Serde](https://github.com/serde-rs/serde) to support Rust
Nightly-style syntax extensions in Stable Rust. Syntex is a code generator:
it expands syntax extensions from a template Rust file into a stable Rust
file, which can then be compiled by the Stable Rust compiler.

Unfortunately, there are some major challenges to using Syntex that prevent
libraries like Serde from getting wide usage. There are three major problems.
First, wiring Syntex into a project results in an inconvenient amount
of boilerplate code. It requires the following `build.rs`, copy-pasted
into every Serde project, which registers the Serde plugin with Syntex and
informs Syntex which files it should expand:

```rust
extern crate syntex;
extern crate serde_codegen;

use std::env;
use std::path::Path;

fn main() {
    let out_dir = env::var_os("OUT_DIR").unwrap();

    let src = Path::new("src/queen.rs.in");
    let dst = Path::new(&out_dir).join("queen.rs");

    let mut registry = syntex::Registry::new();

    serde_codegen::register(&mut registry);
    registry.expand("", &src, &dst).unwrap();
}
```

It also requires an unfortunate number of macros to link in the generated
file, with an invocation like:

```rust
include!(concat!(env!("OUT_DIR"), "/queen.rs"));
```

Second, after a project has been Syntex-ified, it is inconvenient to
use in daily development because the generated files produce terrible error
messages. This happens because error locations are reported inside the
generated file, not within the template file. Debugging an error then
requires opening the generated file, finding the error, and then manually
searching the template file for the corresponding code.

For example, a type error in `queen.rs.in` might produce this error message,
which points into the generated file:

```
target/debug/build/test-ba65ec36dc6f8bb0/out/queen.rs:25:18: 2:23 error: mismatched types:
expected `u64`,
found `&'static str`
(expected u64,
found &-ptr) [E0308]
target/debug/build/test-ba65ec36dc6f8bb0/out/queen.rs:25 let x: u64 = "foo";
^~~~~
```

Third, because of this difficulty with error locations, most users of Serde do
their development in Nightly Rust with the Serde plugin, which is compatible
with Nightly Rust syntax extensions and gives good error locality. Not only does
this push more of our ecosystem toward Nightly Rust and its unstable
features, it also requires even more inconvenient boilerplate code to make a
project compatible with both Syntex and Nightly Rust plugins. The `build.rs`
from before needs to be modified to:

```rust
#[cfg(feature = "with-syntex")]
mod with_syntex {
    extern crate syntex;
    extern crate serde_codegen;

    use std::env;
    use std::path::Path;

    pub fn main() {
        let out_dir = env::var_os("OUT_DIR").unwrap();

        let src = Path::new("src/queen.rs.in");
        let dst = Path::new(&out_dir).join("queen.rs");

        let mut registry = syntex::Registry::new();

        serde_codegen::register(&mut registry);
        registry.expand("", &src, &dst).unwrap();
    }
}

#[cfg(not(feature = "with-syntex"))]
mod with_syntex {
    pub fn main() {}
}

pub fn main() {
    with_syntex::main();
}
```

and the entry point into the library needs to be modified to:

```rust
#![cfg_attr(not(feature = "with-syntex"), feature(custom_attribute, custom_derive, plugin))]
#![cfg_attr(not(feature = "with-syntex"), plugin(serde_macros))]

extern crate serde;

#[cfg(feature = "with-syntex")]
include!(concat!(env!("OUT_DIR"), "/lib.rs"));

#[cfg(not(feature = "with-syntex"))]
include!("lib.rs.in");
```

Beyond Syntex, there are a number of other tools that work by way of code
generation:

* [ANTLR](http://www.antlr.org/)
* [Lex](http://dinosaur.compilertools.net/lex/index.html)
* [Protocol Buffers](https://developers.google.com/protocol-buffers/)
* [Thrift](https://thrift.apache.org/)
* [Yacc](http://dinosaur.compilertools.net/yacc/index.html)

It is unlikely these projects would be rewritten in Rust, and so they would
also be subject to the same problem Syntex has: errors are reported in the
generated file.

# Detailed design
[design]: #detailed-design

This RFC proposes two changes that will help improve Rust's code generation
story.

## Source Mapping
[source mapping]: #source-mapping

Because of the challenges of debugging generated code, this RFC proposes that
Rust be extended to produce and consume a file that maps each token in the
generated Rust file back to its location in the original input file. To
illustrate, let's consider using the rustc pretty printer to convert one Rust
source into another. For example, consider a simple crate that's made up of
two files. `queen.rs`:

```rust
pub mod love;

pub struct Person { ... }
```

and its submodule, `love.rs`:

```rust
use super::Person;

pub fn find(people: &[Person]) -> Option<&Person> {
    // Slices have no `find` method of their own, so go through an iterator.
    people.iter().find(|person| person.lovable())
}
```

The pretty printer produces a single output file that merges the two files
together, and would look something like this:

```
pub mod love {
    use super::Person;

    pub fn find(people: &[Person]) -> Option<&Person> {
        people.iter().find(|person| person.lovable())
    }
}

pub struct Person { ... }
```

By itself, this process loses the information that the module `love`
came from the file `love.rs`. To avoid that, the pretty printer will
instead generate a file, `queen.rs.map`, that conceptually contains the
following mapping (lines and columns are zero-based):

| dst line | dst col | source file | src line | src col | token |
| -------- | ------- | ----------- | -------- | ------- | ----- |
| 0        | 0       | "queen.rs"  | 0        | 0       | pub   |
| 0        | 4       | "queen.rs"  | 0        | 4       | mod   |
| 0        | 8       | "queen.rs"  | 0        | 8       | love  |
| 0        | 12      | "queen.rs"  | 0        | 12      | ;     |
| 1        | 4       | "love.rs"   | 0        | 0       | use   |

This mapping will then be used by the Rust compiler during parsing to map
tokens to their original location.
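
As a rough illustration of how such a table could be consulted when reporting an error, the sketch below looks up the entry covering a position in the generated file. The `Mapping` struct, the `lookup` function, and the sample rows are hypothetical names and data chosen for this example; they are not part of `rustc` or of any proposed API.

```rust
/// One row of the conceptual mapping table (all positions are zero-based).
/// Hypothetical type, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Mapping {
    pub dst_line: u32,
    pub dst_col: u32,
    pub src_file: String,
    pub src_line: u32,
    pub src_col: u32,
}

/// Returns the last entry on `line` that starts at or before `col`.
/// `mappings` must be sorted by `(dst_line, dst_col)`.
pub fn lookup(mappings: &[Mapping], line: u32, col: u32) -> Option<&Mapping> {
    // Index of the first entry that starts strictly after the position.
    let idx = mappings.partition_point(|m| (m.dst_line, m.dst_col) <= (line, col));
    idx.checked_sub(1)
        .map(|i| &mappings[i])
        // An entry on an earlier line does not cover this position.
        .filter(|m| m.dst_line == line)
}

/// Sample rows loosely modeled on the table above (assumed values).
pub fn example_table() -> Vec<Mapping> {
    let row = |dl, dc, file: &str, sl, sc| Mapping {
        dst_line: dl,
        dst_col: dc,
        src_file: file.to_string(),
        src_line: sl,
        src_col: sc,
    };
    vec![
        row(0, 0, "queen.rs", 0, 0),
        row(0, 4, "queen.rs", 0, 4),
        row(0, 8, "queen.rs", 0, 8),
        row(1, 4, "love.rs", 0, 0),
    ]
}
```

With these rows, a diagnostic at line 1, column 7 of the generated file would resolve to the entry that points at `love.rs`, while a position before the first entry on a line yields no mapping and would fall back to the generated location.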

Rather than developing a custom mapping format for Rust, this RFC proposes
that Rust adopt the
[JavaScript Source Map](https://source-map.github.io/)
[v3 specification](https://docs.google.com/document/d/1U1RGAehQwRypUTovF1KRlpiOFze0b-_2gc6fAH0KY0k/edit).
This would simplify the implementation,
since there are already a number of X-to-JavaScript Source Map generators.
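
For context, the Source Map v3 format stores its `mappings` as comma-separated segments of Base64 VLQ numbers (generated column, source index, source line, source column, and an optional name index, each relative to the previous segment). The sketch below decodes one such segment; `decode_vlq_segment` is a hypothetical helper written for this example and is not an existing `rustc` or library function.

```rust
/// The standard Base64 alphabet used by Source Map v3.
const BASE64: &[u8] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Decodes one Base64 VLQ segment (for example "AAgBC") into its fields.
/// Returns `None` on an invalid character, an overlong number, or a
/// segment that ends in the middle of a number.
pub fn decode_vlq_segment(segment: &str) -> Option<Vec<i64>> {
    let mut fields = Vec::new();
    let mut value: i64 = 0;
    let mut shift: u32 = 0;
    for byte in segment.bytes() {
        let digit = BASE64.iter().position(|&b| b == byte)? as i64;
        if shift > 55 {
            return None; // would overflow an i64
        }
        // The low five bits of each digit carry data.
        value |= (digit & 0x1f) << shift;
        if digit & 0x20 != 0 {
            // Continuation bit set: more digits follow for this number.
            shift += 5;
        } else {
            // The lowest bit of the assembled value is the sign.
            let magnitude = value >> 1;
            fields.push(if value & 1 == 1 { -magnitude } else { magnitude });
            value = 0;
            shift = 0;
        }
    }
    if shift != 0 {
        return None; // dangling continuation digit
    }
    Some(fields)
}
```

A consumer would keep running totals of these relative fields to recover absolute positions, which is the part a compiler-side reader would need in addition to this decoder.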

## Source Search Paths
[paths]: #paths

In order to cut down on the boilerplate necessary to include generated source
in a crate, the Rust compiler should be extended to support the concept of
source search paths, similar to GCC's `-I some-path` option, as in
`rustc -I src -I $OUT_DIR/src`. When Rust needs to look for some file, it will
check first in the current directory, then it will iterate through each search
path until the file is found.
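
The lookup order described above could be sketched roughly as follows. This is only one possible reading of the rule, and `resolve_source` and its `exists` callback are hypothetical names introduced for the example (the callback stands in for a filesystem check so the sketch can be exercised without touching the disk).

```rust
use std::path::{Path, PathBuf};

/// Tries `current_dir` first, then each search path in order, and returns
/// the first candidate for which `exists` reports true.
///
/// Note: `Path::join` returns the argument unchanged when it is absolute,
/// so an absolute `relative` path is effectively looked up as-is.
pub fn resolve_source<F>(
    current_dir: &Path,
    search_paths: &[PathBuf],
    relative: &Path,
    exists: F,
) -> Option<PathBuf>
where
    F: Fn(&Path) -> bool,
{
    std::iter::once(current_dir)
        .chain(search_paths.iter().map(PathBuf::as_path))
        .map(|dir| dir.join(relative))
        .find(|candidate| exists(candidate.as_path()))
}
```

In real use the callback would be something like `|p| p.is_file()`. Exactly which directory counts as "current", and how this interacts with module paths derived from the crate root, is left open by the text above.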

Cargo would then be updated to add the `$OUT_DIR` first in the search path
Review comment (Member):

I think that we'll probably want to namespace this by default, so perhaps cargo can pass -I $OUT_DIR/src by default? That way we won't pollute the output directory if there are multiple trees of output.

Review comment (Member):

Actually, on second thought I think Cargo's gonna need to be a bit smarter here or we need to expand this section a bit more. So let's say there are two projects with source trees like:

// a
Cargo.toml
lib.rs
foo.rs

// b
Cargo.toml
src/lib.rs
src/foo.rs

The "magic build script" for serde will likely mirror the output structure to match the input structure. This means that the build script will generate (assuming no intermediate directory like I mentioned in my comment above this one)

// a
$OUT_DIR/foo.rs

// b
$OUT_DIR/src/foo.rs

I'd hope that both cases would work, but it's not clear to me how we'd implement this to make them both work. Can we expand on exactly what arguments Cargo is passing in these two situations, along with exactly what the compiler is doing in terms of lookup paths?

Specifically, this section is a little vague to me:

> When Rust needs to look for some file, it will check first in the current directory, then it will iterate through each search path until the file is found.

This to me seems like it can be interpreted as:

  • I passed src/lib.rs to the compiler
  • The compiler saw mod foo
  • The compiler deduced it needs to look for src/foo.rs
  • All lookup paths are queried for src/foo.rs, where the final lookup path is the current directory

This, however, is incompatible with passing absolute paths to the compiler (which Cargo does frequently), as lookup paths will be queried for an absolute path, which fails.

order, which would allow generated files to be referenced with `mod queen;`
instead of `include!(...)`.

The exceptions to this are the `#[path="..."]`, `include!(...)` and
related macros, which, in order to remain backwards compatible, must be relative
to the Rust entry point. This means that if there is a directory like:

```
src/lib.rs
src/submodule/queen.rs
```

The file `src/lib.rs` could contain:

```rust
include!("submodule/queen.rs");

#[path = "submodule/queen.rs"]
mod another_queen;
```

# Drawbacks
[drawbacks]: #drawbacks

* Nick Fitzgerald (@fitzgen), the coauthor of the Source Map specification,
has written two articles
([1](http://fitzgeraldnick.com/weblog/55/),
[2](http://fitzgeraldnick.com/weblog/62/))
about the limitations of Source Maps at encoding things like scoping
information. He attempts to address this in this
[RFC](https://github.com/source-map/source-map-rfc/pull/4), which adds
DWARF-like records to Source Maps. For Syntex, this doesn't matter since
its source language is also Rust, but if a language like
[dyon](https://github.com/PistonDevelopers/dyon) grows the ability to generate
Rust code, it might want to encode its scoping information in the Source
Map.
* WebAssembly is probably going to adopt a different approach to source mapping
([1](https://github.com/WebAssembly/design/issues/602),
[2](https://github.com/WebAssembly/spec/issues/258),
[3](https://github.com/WebAssembly/design/blob/master/Tooling.md)),
but it hasn't been specified yet. How can we avoid being locked into a
format with a potentially short lifespan?
* One option is for the compiler to just ignore old Source Map files. Since
this is mainly used for debug info, this would just fail gracefully back to
error locations in the generated file.

# Alternatives
[alternatives]: #alternatives

Instead of source maps we could embed the span in the source itself through a
macro that's interpreted by the parser. Some options for this include:

* `#line "foo.rs" 1 2` in the style of CPP.
* `set_line!("foo.rs", 1, 1);`
* `set_source_span!("foo.rs", 1, 1);`
* `set_location!("foo.rs", 1, 1);`

In order to make this work with Syntex, the pretty printer would have to grow
the ability to physically output tokens in the same line and column so that
each token wouldn't need to be annotated with the correct positioning.
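
To make the directive-style alternative a little more concrete, here is a minimal sketch of how a line such as `#line "foo.rs" 1 2` might be recognized. It assumes the two numbers are a line and a column, which the list above does not actually specify; `LineDirective` and `parse_line_directive` are hypothetical names, not an existing parser API.

```rust
/// A parsed location directive (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
pub struct LineDirective {
    pub file: String,
    pub line: u32,
    pub col: u32,
}

/// Parses text of the form `#line "file" <line> <col>`.
/// Returns `None` for anything that does not match exactly.
pub fn parse_line_directive(text: &str) -> Option<LineDirective> {
    let rest = text.trim().strip_prefix("#line")?.trim_start();
    // The file name is everything between the first pair of double quotes.
    let rest = rest.strip_prefix('"')?;
    let (file, rest) = rest.split_once('"')?;
    let mut numbers = rest.split_whitespace();
    let line = numbers.next()?.parse().ok()?;
    let col = numbers.next()?.parse().ok()?;
    if numbers.next().is_some() {
        return None; // trailing tokens are not part of this form
    }
    Some(LineDirective { file: file.to_string(), line, col })
}
```

A parser that honored such directives would update its current file and position whenever one is seen, so subsequent spans would point into the template rather than the generated file.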

Another option would be to just adopt the [DWARF](http://dwarfstd.org/)
debuginfo format, but it would be much more complicated to produce.

# Unresolved questions
[unresolved]: #unresolved-questions

* If the macro-form for source mapping is used, it will see little use outside
of generated code, and doesn't necessarily need to be added to the default
namespace. Could it instead be placed somewhere to be used with
`#[macro_use]`?
* Is it actually backwards incompatible to have `#[path="..."]` find paths in
the search paths?
* Do we need to track column information?