RFC: Proposal for "modern" API #13

Closed
CAD97 opened this issue Oct 12, 2019 · 29 comments
Labels
C-enhancement Category: enhancement E-help-wanted Call for participation: Help is requested to fix this issue

Comments

@CAD97

CAD97 commented Oct 12, 2019

In order to address concerns in #11, #12, wycats/language-reporting#6, and probably others.

I've been experimenting with merging the APIs of codespan/language-reporting/annotate-snippets, and the API surface below is what I think makes the most sense.

NOTE: the suggested API has changed multiple times in response to feedback; see the conversation starting at this comment for the most recent API and discussion.

Original Proposal

An experimental implementation of the API, based on #12, is at CAD97/retort#1. (It will be pushed within 24 hours of posting: I've got one last bit to "port", but I have to get to bed now and wanted to get this posted first.)

EDIT: I've reconsidered this API, though the linked PR does implement most of it. I'm sketching a new, slightly lower-level design based on this one, and the diagnostic layout of this current API will probably become a wrapper library around annotate-snippets. (I get to use the retort name!)

API
use termcolor::WriteColor;

trait Span: fmt::Debug + Copy {
    type Origin: ?Sized + fmt::Debug + Eq;

    fn start(&self) -> usize;
    fn end(&self) -> usize;
    fn new(&self, start: usize, end: usize) -> Self;
    fn origin(&self) -> &Self::Origin;
}

trait SpanResolver<Sp> {
    fn first_line_of(&mut self, span: Sp) -> Option<SpannedLine<Sp>>;
    fn next_line_of(&mut self, span: Sp, line: SpannedLine<Sp>) -> Option<SpannedLine<Sp>>;
    fn write_span(&mut self, w: &mut dyn WriteColor, span: Sp) -> io::Result<()>;
    fn write_origin(&mut self, w: &mut dyn WriteColor, origin: Sp) -> io::Result<()>;
}

#[derive(Debug, Copy, Clone)]
pub struct SpannedLine<Sp> {
    line_num: usize,
    char_count: usize,
    span: Sp,
}

impl Span for (usize, usize) {
    type Origin = ();
}

impl<Sp: Span<Origin=()>> Span for (&'_ str, Sp) {
    type Origin = str;
}

impl<Sp: Span> SpanResolver<Sp> for &str
where Sp::Origin: fmt::Display;

mod diagnostic {
    #[derive(Debug, Clone)]
    struct Diagnostic<'a, Sp: Span> {
        pub primary: Annotation<'a, Sp>,
        pub code: Option<Cow<'a, str>>,
        pub secondary: Cow<'a, [Annotation<'a, Sp>]>,
    }

    #[derive(Debug, Clone)]
    struct Annotation<'a, Sp: Span> {
        pub span: Sp,
        pub level: Level,
        pub message: Cow<'a, str>,
    }

    #[derive(Debug, Copy, Clone, Eq, PartialEq, Hash)]
    enum Level {
        Err,
        Warn,
        Info,
        Hint,
    }

    impl<Sp: Span> Diagnostic<'_, Sp> {
        pub fn borrow(&self) -> Diagnostic<'_, Sp>;
        pub fn into_owned(self) -> Diagnostic<'static, Sp>;
    }

    impl<Sp: Span> Annotation<'_, Sp> {
        pub fn borrow(&self) -> Annotation<'_, Sp>;
        pub fn into_owned(self) -> Annotation<'static, Sp>;
    }

    impl fmt::Display for Level;
}

mod style {
    #[derive(Debug, Copy, Clone, Eq, PartialEq, Hash)]
    enum Mark {
        None,
        Start,
        Continue,
        End,
    }

    #[non_exhaustive]
    #[derive(Debug, Copy, Clone)]
    pub enum Style {
        Base,
        Code,
        Diagnostic(Level),
        LineNum,
        TitleLine,
        OriginLine,
    }

    trait Stylesheet {
        fn set_style(&mut self, w: &mut dyn WriteColor, style: Style) -> io::Result<()>;
        fn write_marks(&mut self, w: &mut dyn WriteColor, marks: &[Mark]) -> io::Result<()>;
        fn write_divider(&mut self, w: &mut dyn WriteColor) -> io::Result<()>;
        fn write_underline(
            &mut self,
            w: &mut dyn WriteColor,
            level: Level,
            len: usize,
        ) -> io::Result<()>;
    }

    struct Rustc; impl Stylesheet for Rustc;
    // other styles in the future
}

mod renderer {
    fn render<'a, Sp: Span>(
        w: &mut dyn WriteColor,
        stylesheet: &mut dyn Stylesheet,
        span_resolver: &mut dyn SpanResolver<Sp>,
        diagnostic: &'a Diagnostic<'a, Sp>,
    ) -> io::Result<()>;

    fn lsp<'a, Sp: Span + 'a>(
        diagnostics: impl IntoIterator<Item = Diagnostic<'a, Sp>>,
        source: Option<&'_ str>,
        span_resolver: impl FnMut(Sp) -> lsp_types::Location,
    ) -> Vec<lsp_types::PublishDiagnosticsParams>;
}

Notes:

  • I've skipped imports and implementation bodies for clarity. All definitions are exported where I've written them.
  • I've liberally used dyn Trait, so the only monomorphization should be over the Span type.
  • I'm not particularly attached to any of the organization of exports, things can move around.
  • Span::new is only used for impl SpanResolver<impl Span> for &str; making that impl more specific can get rid of that trait method.
  • SpanResolver's methods take &mut self primarily because they can: this allows a single-threaded database that requires &mut access for caching to serve as a span resolver.
  • Span resolution is deferred to the SpanResolver until the last moment so that a SpanResolver can supply syntax highlighting for errors.
  • SpanResolver::write_origin only gets io::Write because styling is done ahead of time by Stylesheet. Because WriteColor does not have an upcast method, this means we can't use dyn WriteColor anywhere that will end up calling SpanResolver::write_origin. This can be changed to take WriteColor if desired.
  • Diagnostic's layout is tuned to resemble the language server protocol's Diagnostic.
  • Diagnostic is set up so that Diagnostic<'_, Sp> can be borrowed but also an owned Diagnostic<'static, Sp> can be produced by using Cows. This eases use with constructed diagnostics.
  • Potential style improvement: extend Style::Code to be an enum of general code token types (e.g. the list from pygments); SpanResolver::write_span then just gets the ability to set the style to one of those, which goes through the Stylesheet for styling.
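To make the line-iteration contract concrete, here is a minimal, std-only sketch of the `(usize, usize)` span together with `&str`-backed `first_line_of`/`next_line_of`, which is where `Span::new` comes in. It assumes byte-offset spans on char boundaries, drops `Origin` and the `WriteColor` methods so no `termcolor` dependency is needed, and uses free functions in place of the `SpanResolver` trait; the helper `line_at` is hypothetical.

```rust
/// Trimmed-down `Span` from the proposal: byte offsets only, no `Origin`.
/// This is an illustrative sketch, not the retort implementation.
trait Span: Copy {
    fn start(&self) -> usize;
    fn end(&self) -> usize;
    /// Build a span of the same type; the `&str` resolver needs this to hand out line spans.
    fn new(&self, start: usize, end: usize) -> Self;
}

impl Span for (usize, usize) {
    fn start(&self) -> usize { self.0 }
    fn end(&self) -> usize { self.1 }
    fn new(&self, start: usize, end: usize) -> Self { (start, end) }
}

#[derive(Debug, Copy, Clone, PartialEq)]
struct SpannedLine<Sp> {
    line_num: usize,   // 1-based line number
    char_count: usize, // chars on the line, excluding the '\n'
    span: Sp,          // byte span of the whole line, excluding the '\n'
}

/// Hypothetical helper: the whole line of `src` containing byte `pos`.
fn line_at<Sp: Span>(src: &str, proto: Sp, pos: usize) -> SpannedLine<Sp> {
    let start = src[..pos].rfind('\n').map_or(0, |i| i + 1);
    let end = src[pos..].find('\n').map_or(src.len(), |i| pos + i);
    SpannedLine {
        line_num: src[..start].matches('\n').count() + 1,
        char_count: src[start..end].chars().count(),
        span: proto.new(start, end),
    }
}

fn first_line_of<Sp: Span>(src: &str, span: Sp) -> Option<SpannedLine<Sp>> {
    if span.start() > src.len() {
        return None;
    }
    Some(line_at(src, span, span.start()))
}

fn next_line_of<Sp: Span>(src: &str, span: Sp, line: SpannedLine<Sp>) -> Option<SpannedLine<Sp>> {
    let next_start = line.span.end() + 1; // skip the '\n' that ended `line`
    if next_start > src.len() || next_start >= span.end() {
        return None; // no further line overlaps `span`
    }
    Some(line_at(src, span, next_start))
}
```

A renderer would call `first_line_of` once and then `next_line_of` until it returns `None`, getting whole lines even when the span starts or ends mid-line.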
@CAD97
Author

CAD97 commented Oct 12, 2019

cc list of potentially interested parties:

@zbraniecki
Contributor

zbraniecki commented Oct 12, 2019

Hi @CAD97 !

Thanks for taking a look at this and I'm excited to work together, if we end up deciding that we're aiming for the same shape of the API.

I did a first read of your proposal and only have very rough, unsorted initial thoughts to share so far. Most of them are critical, but please do not read them as general criticism; it's just easier to focus on what I see as potentially incompatible. The code generally looks good!

Foundational Crate

You seem to use dependencies liberally. I'm of the strong opinion that this functionality should be treated as a "foundational crate", per raphlinus's terminology, and as such should aim to maintain a minimal, or if possible even zero, dependency tree. Your proposal has 22 dependencies; annotate-snippets has zero.
I'm open to add some, if we see a value in doing so, but I'd like to keep it at the very minimum and if possible look for dependencies that themselves don't introduce a long chain of dependencies in return.

API

Your API resembles a much more imperative approach than what I'm aiming for with annotate-snippets. The chained operations Diagnostics::build().code().level().primary().secondary() feel fairly awkward to me, I must admit. I was working at TC39 on JavaScript at the time when that model was very popular (jQuery!), and I'm not convinced that it leads to clean API use and highly maintainable code (I'm not talking about our code, but the code that uses the API).

In principle, I see what I call the Snippet struct as a data structure. Rust doesn't have a very good way to provide a complex and/or optional parameter list to a constructor, but I came to the conclusion that in this case we likely don't need it.

So, instead of making people write Snippet::new(title, description, level, code, &[secondary], ...) or your Snippet::build().title().description().level().code(), we can just expose the ability to write Snippet { title: None, description: Some(...), level: Some("E203"), ... }.
It's very clean, well maintained, allows for omitting optional fields with the Default trait, and, most powerfully, can then be wrapped in many different ways to construct an instance.

If I'm not mistaken, the only shortcoming of that approach is the lack of inter-field validation, which allows one to define a snippet with 5 lines but an annotation on a sixth.

Initially, that led me to try to squeeze in one or more constructors, because I dislike the ability to produce internally inconsistent data, but eventually I decided that it's not necessary to fix it. In the annotate-snippets model the struct gets passed to a function which generates a DisplayList out of it. It's at this step that the input (Snippet) is validated and can be rejected.

I really like this model for a foundational functionality of a foundational crate. I'm sure it's possible to build different nicer high-level APIs to aid people in constructing Snippet or Slice, but I think we should focus on exposing everything we can now and letting others or ourselves add sugar later.
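A minimal sketch of the "plain data, validated on conversion" model described above. The types loosely follow annotate-snippets' `Slice`/`SourceAnnotation` but are simplified, and `to_display_list` is a hypothetical stand-in for the Snippet-to-`DisplayList` step that here only performs the range check.

```rust
/// Simplified, hypothetical version of an in-source annotation.
#[derive(Debug, Default, Clone, Copy)]
struct SourceAnnotation<'a> {
    label: &'a str,
    range: (usize, usize), // byte range within `Slice::source`
}

/// Plain data: every field is public-style and optional fields use `Default`.
#[derive(Debug, Default)]
struct Slice<'a> {
    source: &'a str,
    line_start: Option<usize>,
    annotations: &'a [SourceAnnotation<'a>],
}

/// Hypothetical stand-in for the conversion step: the struct itself may hold
/// inconsistent data, and this is where it gets rejected.
fn to_display_list(slice: &Slice<'_>) -> Result<(), String> {
    for a in slice.annotations {
        let (start, end) = a.range;
        if start > end || end > slice.source.len() {
            return Err(format!("annotation {:?} is outside the source", a.label));
        }
    }
    Ok(())
}
```

Callers fill in only what they need with `..Default::default()`, and an annotation past the end of the source is caught at conversion time rather than by a constructor.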

Your model seems much closer to what codespan and in result language-reporting are doing. I would prefer us to avoid starting with that API, while I'm open to get to it later.

Flexibility

annotate-snippets is intentionally very vague about its core concepts. It is meant to be a generic API which can be used for displaying errors, but also for tutorials, helpers, explainers etc. In particular, I believe that the range of annotations in them can be vast, and I'd love to end up with something flexible and extensible.
For that reason I'd like to minimize the number of places in our API that we name after some function. Due to the nature of your proposed API, you not only do that, but also add API methods like "primary", "secondary", etc., while at the same time making it less visible how they relate to each other and whether it's possible to specify multiple or just one (can I specify multiple "secondary()"? I can! But can I do the same with "primary()"? If so, what does it mean? And can I specify multiple "level()"? Or will the next one overwrite the previous one?), and so on.

For me, the difference between:

    let diagnostic = Diagnostic::build()
        .primary(
            Annotation::build()
                .span(50..777)
                .message("mismatched types")
                .build(),
        )
        .code("E0308")
        .level(Level::Err)
        .secondary(
            Annotation::build()
                .span(55..69)
                .message("expected `Option<String>` because of return type")
                .build(),
        )
        .secondary(
            Annotation::build()
                .span(76..775)
                .message("expected enum `std::option::Option`, found ()")
                .build(),
        )
        .build();

and

    let snippet = Snippet {
        title: Some(Annotation {
            id: Some("E0308"),
            label: Some("mismatched types"),
            annotation_type: AnnotationType::Error,
        }),
        slices: &[Slice {
            source,
            line_start: Some(51),
            annotations: vec![
                SourceAnnotation {
                    label: "expected `Option<String>` because of return type",
                    annotation_type: AnnotationType::Warning,
                    range: (5, 19),
                },
                SourceAnnotation {
                    label: "expected enum `std::option::Option`",
                    annotation_type: AnnotationType::Error,
                    range: (23, 725),
                },
            ],
        }],
    };

is that in the latter, the only meaning is assigned to annotations via annotation_type, and there may be many different ones including custom ones added by the user.

In the former, the title is decided by the primary annotation and thus cannot be different from an in-source annotation, the level is defined per slice, not per annotation, and we use the concept of primary/secondary, which would only be extensible via some tertiary?

annotate-snippets allows you to define title different than any source annotations, or footer, or multiple footers, multiple titles, multiple annotations, which may or may not overlap. It seems like a fairly low-level approach, but in result very flexible.
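One way to picture "custom ones added by the user" is an open-ended annotation kind. The `Custom` variant, the `label` method, and `render_title` below are hypothetical illustrations, not annotate-snippets' actual `AnnotationType`.

```rust
/// Hypothetical open-ended annotation kind: built-in variants cover the
/// common cases, and `Custom` lets a caller name its own kind.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AnnotationType<'a> {
    Error,
    Warning,
    Info,
    Note,
    Help,
    Custom(&'a str),
}

impl AnnotationType<'_> {
    /// The label rendered in front of a title, e.g. `error[E0308]: ...`.
    fn label(&self) -> &str {
        match self {
            AnnotationType::Error => "error",
            AnnotationType::Warning => "warning",
            AnnotationType::Info => "info",
            AnnotationType::Note => "note",
            AnnotationType::Help => "help",
            AnnotationType::Custom(name) => *name,
        }
    }
}

/// Hypothetical title renderer: the title is independent of any in-source annotation.
fn render_title(kind: AnnotationType<'_>, id: Option<&str>, message: &str) -> String {
    match id {
        Some(id) => format!("{}[{}]: {}", kind.label(), id, message),
        None => format!("{}: {}", kind.label(), message),
    }
}
```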

Your API seems much more constrained and intended for producing just one style of annotated snippets.

Cow

You use a lot of Cow, but I'm not sure how valuable it is. I'm not convinced by that decision; in all the cases I've been able to find, &str works well, and I'm not sure we need owned messages in the annotation/slice/snippet.

Performance

I focus a lot on performance in my rewrite of annotate-snippets in #12 . I was unable to compile your PR so I can't measure performance but I think it'd be important to compare.

API discrepancy

Finally, and I struggle to ask this since it borders on NIH bias, which I'd like to avoid: I'm wondering why you feel the need to design a new API. I asked the authors of codespan and language-reporting if they see any shortcomings in my crate, and they said they don't and that the plan to converge on the annotate-snippets API seems reasonable unless we find a limitation in it.
Have you encountered a limitation? Do you dislike the annotate-snippets API? Any other reason?

===

I've been on vacation over this week, but I plan to get back and finish #12 now. If you believe that there's a value in diverging from its API, I'm happy to discuss the above differences (or any other that you see!) and compare the results!
I don't want to be attached to my API, but I have not seen or heard of any problem with it yet, and I find it the most flexible, robust and extensible of all I've seen.
In your code I noticed several ideas which I like, but they're internal rather than API surface and I'd rather see them as PRs against annotate-snippets than a full new API.

Let me know what you think!

@CAD97
Author

CAD97 commented Oct 12, 2019

Big apologies: I forgot that I had an out-of-date example in the repository when I posted this; I didn't mean to mislead. I've dropped the builder API (if you look at the API overview in the OP, it's not present) and that's why the example won't compile. There were a few other issues as well because I pushed a WIP checkpoint commit. The implementation is still not quite finished for performance testing as I still need to port one final bit first.

So let me address the points some:

Dependencies

lsp-types is the only heavy dependency, and should be completely opt-in for the LSP target. The PR now correctly marks the dependency as optional. The LSP conversion could also be pulled out-of-tree if really desired, but the diagnostic layout should be LSP-friendly. Of the other three:

  • termcolor (+ wincolor, winapi-util, winapi+friends) is, I think, the best option for an abstracted color-capable sink (codespan/language-reporting agree with me here). @brendanzab says the current use of ansi_term (which also depends on winapi+friends) is what currently keeps them from going all-in on annotate-snippets, because of the lack of an injectable custom writer and the global state.
  • scopeguard: highly used leaf crate; could be inlined if really desired.
  • bytecount: I included it because clippy yelled at me not to do a naive byte count for newline characters. It's also a leaf crate. Given expected source sizes, it might be reasonable to drop it. Is only used in impl SpanResolver for &str.
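For comparison, a std-only replacement for `bytecount` is a plain byte scan. Because `\n` is a single byte in UTF-8 and never appears inside a multi-byte sequence, counting bytes is exact; it is just slower on large inputs. The function name is hypothetical.

```rust
/// Hypothetical helper: 1-based line number of byte `offset` in `src`,
/// found by counting the newlines before it. O(offset) per call, no dependencies.
fn line_number_of(src: &str, offset: usize) -> usize {
    let offset = offset.min(src.len()); // clamp out-of-range offsets to the end
    src.as_bytes()[..offset].iter().filter(|&&b| b == b'\n').count() + 1
}
```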

For some reason I'm having trouble installing cargo-tree so I can't give a better overview currently.

API

The builder API used in the example was old and discarded; I'm just allowing record creation much closer to annotate-snippets now:

let diagnostic = Diagnostic {
    primary: Annotation {
        span: (50, 777),
        level: Level::Err,
        message: "mismatched types".into(),
    },
    code: Some("E0308".into()),
    secondary: vec![
        Annotation {
            span: (55, 69),
            level: Level::Info,
            message: "expected `Option<String>` because of return type".into(),
        },
        Annotation {
            span: (76, 775),
            level: Level::Err,
            message: "expected enum `std::option::Option`, found ()".into(),
        },
    ] // can also be borrowed, `vec` for simplicity
    .into(),
};

Flexibility

Yeah, the API I've proposed as-is is quite targeted at diagnostics and being compatible with the LSP diagnostic API. I'm perfectly happy to generalize it some more, though.

The intent as currently designed is that the primary annotation is the "short" annotation (i.e. the one that shows up as the red squiggly before asking for more information), and the secondary annotations are any related information. Note that the secondary annotations do not need to be within the primary span or even from the same Span.origin; that's just a limitation of my current implementation trying to cobble something together to demonstrate the API.

Cow

The main reason I've used Cow is to help the use case where a consumer wants to build a Diagnostic/Annotation list up during some analysis. If they're purely borrowed, the user has to implement a similar owned structure to build up the list, then borrow it when pushing it to the sink annotate-snippets renderer. I'd like to avoid that necessity, but I can be talked down if it's a sticking point, especially if we provide a separate owned variant instead of mushing them together with Cow. (Basically, I'm not using Cow as copy-on-write but as maybe-owned.)
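A small sketch of the "maybe-owned" use case, assuming spans are `(usize, usize)` and only the message is a `Cow`: an analysis pass pushes annotations whose messages are either static strings or formatted on the fly, and `into_owned` detaches an annotation from borrowed input. `check_names` is a hypothetical analysis, and the `into_owned` body is one plausible implementation.

```rust
use std::borrow::Cow;

#[derive(Debug, Clone)]
struct Annotation<'a> {
    span: (usize, usize),
    message: Cow<'a, str>,
}

impl Annotation<'_> {
    /// Detach from any borrowed input so the annotation can outlive it.
    fn into_owned(self) -> Annotation<'static> {
        Annotation { span: self.span, message: Cow::Owned(self.message.into_owned()) }
    }
}

/// Hypothetical analysis pass: static messages are borrowed for free,
/// formatted ones are owned, and both live in the same Vec.
fn check_names(names: &[(&str, (usize, usize))]) -> Vec<Annotation<'static>> {
    let mut out = Vec::new();
    for &(name, span) in names {
        if name.is_empty() {
            out.push(Annotation { span, message: Cow::Borrowed("empty identifier") });
        } else if name.starts_with('_') {
            out.push(Annotation { span, message: Cow::Owned(format!("`{}` is marked unused", name)) });
        }
    }
    out
}
```

Without the `Cow`, the caller would need a parallel owned type for the accumulation phase, which is the duplication the comment above wants to avoid.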

Performance

Should be on par with cleanup, as the real work was directly ported from it. Again, the port is not quite finished, so it can't be measured yet.

NIH

I'm happy to find a middle ground, this is mainly to share the results of my experimentation so that we can try to find the best end result. The big parts of this I'd really like to see adopted in some form: 1) A LSP target is practical, 2) delayed Span resolution to support source highlighting, and 3) a generic WriteColor target rather than baking in ANSI or no color as the only target implicitly.
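To illustrate point 3, here is a std-only sketch of an injectable, color-capable sink. `ColorSink`, `Plain`, and `Ansi` are hypothetical and far smaller than termcolor's `WriteColor`, which they stand in for so the example has no dependencies; the point is that the renderer only sees `&mut dyn ColorSink`, so the caller picks plain text or ANSI escapes without any global state.

```rust
use std::io::{self, Write};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Color { Red, Blue }

/// Hypothetical, minimal stand-in for termcolor's `WriteColor`.
trait ColorSink: Write {
    fn set_color(&mut self, color: Option<Color>) -> io::Result<()>;
}

/// Plain-text sink: ignores color requests entirely.
struct Plain<W>(W);
/// ANSI sink: turns color requests into SGR escape codes.
struct Ansi<W>(W);

impl<W: Write> Write for Plain<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> { self.0.write(buf) }
    fn flush(&mut self) -> io::Result<()> { self.0.flush() }
}
impl<W: Write> ColorSink for Plain<W> {
    fn set_color(&mut self, _: Option<Color>) -> io::Result<()> { Ok(()) }
}

impl<W: Write> Write for Ansi<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> { self.0.write(buf) }
    fn flush(&mut self) -> io::Result<()> { self.0.flush() }
}
impl<W: Write> ColorSink for Ansi<W> {
    fn set_color(&mut self, color: Option<Color>) -> io::Result<()> {
        match color {
            Some(Color::Red) => write!(self.0, "\x1b[31m"),
            Some(Color::Blue) => write!(self.0, "\x1b[34m"),
            None => write!(self.0, "\x1b[0m"), // reset
        }
    }
}

/// Hypothetical renderer fragment: it never knows which concrete sink it has.
fn render_error_header(w: &mut dyn ColorSink, message: &str) -> io::Result<()> {
    w.set_color(Some(Color::Red))?;
    write!(w, "error")?;
    w.set_color(None)?;
    writeln!(w, ": {}", message)
}
```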

@matklad
Member

matklad commented Oct 12, 2019

I didn't read this past the first paragraph (I hope to do so once I have more time), but I'd advise against using lsp-types as a dependency, even an optional one. It changes way too often, and I don't think it's worth making it non-breaking. Rather, I think it's better to vendor just the diagnostics-related bits of the LSP, which should be stable. See, for example, how I've hard-coded setup/tear-down messages in lsp-server. This seems like a case where depending on a 3rd-party lib doesn't actually buy that much.

@zbraniecki
Contributor

Thanks for the response!

I'm just responding to your points, I'll review more tomorrow:

Dependencies

Yes, I can understand why you aim for lsp-types. Maybe we want it, I'll have to look deeper.

As for termcolor vs ansi_term - I'm not strongly opinionated. I can see us switching if termcolor is better. I want it to be optional tho.

Others: I'd like all extra functionality to be optional. I can see an argument for, say, a unicode crate for counting and breaking on character boundaries, but I believe it should be optional because a foundational crate is likely to be used often without that extra piece.

I saw things like serde and serde_json compiled as part of your PR. I think they should not be necessary.
If they're needed by lsp-types I will question whether we need lsp-types :)

API

Oh, I'm sorry for not reviewing the example vs. code. As I said, I just got back from PTO.

As for your example, it looks much better to me! I still would like to separate title from in-source annotations, because it's easier to reference/copy one from another than to separate if you need them to be different.

Flexibility

As with above - it's easier to specialize later, if the crate allows for generic behavior. I'd like to not dictate what behavior happens on the level of our crate, but rather on the level of some higher level API. Then you can have an API that uses the foundational crate and is specific to generating errors and it picks the "primary" and sets it as the only title, and as the primary annotation and so on.
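A sketch of that layering, with both layers' types simplified and hypothetical: a diagnostics-shaped `Diagnostic` (primary plus secondary, as in the proposal) is lowered into a generic snippet whose title and source annotations are independent. The policy that the primary message doubles as the title lives in `lower`, i.e. in the higher-level layer.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind { Error, Info }

/// Hypothetical low-level, generic shape: title and in-source annotations are independent.
#[derive(Debug, PartialEq)]
struct Snippet<'a> {
    title: Option<(Kind, &'a str)>,
    annotations: Vec<(Kind, (usize, usize), &'a str)>,
}

/// Hypothetical high-level, diagnostics-specific shape (kind, span, message).
struct Diagnostic<'a> {
    primary: (Kind, (usize, usize), &'a str),
    secondary: &'a [(Kind, (usize, usize), &'a str)],
}

/// The policy lives here, not in the foundational crate:
/// the primary message becomes the title *and* the first annotation.
fn lower<'a>(d: &Diagnostic<'a>) -> Snippet<'a> {
    let (kind, _, message) = d.primary;
    let mut annotations = vec![d.primary];
    annotations.extend_from_slice(d.secondary);
    Snippet { title: Some((kind, message)), annotations }
}
```

A different higher-level API could choose a title that matches no annotation at all, and the low-level `Snippet` would not need to change.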

Cow

I can be talked into incorporating Cow. My main concern about it is that I'm torn on whether the API should be constructable. One way to think about it is that it should. Another is that there should be some higher level that constructs it (maybe over time) and then spawns the Diagnostics struct.
As I said, I can definitely be talked into the idea of making the API buildable.

One argument against it is that I'm trying to minimize allocations, and one way to do that was to cut out all Vecs, replacing them with &[]. That keeps the code simpler, at the cost that when you need to build, you do so beforehand. My initial position is that in many cases you would be able to avoid that step, so there's a tangible win. But maybe I'm wrong here! I'll investigate!
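To illustrate the trade-off with simplified, hypothetical types: with `&[...]` fields, a caller whose annotations are known up front can build everything on the stack, while a caller that accumulates annotations during analysis collects them into a `Vec` first and then lends it out.

```rust
#[derive(Debug, Clone, Copy)]
struct SourceAnnotation<'a> {
    label: &'a str,
    range: (usize, usize),
}

/// Hypothetical borrowed-only slice: no `Vec` inside the library type.
#[derive(Debug)]
struct Slice<'a> {
    source: &'a str,
    annotations: &'a [SourceAnnotation<'a>],
}

/// A consumer that only needs to read the borrowed data.
fn count_labels(slice: &Slice<'_>) -> usize {
    slice.annotations.iter().filter(|a| !a.label.is_empty()).count()
}

/// Caller with a fixed set of annotations: no heap allocation at all.
fn fixed(source: &str) -> usize {
    let anns = [SourceAnnotation { label: "here", range: (0, 1) }];
    count_labels(&Slice { source, annotations: &anns })
}

/// Caller that builds annotations during analysis: one `Vec`, then borrow it.
fn accumulated(source: &str) -> usize {
    let mut anns = Vec::new();
    for (i, ch) in source.char_indices() {
        if ch == ';' {
            anns.push(SourceAnnotation { label: "semicolon", range: (i, i + 1) });
        }
    }
    count_labels(&Slice { source, annotations: &anns })
}
```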

Performance

Cool! Let's both finish our PRs and compare! :)

NIH

Oh, awesome! I'm so happy to see actual points listed out this way. This is very helpful, thank you!

  • I'm not sure about the lsp-target, but I admit ignorance and commit to investigate.
  • I'm not sure if I understand "delayed Span resolution to support syntax highlighting". I think annotate-snippets supports syntax highlighting, and if it doesn't, it's a bug and I'm open to look for ways to fix it!
  • I am playing with the styles in the cleanup branch now. I want it to be agnostic of the styling, and able to support different stylesheets and even different themes (think - you build different DisplayList for terminal and different for web browser or some other rich GUI, which is different from just styling it).

The last step is the last piece of my cleanup, so I'd like to finish my proposal. I like your updated example much more (d'uh! it's closer to annotate-snippets ;)) and I'm really excited to see you bringing your experience and perspective! Let's finish our PRs and compare them and figure out how to approach it.
From your response I feel that we're aiming for close enough goal that we should end up with a single crate, which is awesome (the alternative is also awesome, but less attractive for the bus-factor removal which I care about deeply!).

Onwards! :)

@CAD97
Author

CAD97 commented Oct 12, 2019

Just a few more notes:

  • I'm convinced now that LSP support should be out-of-tree.
  • To that effect, perhaps I/we could make a diagnostics library that supports LSP that wraps annotate-snippets's more general snippet annotation.
  • The &mut dyn WriteColor sink is probably my most desired part of this proposal.
  • Second most desired is asking the SpanResolver to paint the span.

The rest is a question of what layout decisions should be at what layer.
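A rough sketch of what "asking the SpanResolver to paint the span" could look like: the renderer hands the resolver a span and a styling callback, and the resolver, which knows the language, decides which pieces get which style. `Style`, `RustishResolver`, the callback shape, and the keyword list are all hypothetical, and output goes to a `String` with bracketed markers so no terminal crate is needed.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Style { Base, Keyword }

/// Hypothetical resolver over a Rust-like source string.
struct RustishResolver<'a> {
    src: &'a str,
}

impl RustishResolver<'_> {
    /// Write `src[start..end]`, asking for `Keyword` style around keywords.
    /// The renderer's `paint` callback decides what a style actually looks like.
    fn write_span(&self, out: &mut String, start: usize, end: usize, paint: &dyn Fn(&mut String, Style)) {
        for (i, word) in self.src[start..end].split(' ').enumerate() {
            if i > 0 {
                out.push(' ');
            }
            if matches!(word, "let" | "fn" | "const") {
                paint(out, Style::Keyword); // resolver requests keyword style...
                out.push_str(word);
                paint(out, Style::Base); // ...and switches back afterwards
            } else {
                out.push_str(word);
            }
        }
    }
}
```

Because the resolver only requests abstract styles, the same resolver works with a plain renderer (whose callback writes nothing) and a colored one.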

@zbraniecki
Contributor

Sounds good! I'll look into &mut dyn WriteColor tomorrow, and then into SpanResolver. Give me several days and I should have something tangible (either code or opinion at least!)

@kevinmehall

Is the intention that you could impl Span for types like codemap::Span and codespan::Span that do the "index into concatenation of all files" trick? I don't see how they could provide origin() without reference to codemap::CodeMap / codespan::Files. Alone, these span types can't provide a filename to display, or even test whether two spans refer to the same file for the Eq bound.

The requirement that a span's start and end are usize also precludes implementing Span for types that store positions as a separate line and column number.

What if the interface exposed column numbers for the library to do its layout computations, but kept Span opaque? Here's a rough sketch:

use std::{fmt::Display, io};
use termcolor::WriteColor;

trait SpanResolver<Sp> {
    type Origin: Eq + Display;
    type Line: Copy + Eq + Ord + Into<usize>;

    fn resolve(&mut self, span: Sp) -> ResolvedSpan<Self::Origin, Self::Line>;
    fn write_source(
        &mut self,
        w: &mut dyn WriteColor,
        file: &Self::Origin,
        line: Self::Line,
        start_col: usize,
        end_col: usize,
    ) -> io::Result<()>;
    fn next_line(&mut self, line: Self::Line) -> Option<Self::Line>;
}

struct ResolvedSpan<Origin, Line> {
    file: Origin,
    start_line: Line,
    start_col: usize,
    end_line: Line,
    end_col: usize,
}

Line is generic and not usize because it's O(n) to index a plain string by line number. An implementation of Line for str could cache the byte index of the start of line.
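A minimal sketch of that caching idea, assuming byte offsets into a single `str` and `\n` line endings: record the byte index where each line starts once (O(n)), then map an offset to a line by binary search (O(log n)) and slice out a line in O(1). `LineIndex` and its methods are hypothetical names.

```rust
/// Hypothetical cache of the byte index at which every line starts.
struct LineIndex<'a> {
    src: &'a str,
    line_starts: Vec<usize>,
}

impl<'a> LineIndex<'a> {
    fn new(src: &'a str) -> Self {
        let mut line_starts = vec![0];
        line_starts.extend(src.match_indices('\n').map(|(i, _)| i + 1));
        LineIndex { src, line_starts }
    }

    /// 0-based (line, column) of a byte offset; the column counts chars.
    fn line_col(&self, offset: usize) -> (usize, usize) {
        // Index of the last line start that is <= offset (line_starts[0] == 0).
        let line = match self.line_starts.binary_search(&offset) {
            Ok(exact) => exact,
            Err(insert_at) => insert_at - 1,
        };
        let col = self.src[self.line_starts[line]..offset].chars().count();
        (line, col)
    }

    /// Text of a 0-based line, without its trailing newline.
    fn line_text(&self, line: usize) -> &'a str {
        let start = self.line_starts[line];
        let end = self.line_starts.get(line + 1).map_or(self.src.len(), |&next| next - 1);
        &self.src[start..end]
    }
}
```

In kevinmehall's terms, a `Line` for `str` could simply be an index into `line_starts`.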

@CAD97
Author

CAD97 commented Oct 12, 2019

@kevinmehall yes, the intent was that it would be possible to implement Span for types that map multiple files into one dimensional space. I overlooked that resolution to the origin would have to go through the resolver as well for that to work. Of course, now with annotate-snippets being targeted lower-level than this sketch originally aimed, I don't think a single render should need to deal with spans from multiple origins at all. (That would be the next level up.)
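For reference, the "index into the concatenation of all files" scheme kevinmehall mentions looks roughly like the following. `Files` is a hypothetical, std-only stand-in, not the actual `codemap::CodeMap` or `codespan::Files` API; it shows why a bare `(start, end)` pair cannot name its own file and the table (i.e. the resolver) has to.

```rust
/// Hypothetical file table: every file occupies a disjoint range of one
/// global offset space, so a span is just two `usize`s.
struct Files {
    names: Vec<String>,
    starts: Vec<usize>, // global offset at which each file begins
    next: usize,        // first unused global offset
}

impl Files {
    fn new() -> Self {
        Files { names: Vec::new(), starts: Vec::new(), next: 0 }
    }

    /// Register a file of `len` bytes and return its global base offset.
    fn add(&mut self, name: &str, len: usize) -> usize {
        let base = self.next;
        self.names.push(name.to_string());
        self.starts.push(base);
        self.next = base + len + 1; // +1 keeps an end-of-file offset unambiguous
        base
    }

    /// Resolve a global offset to (file name, local offset). Only the table
    /// can do this; the span alone does not know which file it is in.
    fn resolve(&self, global: usize) -> Option<(&str, usize)> {
        if global >= self.next {
            return None;
        }
        // Non-empty here and starts[0] == 0, so `Err(i)` always has i >= 1.
        let file = match self.starts.binary_search(&global) {
            Ok(i) => i,
            Err(i) => i - 1,
        };
        Some((self.names[file].as_str(), global - self.starts[file]))
    }
}
```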

@kevinmehall

I don't think a single render should need to deal with spans from multiple origins at all.

rustc has some diagnostics where the primary and secondary spans are in different files. How would that be handled? For example:

error[E0326]: implemented const `X` has an incompatible type for trait
 --> src/lib.rs:5:12
  |
5 |   const X: () = ();
  |            ^^ expected u32, found ()
  | 
 ::: src/f1.rs:2:13
  |
2 |    const X: u32;
  |             --- type in trait
  |
  = note: expected type `u32`
             found type `()`


@CAD97
Author

CAD97 commented Oct 13, 2019

Let me drop this here while it's fresh in my mind, though I should admit it's half-baked, as I need to run off to bed again.

Here's the input structure I'm experimenting with for a much-reduced annotate-snippets responsibility compared to the OP, now with documentation!

API
/// A span of a snippet to be annotated.
pub trait Span {
    /// A position within the span. The only requirement is that positions
    /// sort correctly for every `Span` from the same origin.
    ///
    /// For most spans, this will be a `usize` index
    /// or a `(usize, usize)` line/column pair.
    type Pos: Ord;

    /// The start position of the span.
    ///
    /// This is expected to be equivalent in cost to a field access.
    fn start(&self) -> Self::Pos;

    /// The end position of the span.
    ///
    /// This is expected to be equivalent in cost to a field access.
    fn end(&self) -> Self::Pos;
}

/// A type to resolve spans from opaque spans to information required for annotation.
pub trait SpanResolver<Sp> {
    /// Write the span to a [`WriteColor`] sink.
    ///
    /// When calling `write_span`, the writer is styled with the base style.
    /// Style can be customized manually or by proxying through the stylesheet.
    fn write_span(
        &mut self,
        w: &mut dyn WriteColor,
        stylesheet: &mut dyn Stylesheet,
        span: Sp,
    ) -> io::Result<()>;

    /// Count the number of characters wide this span is in a terminal font.
    fn count_chars(&mut self, span: Sp) -> usize;

    /// Get the first line in a span. The line includes the whole line,
    /// even if that extends out of the source span being iterated.
    ///
    /// If the input span is empty, the line it is on is produced.
    fn first_line_of(&mut self, span: Sp) -> Line<Sp>;
    /// Get the next line in a span. The line includes the whole line,
    /// even if that extends out of the source span being iterated.
    ///
    /// If the next line does not overlap the span at all, `None` is produced.
    fn next_line_of(&mut self, span: Sp, previous: Line<Sp>) -> Option<Line<Sp>>;
}

/// A reference to a line within a snippet.
#[derive(Debug, Copy, Clone, Eq, PartialEq)]
pub struct Line<Sp> {
    /// The span of the line, _not_ including the terminating newline (if present).
    pub span: Sp,
    /// The line number.
    pub num: usize,
}

/// One snippet to be annotated.
///
/// # Example
///
/// In the error message
///
// Please keep in sync with the `moved_value` example!
/// ```text
/// error[E0382]: use of moved value: `x`
///  --> examples/moved_value.rs:4:5
///   |
/// 4 |     let x = vec![1];
///   |         - move occurs because `x` has type `std::vec::Vec<i32>`, which does not implement the `Copy` trait
/// 7 |     let y = x;
///   |             - value moved here
/// 9 |     x;
///   |     ^ value used here after move
/// ```
///
/// there are three snippets: one for each bit of code being annotated.
/// The spans to create this error are:
///
/// ```
/// # use retort::*;
/// # let line4 = 0..0; let line7 = 0..0; let line9 = 0..0;
/// let snippets = &[
///     Snippet {
///         annotated_span: line4,
///         spacing: Spacing::TightBelow,
///     },
///     Snippet {
///         annotated_span: line7,
///         spacing: Spacing::Tight,
///     },
///     Snippet {
///         annotated_span: line9,
///         spacing: Spacing::Tight,
///     },
/// ];
/// ```
#[derive(Debug, Copy, Clone)]
pub struct Snippet<Sp> {
    pub annotated_span: Sp,
    pub spacing: Spacing,
}

/// Spacing of a snippet.
#[derive(Debug, Copy, Clone, Eq, PartialEq)]
pub enum Spacing {
    /// Emit a spacing line above and below the snippet.
    Spacious,
    /// Emit a spacing line below the snippet only.
    TightAbove,
    /// Emit a spacing line above the snippet only.
    TightBelow,
    /// Emit no spacing lines.
    Tight,
}

/// An annotation of some span.
///
/// # Example
///
/// In the error message
///
// Please keep in sync with the `moved_value` example!
/// ```text
/// error[E0382]: use of moved value: `x`
///  --> examples/moved_value.rs:4:5
///   |
/// 4 |     let x = vec![1];
///   |         - move occurs because `x` has type `std::vec::Vec<i32>`, which does not implement the `Copy` trait
/// 7 |     let y = x;
///   |             - value moved here
/// 9 |     x;
///   |     ^ value used here after move
/// ```
///
/// there are three annotations: one on each line of code.
/// The annotations in this error are:
///
/// ```
/// # use retort::*;
/// # let line4_x = 0..0; let line7_x = 0..0; let line9_x = 0..0;
/// let annotations = &[
///     Annotation {
///         span: line4_x,
///         message: "move occurs because `x` has type `std::vec::Vec<i32>`, which does not implement the `Copy` trait",
///         level: Level::Information,
///     },
///     Annotation {
///         span: line7_x,
///         message: "value moved here",
///         level: Level::Information,
///     },
///     Annotation {
///         span: line9_x,
///         message: "value used here after move",
///         level: Level::Error,
///     },
/// ];
/// ```
#[derive(Debug, Copy, Clone)]
pub struct Annotation<'a, Sp> {
    /// The span to be annotated.
    pub span: Sp,
    /// The message to attach to the span.
    pub message: &'a str,
    /// The severity of the annotation.
    pub level: Level,
}

/// A level of severity for an annotation.
#[derive(Debug, Copy, Clone)]
pub enum Level {
    /// An error on the part of the user.
    Error,
    /// A warning of something that isn't necessarily wrong, but looks fishy.
    Warning,
    /// An informational annotation.
    Information,
    /// A hint about what actions can be taken.
    Hint,
}
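The `moved_value` output in the doc comment above underlines the `Error` annotation with `^` and the `Information` annotations with `-`. As a minimal sketch of how a renderer might turn a `Level` into a marker line, assuming 0-based columns with an exclusive end: `underline_char` and `marker_line` are hypothetical helpers, not part of the proposal, and mapping `Warning` like `Error` and `Hint` like `Information` is an assumption.

```rust
/// Mirrors the proposed `Level` enum so the sketch compiles on its own.
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub enum Level {
    Error,
    Warning,
    Information,
    Hint,
}

/// Hypothetical: the character used to underline an annotated span.
/// `Error` -> `^` and `Information` -> `-` follow the `moved_value` output;
/// the choices for `Warning` and `Hint` are assumptions.
pub fn underline_char(level: Level) -> char {
    match level {
        Level::Error | Level::Warning => '^',
        Level::Information | Level::Hint => '-',
    }
}

/// Hypothetical: builds the marker line for a span covering columns
/// `start..end` (0-based, exclusive end), e.g. `"    ^"` for `4..5`.
pub fn marker_line(start: usize, end: usize, level: Level) -> String {
    let mut line = " ".repeat(start);
    // Always draw at least one marker so empty spans stay visible.
    let width = end.saturating_sub(start).max(1);
    line.extend(std::iter::repeat(underline_char(level)).take(width));
    line
}
```

For line 9 of the example (`    x;`), the `x` sits at column 4, so `marker_line(4, 5, Level::Error)` yields the `    ^` shown under it.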

I suspect it would be desirable to add a "collection of snippets" type on top of this (if I'm not mistaken, it would be closer to the current Snippet than mine is; mine is closer to Slice) and to use it as the argument for the render function. That "snippet collection" would also hold the front matter. Alternatively, we could say that we only care about snippet annotation and leave the other lines to the caller.

EDIT: recording two concerns that this API cannot cover (yet?):

  • Alignment of notes not attached to annotation with the separator
  • Alignment of the separator between annotated slices.

@Marwes

Marwes commented Oct 13, 2019

@matklad

I didn't read this past the first paragraph (I hope to do so once I have more time), but I'd advise against using lsp-types as a dependency, even an optional one. It changes way too often, and I don't think it's worth it to make it non-breaking.

Did you see gluon-lang/lsp-types#117? It would let lsp-types go to 1.0 at the cost of a slightly more awkward API (though, in a way, a more honest one).

@brendanzab
Member

I just want to say I really appreciate the time being put into this! Would be exciting to converge on something nice.

@wycats

wycats commented Oct 14, 2019

@CAD97 I'm cool with merging language-reporting into annotate-snippets in something like this form. How would you feel about merging me into the GitHub team? I'd love to continue driving this part of the Rust ecosystem forward in direct terms.

@CAD97
Author

CAD97 commented Oct 14, 2019

@wycats I'm not on any team right now, I just chose this as a pet project for October (as I'm working on a blog post series that would benefit from it) 😛


I've put a second potential API up at CAD97/retort#2 that embraces the sink-layer quality of annotate-snippets more fully (i.e. a diagnostics library could provide a more LSP-shaped API that renders to either annotate-snippets or LSP). It also has some experimental support for syntect-themed highlighting of source code, but that's entirely optional (and probably not all that desirable; I was just playing with the idea, and it belongs at the next level up).

I'll update the OP with a revised RFC for the adjusted API once I have a chance to write some examples against it, probably sometime tomorrow.

@zbraniecki
Contributor

@CAD97 - I pushed an update to my cleanup branch which:

  • switches us to io::Write
  • adds termcolor as an option (next to ansi_term, but that can go away)
  • cleans up the renderer/style separation allowing for things like ASCII vs HTML output

My next steps are to:

  • Verify that simple ASCII vs. rich-Unicode output works
  • Verify that HTML renderer works
  • Complete the features by porting tests and other examples from master

At that point, I'd like to consider merging the PR into master and releasing updated annotate-snippets.

As for your proposal, I'm not sure where the best place to fit it is: should we first align our codebases before any releases, or is it OK to do this on top of cleanup?

At a high level, apart from the feature set, I think we need to decide which of the APIs is better: the one in cleanup or your proposal.

The example I work with looks like this:

[Screenshot (2019-10-16): rendered output of the E0308 example below]

And the APIs are:

cleanup:

      let snippet = Snippet {
          title: Some(Annotation {
              id: Some("E0308"),
              label: Some("mismatched types"),
              annotation_type: AnnotationType::Error,
          }),
          footer: &[],
          slices: &[Slice {
              source,
              line_start: Some(51),
              origin: Some("src/format.rs"),
              annotations: &[
                  SourceAnnotation {
                      label: "expected `Option<String>` because of return type",
                      annotation_type: AnnotationType::Warning,
                      range: 5..19,
                  },
                  SourceAnnotation {
                      label: "expected enum `std::option::Option`",
                      annotation_type: AnnotationType::Error,
                      range: 23..725,
                  },
              ],
          }],
      };

@CAD97's:

    let snippets = &[
        Snippet::Title {
            message: Message {
                text: &"mismatched types",
                level: Level::Error,
            },
            code: Some(&"E0308"),
        },
        Snippet::AnnotatedSlice {
            slice: Slice {
                span: 1..721,
                origin: Some(Origin { file: &"src/format.rs", pos: Some((4, Some(5))) }),
                spacing: Spacing::TightBelow,
                fold: false,
            },
            annotations: &[
                Annotation {
                    span: 5..19,
                    message: Message {
                        text: &"expected `Option<String>` because of return type",
                        level: Level::Warning,
                    },
                },
                Annotation {
                    span: 23..725,
                    message: Message {
                        text: &"expected enum `std::option::Option`",
                        level: Level::Error,
                    },
                },
            ],
        },
    ];

Performance-wise, @CAD97's PR takes 12.375 us on average to produce it, while cleanup takes 9.3012 us on my laptop (~25% faster).
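The ~25% figure is a reduction in time per run; the same two measurements can also be read as a throughput ratio, which comes out to roughly 1.33x. A small sketch of both calculations, using only the numbers quoted above (the function names are illustrative):

```rust
/// Percentage of time per run saved by `new_us` relative to `old_us`.
pub fn percent_time_saved(old_us: f64, new_us: f64) -> f64 {
    (old_us - new_us) / old_us * 100.0
}

/// Throughput ratio: how many more runs fit into the same wall-clock time.
pub fn speedup(old_us: f64, new_us: f64) -> f64 {
    old_us / new_us
}
```

With 12.375 us and 9.3012 us, `percent_time_saved` is about 24.8% and `speedup` is about 1.33, so "25% faster" and "33% more throughput" describe the same result.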

@CAD97
Author

CAD97 commented Oct 17, 2019

@zbraniecki my impl uses dyn Trait a lot, even internally, whereas you're using impl Trait in many of the same places. For comparison's sake, could you run the benchmark against my modified branch?

It's a trade-off between monomorphization cost (code size and compile time) and runtime cost. On my branch, with the same benchmark, I got [10.700 us 11.047 us 11.435 us] with dyn everywhere and [9.8312 us 10.078 us 10.368 us] (change [-12.587% -7.7650% -3.0543%]) with impl everywhere (&dyn DebugAndDisplay specialized to &str).
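To make the trade-off concrete, here is a minimal illustration of the two dispatch styles being benchmarked. The function names are hypothetical and not taken from either branch; both produce identical output and differ only in how the `Display::fmt` call is dispatched.

```rust
use std::fmt::{Display, Write};

/// Static dispatch: the compiler emits one copy of this function per
/// concrete `label` type it is called with (monomorphization), so the
/// `Display::fmt` call can be inlined, at the cost of more generated code.
pub fn render_impl(out: &mut String, label: impl Display) {
    // Writing into a `String` cannot fail, so `unwrap` is safe here.
    write!(out, "label: {}", label).unwrap();
}

/// Dynamic dispatch: exactly one copy of this function exists, and each
/// `Display::fmt` call goes through the vtable behind `&dyn Display`.
pub fn render_dyn(out: &mut String, label: &dyn Display) {
    write!(out, "label: {}", label).unwrap();
}
```

Calling `render_impl` with a `&str` and with a `u32` instantiates two functions; `render_dyn` stays a single function regardless of how many label types callers use.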

Branch with the adjustment: https://github.com/CAD97/retort/tree/take-two-impl

Note that my version also has a few features that cleanup doesn't, the main one being the use of &dyn Display, which allows e.g. &format_args!("expected {} because of return type", expected_ty) rather than &format!("expected {} because of return type", expected_ty) (assuming your code flow allows for that delayed resolution). I'll update the OP with a more meaningful comparison by Saturday.
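A minimal sketch of the `&dyn Display` point, assuming a message type shaped roughly like the proposal's (`Message`, `render` and `report` here are hypothetical stand-ins, not the branch's actual definitions). The caller never builds an intermediate `String`; formatting happens only when the renderer asks for it.

```rust
use std::fmt::Display;

/// Hypothetical message type that borrows anything implementing `Display`.
pub struct Message<'a> {
    pub text: &'a dyn Display,
}

/// Stand-in for a renderer: this is the only place formatting happens.
pub fn render(msg: &Message<'_>) -> String {
    msg.text.to_string()
}

/// The caller builds the label with `format_args!`, which captures the
/// pieces without allocating. The `Arguments` value is a temporary, so it
/// must be consumed within the same statement that creates it.
pub fn report(expected_ty: &str) -> String {
    let rendered = render(&Message {
        text: &format_args!("expected {} because of return type", expected_ty),
    });
    rendered
}
```

Plain string literals still work too, since `&"..."` coerces to `&dyn Display`; the cost is that a `format_args!` value cannot outlive the statement that creates it, which is the "code flow" caveat above.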

@zbraniecki
Contributor

Good analysis! I also noticed that just switching from fmt::Write to io::Write cost me 8.0 -> 9.0 us.

As for which of those trades we should take I'm open to discuss it!

@brendanzab
Member

My next steps are to:

  • Verify that simple ASCII vs. rich-Unicode output works

@zbraniecki Oooh, does that mean we'll be able to use box drawing characters for the underlines/gutters?

@zbraniecki
Contributor

Yeah! Christopher's POC already introduced that and my cleanup pr is designed to support pluggable "renderers".

@brendanzab
Member

brendanzab commented Oct 17, 2019

One other thing that might be interesting to figure out, but may be beyond the scope of this issue, is how to integrate pretty printers and styled output in messages. This would allow for things like type diffs and domain specific errors - something that could be handy for Rust and other languages too, like LALRPOP. This may lead to some interesting questions in regards to how it interacts with the different backends, however.

@zbraniecki
Contributor

I've seen Christopher toying with that. I consider it out of scope as long as we don't need to alter the input to produce it, and I think we don't.

So, worth opening a new issue I think :)

@CAD97
Author

CAD97 commented Oct 17, 2019

OK, I decided not to update the OP but to do the comparison here instead. I've attempted to normalize some naming and removed export locations for the purposes of comparison, and I've used a couple of shorthands.

cleanup API
struct Snippet<'a> {
    title: Option<Annotation<'a>>,
    footer: &'a [Annotation<'a>],
    slices: &'a [Slice<'a>],
}

struct Slice<'a> {
    source: &'a str,
    line_start: Option<usize>,
    origin: Option<&'a str>,
    annotations: &'a [SourceAnnotation<'a>],
}

struct Annotation<'a> {
    code: Option<&'a str>,
    text: Option<&'a str>,
    level: Level,
}

struct SourceAnnotation<'a> {
    range: Range<usize>,
    text: &'a str,
    level: Level,
}

enum Level { None, Error, Warning, Info, Note, Help }

fn format<'a>(&Snippet<'a>) -> FormattedSnippet<'a>;
trait fn render(&self, &mut impl io::Write, &FormattedSnippet<'_>) -> io::Result<()>;
impl render for struct AsciiRenderer<S: style>;

enum Style { Emphasized, Error, Warning, Info, Note, Help, LineNo, None }
trait fn style(&mut dyn io::Write, impl Display, &[Style]) -> io::Result<()>;
impl style for ansi_term; termcolor; nocolor;

Based on the fact that with both ansi_term and termcolor we're just spitting out ANSI color codes without trying to make sure the environment supports it, I think it might even be better just to vendor the use of ansi_term and emit the ANSI codes directly, then provide a NoStyle and a DefaultANSI implementation of the style provider. This is, of course, assuming that we want to avoid a public dependency on termcolor and sinking to a termcolor::Write, and leaving it up to the user to enable ANSI support in the Windows Console if they want it. (Note: Windows Terminal supports ANSI by default, but Windows Console will remain the default host for backwards compatibility, and ANSI is opt-in there IIRC.)
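A rough sketch of what "vendoring the ANSI codes" could look like: the crate writes SGR escape sequences itself and ships one provider that emits them and one that doesn't. The paragraph above names `NoStyle` and `DefaultANSI`; the `StyleProvider` trait, the reduced `Style` enum and the color choices below are assumptions for illustration only.

```rust
use std::io::{self, Write};

/// Hypothetical subset of the style roles listed for the cleanup API.
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub enum Style {
    Error,
    Warning,
    LineNo,
}

/// Hypothetical style provider: writes `text`, optionally wrapped in styling.
pub trait StyleProvider {
    fn write_styled(&self, w: &mut dyn Write, text: &str, style: Style) -> io::Result<()>;
}

/// Emits the text unchanged.
pub struct NoStyle;

impl StyleProvider for NoStyle {
    fn write_styled(&self, w: &mut dyn Write, text: &str, _style: Style) -> io::Result<()> {
        w.write_all(text.as_bytes())
    }
}

/// Emits ANSI SGR escape codes directly, with no crate dependency.
/// The colors are illustrative, not taken from either branch.
pub struct DefaultAnsi;

impl StyleProvider for DefaultAnsi {
    fn write_styled(&self, w: &mut dyn Write, text: &str, style: Style) -> io::Result<()> {
        let sgr = match style {
            Style::Error => "1;31",   // bold red
            Style::Warning => "1;33", // bold yellow
            Style::LineNo => "1;34",  // bold blue
        };
        // ESC[<params>m switches the style on; ESC[0m resets it.
        write!(w, "\x1b[{}m{}\x1b[0m", sgr, text)
    }
}
```

Whether the escape codes actually render is then the caller's concern, e.g. enabling virtual terminal processing in the Windows Console before choosing `DefaultAnsi`.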

My take-two proposed API

Note that I've removed some bits that aren't included in the cleanup API such as the folding and spacing controls, just for brevity of this comparison.

enum SnippetPart<'a, Span> {
    Title { message: Message<'a>, code: Option<&'a dyn Display> },
    AnnotatedSlice { slice: Slice<'a, Span>, annotations: &'a [Annotation<'a, Span>] },
    Note { message: Message<'a> }
}

struct Slice<'a, Span> {
    span: Span,
    origin: Option<Origin<'a>>,
}

struct Origin<'a> {
    text: &'a dyn Display,
    line_start: Option<usize>,
}

struct Annotation<'a, Span> {
    span: Span,
    message: Message<'a>,
}

struct Message<'a> {
    text: &'a dyn Display,
    level: Level,
}

enum Level { Error, Warning, Information, Hint }

use termcolor::WriteColor;
fn render<'a, Span>(&mut dyn WriteColor, &'a [SnippetPart<'a, Span>], style: &'a mut dyn Stylesheet, resolver: &'a mut dyn write_span<Span>) -> io::Result<()>;

enum Style { Base, Level(Level), Title, LineNo, OriginIndicator, Origin }
trait Stylesheet {
    fn set_style(&mut self, &mut dyn WriteColor, Style) -> io::Result<()>;
    fn write_marks(&mut self, &mut dyn WriteColor, ...) -> io::Result<()>;
}
impl Stylesheet for struct RustcStyle;

trait fn write_span<Span>(&mut self, &mut dyn WriteColor, &mut dyn Stylesheet, Span) -> io::Result<()>;
// also some other span resolution stuff but I'll skip it for now

Having played around with this API and observed the cleanup API, I'd now like to propose one more "meet in the middle" API that I think can preserve the simplicity of cleanup while keeping some of the power of my take-two. This still includes the delayed Span resolution, because I think it's the key superpower I'm pushing for. I'll prepare a PR against cleanup that implements this API, hopefully by the end of this coming Sunday. That will inform the specific functions that input traits have to provide.

struct Snippet<'a, Span> {
    title: Option<Title<'a>>,
    slices: &'a [Slice<'a, Span>],
}

struct Title<'a> {
    code: Option<&'a dyn Display>,
    message: Option<Message<'a>>,
}

struct Slice<'a, Span> {
    span: Span,
    origin: Option<&'a dyn Display>,
    space_before: bool,
    annotations: &'a [Annotation<'a, Span>],
    footer: &'a [Message<'a>],
}

struct Annotation<'a, Span> {
    span: Span,
    message: Message<'a>,
}

struct Message<'a> {
    text: &'a dyn Display,
    level: Level,
}

enum Level { Error, Warning, Info, Note, Help }

fn format<'a, Span>(&Snippet<'a, Span>, &[dyn|impl] SpanFormatter<Span>) -> FormattedSnippet<'a, Span>;
struct FormattedSnippet<'a, Span> { .. } // RefIntoIter of structured output lines
struct AsciiRenderer { emit_ansi_color: bool }
fn AsciiRenderer.write<'a, Span>(&FormattedSnippet<'a, Span>, &[mut|impl] dyn io::Write, &[dyn|impl] SpanRenderer<Span>) -> io::Result<()>;

// these are approximate until impl shows what they actually need to do
trait Span: Clone {
    type Pos: PartialOrd;
    fn start(&self) -> Self::Pos;
    fn end(&self) -> Self::Pos;
    fn new(&self, start: Self::Pos, end: Self::Pos) -> Self;
}
trait SpanFormatter<Span> {
    fn split_lines(&self, Span) -> impl Iterator<Item=Span> + '_;
    fn count_columns(&self, Span) -> usize;
}
trait SpanRenderer<Span> {
    fn write(&self, &mut [dyn|impl] io::Write, &Span) -> io::Result<()>;
}

impl Span for ops::Range<usize>;
impl SpanFormatter<impl Span<Pos=usize>> for &str;
impl SpanRenderer<impl Span<Pos=usize>> for &str;
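The last three lines only state that `Range<usize>` and `&str` get default implementations. A minimal sketch of what the `Span` impl and a `split_lines` for a `&str` source might look like, assuming byte offsets that fall on `char` boundaries. The `Copy` bound on `Pos` and the free-function form of `split_lines` (returning a `Vec` instead of `impl Iterator`) are simplifications, not the proposal's signatures.

```rust
use std::ops::Range;

/// A concrete take on the approximate `Span` trait above.
pub trait Span: Clone {
    type Pos: PartialOrd + Copy;
    fn start(&self) -> Self::Pos;
    fn end(&self) -> Self::Pos;
    fn new(&self, start: Self::Pos, end: Self::Pos) -> Self;
}

impl Span for Range<usize> {
    type Pos = usize;
    fn start(&self) -> usize {
        self.start
    }
    fn end(&self) -> usize {
        self.end
    }
    fn new(&self, start: usize, end: usize) -> Self {
        start..end
    }
}

/// Hypothetical `split_lines` for a `&str` source: splits a byte-offset
/// span into one sub-span per line it touches, excluding the `\n` bytes.
pub fn split_lines(source: &str, span: Range<usize>) -> Vec<Range<usize>> {
    let mut lines = Vec::new();
    let mut line_start = span.start;
    for (i, b) in source[span.clone()].bytes().enumerate() {
        if b == b'\n' {
            let nl = span.start + i;
            lines.push(span.new(line_start, nl));
            line_start = nl + 1;
        }
    }
    // Whatever follows the last newline (possibly empty) is the final line.
    lines.push(span.new(line_start, span.end));
    lines
}
```

For `"let a = 1;\nlet b = 2;\n"`, the span `4..15` splits into `4..10` and `11..15`, which is the per-line information a layout pass needs before it can place markers.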

Specific benefits of this API:

  • Input text doesn't have to be &str; it can be anything that implements Display. This is especially useful when the origin source isn't a string type but can be converted to one, because the intermediate allocation doesn't have to happen. From playing with take-two, I think this dyn has a negligible effect on runtime; IIUC, the real cost in my timings is the dyn indirection to the span resolvers.
  • Spans have delayed resolution. This allows the SpanRenderer to emit ANSI coloring for the displayed spans.
  • FormattedSnippet is public for alternate renderers to use, such as ones that want to use box drawing characters. The layout work is done (i.e. placing lines where they need to be) and all the renderer does is translate the structured output into io::Write calls.
  • "No dependencies"; we vendor the ANSI formatting codes that we use, and our provided renderer allows controlling whether they are emitted. (In effect we have a dependency on the ANSI spec rather than a specific Rust implementation of emitting ANSI codes.)
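The third point, a public `FormattedSnippet` consumed by interchangeable renderers, can be sketched as follows. The shape of `FormattedSnippet` isn't specified yet, so `FormattedLine`, `Renderer`, `Ascii` and `BoxDrawing` are hypothetical names: layout produces structured lines once, and each renderer only decides which characters to draw.

```rust
/// Hypothetical structured output line, standing in for whatever
/// `FormattedSnippet` would yield after layout.
pub enum FormattedLine<'a> {
    /// A source line together with its line number.
    Source { line_no: usize, text: &'a str },
    /// A marker line drawn under the preceding source line.
    Marker { marks: &'a str, label: &'a str },
}

/// A renderer only chooses characters; the layout is already done.
pub trait Renderer {
    const GUTTER: char;

    /// `width` is the width of the line-number column.
    fn render(&self, lines: &[FormattedLine<'_>], width: usize) -> String {
        let mut out = String::new();
        for line in lines {
            match line {
                FormattedLine::Source { line_no, text } => {
                    out.push_str(&format!("{:>w$} {} {}\n", line_no, Self::GUTTER, text, w = width));
                }
                FormattedLine::Marker { marks, label } => {
                    out.push_str(&format!("{:>w$} {} {} {}\n", "", Self::GUTTER, marks, label, w = width));
                }
            }
        }
        out
    }
}

/// Plain ASCII gutter, as in rustc's output.
pub struct Ascii;
impl Renderer for Ascii {
    const GUTTER: char = '|';
}

/// Box-drawing gutter for terminals that support it.
pub struct BoxDrawing;
impl Renderer for BoxDrawing {
    const GUTTER: char = '│';
}
```

Given the line `9 |     x;` and its `^` marker from the `moved_value` example, `Ascii` reproduces the rustc-style lines, while `BoxDrawing` emits the same layout with `│` as the separator.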

I note that you've claimed that the cleanup API would support coloring of the source snippets. Even if we just admit that we only care about ANSI-compatible terminals, the zero-width ANSI control codes would mess up your column calculations and require a re-allocation of the source string with said ANSI control codes in it.
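To make the column concern concrete: `"\x1b[31mabc\x1b[0m"` shows three visible characters but is 12 bytes and 12 `char`s long, so any column arithmetic done on the colored string is off by 9. The sketch below computes the on-screen width by skipping CSI escape sequences. It assumes every other `char` is one column wide, which is wrong for wide CJK characters or combining marks, and `visible_width` is a hypothetical helper.

```rust
/// Counts the columns a string occupies on screen, skipping ANSI CSI
/// escape sequences (ESC '[' ... final byte in '@'..='~').
/// Simplification: every other `char` is treated as one column wide.
pub fn visible_width(s: &str) -> usize {
    let mut width = 0;
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\x1b' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Skip parameter bytes up to and including the final byte.
            for c in chars.by_ref() {
                if ('@'..='~').contains(&c) {
                    break;
                }
            }
        } else {
            width += 1;
        }
    }
    width
}
```

A renderer could strip codes like this after the fact, but the simpler route is to compute columns on the uncolored text and insert escape codes only while writing, which is what delayed span resolution allows.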

On domain-specific errors:

¯\_(ツ)_/¯

calc.lalrpop:6:5: 6:34: Ambiguous grammar detected

  The following symbols can be reduced in two ways:
    Expr "*" Expr "*" Expr

  They could be reduced like so:
    Expr "*" Expr "*" Expr
    ├─Expr──────┘        │
    └─Expr───────────────┘

  Alternatively, they could be reduced like so:
    Expr "*" Expr "*" Expr
    │        └─Expr──────┤
    └─Expr───────────────┘

  Hint: This looks like a precedence error related to `Expr`. See the LALRPOP
  manual for advice on encoding precedence.

The closest I can get to something like this would be

let source = r#"Expr "*" Expr "*" Expr"#;
let snippet = Snippet {
    title: Some(Title {
        code: None,
        message: Some(Message {
            text: &"Ambiguous grammar detected",
            level: Level::Error,
        }),
    }),
    slices: &[
        Slice {
            span: 0..22,
            origin: Some(&"calc.lalrpop"),
            space_before: true,
            annotations: &[Annotation {
                span: 0..22,
                message: Message {
                    text: &"These symbols can be reduced in two ways",
                    level: Level::Error,
                },
            }],
            footer: &[Message {
                text: &"They could be reduced like so:",
                level: Level::Info,
            }],
        },
        Slice {
            span: 0..22,
            origin: None,
            space_before: false,
            annotations: &[
                Annotation {
                    span: 0..13,
                    message: Message {
                        text: &"Expr",
                        level: Level::Info,
                    },
                },
                Annotation {
                    span: 0..22,
                    message: Message {
                        text: &"Expr",
                        level: Level::Info,
                    },
                },
            ],
            footer: &[Message {
                text: &"Alternatively, they could be reduced like so:",
                level: Level::Info,
            }],
        },
        Slice {
            span: 0..22,
            origin: None,
            space_before: false,
            annotations: &[
                Annotation {
                    span: 9..22,
                    message: Message {
                        text: &"Expr",
                        level: Level::Info,
                    },
                },
                Annotation {
                    span: 0..22,
                    message: Message {
                        text: &"Expr",
                        level: Level::Info,
                    },
                },
            ],
            footer: &[],
        },
    ],
};

There's always the option of interleaving "standard" errors handled with annotate-snippets with domain-specific errors handled manually, even if that is suboptimal.

@CAD97
Author

CAD97 commented Oct 21, 2019

I've made a PR with an implementation of my new proposed API, and honestly, if we give up support for non-ANSI coloring, I think it's the best yet. It even supports both "stick a &str in the slice root and use relative offsets" AND "use offsets into a late-bound source" at the same time!

My benchmark results, with dyn where I put it by default:

format [take-3 &str]    time:   [10.768 us 11.084 us 11.448 us]
format [take-3 Range]   time:   [11.037 us 11.427 us 11.871 us]
format [cleanup]        time:   [11.008 us 11.398 us 11.828 us]

so basically they're all within noise of each other, which is great!

As an example of the API, see the big example I wrote on Snippet:

Example

To produce the error annotation

error[E0277]: `std::sync::MutexGuard<'_, u32>` cannot be sent between threads safely
  --> examples/nonsend_future.rs:23:5
   |
5  | fn is_send<T: Send>(t: T) {
   |    -------    ---- required by this bound in `is_send`
...
23 |     is_send(foo());
   |     ^^^^^^^ `std::sync::MutexGuard<'_, u32>` cannot be sent between threads safely
   |
   = help: within `impl std::future::Future`, the trait `std::marker::Send` is not implemented for `std::sync::MutexGuard<'_, u32>`
note: future does not implement `std::marker::Send` as this value is used across an await
  --> examples/nonsend_future.rs:15:3
   |
14 |     let g = x.lock().unwrap();
   |         - has type `std::sync::MutexGuard<'_, u32>`
15 |     baz().await;
   |     ^^^^^^^^^^^ await occurs here, with `g` maybe used later
16 | }
   | - `g` is later dropped here

two snippets are used:

# use annotate_snippets::*;
let first_snippet = Snippet {
    title: Some(Title {
        code: Some(&"E0277"),
        message: Message {
            text: &"`std::sync::MutexGuard<'_, u32>` cannot be sent between threads safely",
            level: Level::Error,
        },
    }),
    slices: &[Slice {
        span: WithLineNumber {
            data: "fn is_send<T: Send>(t: T) {\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n    is_send(foo());",
            line_num: 5,
        },
        origin: Some(&"examples/nonsend_future.rs"),
        annotations: &[
            Annotation {
                span: 4..11,
                message: None,
            },
            Annotation {
                span: 14..18,
                message: Some(Message {
                    text: &"required by this bound in `is_send`",
                    level: Level::Info,
                })
            },
            Annotation {
                span: 67..74,
                message: Some(Message {
                    text: &"`std::sync::MutexGuard<'_, u32>` cannot be sent between threads safely",
                    level: Level::Error,
                })
            },
        ],
        footer: &[Message {
            text: &"within `impl std::future::Future`, the trait `std::marker::Send` is not implemented for `std::sync::MutexGuard<'_, u32>`",
            level: Level::Help,
        }],
    }],
};
let second_snippet = Snippet {
    title: Some(Title {
        code: None,
        message: Message {
            text: &"future does not implement `std::marker::Send` as this value is used across an await",
            level: Level::Note,
        },
    }),
    slices: &[Slice {
        span: WithLineNumber {
            data: "    let g = x.lock().unwrap();\n    baz().await;\n}",
            line_num: 14,
        },
        origin: Some(&"examples/nonsend_future.rs"),
        annotations: &[
            Annotation {
                span: 8..9,
                message: Some(Message {
                    text: &"has type `std::sync::MutexGuard<'_, u32>`",
                    level: Level::Info,
                }),
            },
            Annotation {
                span: 36..47,
                message: Some(Message {
                    text: &"await occurs here, with `g` maybe used later",
                    level: Level::Error,
                })
            },
            Annotation {
                span: 50..51,
                message: Some(Message {
                    text: &"`g` is later dropped here",
                    level: Level::Info,
                })
            },
        ],
        footer: &[],
    }],
};
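The examples pass a `WithLineNumber { data, line_num }` span, so the renderer has to recover which displayed line a sub-span lands on. A minimal sketch, assuming `line_num` is the 1-based number of the first line of `data` and spans are byte offsets into `data`. The `locate` method is hypothetical, and the offsets in the usage note are counted by hand from the second snippet's `data` string.

```rust
use std::ops::Range;

/// Stand-in for the example's `WithLineNumber`.
pub struct WithLineNumber<'a> {
    pub data: &'a str,
    /// Assumed: the 1-based line number of the first line of `data`.
    pub line_num: usize,
}

impl WithLineNumber<'_> {
    /// Line number and 0-based byte column where `span` starts, found by
    /// counting the newlines in `data` before `span.start`.
    pub fn locate(&self, span: &Range<usize>) -> (usize, usize) {
        let before = &self.data[..span.start];
        let line = self.line_num + before.matches('\n').count();
        let col = match before.rfind('\n') {
            Some(nl) => span.start - nl - 1,
            None => span.start,
        };
        (line, col)
    }
}
```

In `"    let g = x.lock().unwrap();\n    baz().await;\n}"` with `line_num: 14`, the `g` at `8..9` is on line 14, `baz().await` at `35..46` is on line 15 at column 4, and the closing `}` at `48..49` is on line 16.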

The API still needs fold support, an implementation for printing footer notes, and anonymized line numbers (cleanup does not have any of these yet either); with those, I think it reaches feature parity with master.

PR is up as #14.

@zbraniecki
Contributor

@CAD97 - apologies for radio silence. I haven't had a lot of time lately to finish this work, but I feel like I now have all the pieces in place to sort it out and I'll try to take a deeper dive into your proposal and finishing cleanup branch!

@jyn514
Member

jyn514 commented Jan 14, 2020

Is there a proposed merge() function for spans or should that be done by the library users? I need to be able to merge spans from different parts of an expression (10 + 20 should be 7 characters, using the spans from 10, +, and 20).

@jyn514
Member

jyn514 commented Jan 14, 2020

The code for merge is super simple btw: brendanzab/codespan#149
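For plain `Range<usize>` spans into the same source, the merge described above reduces to taking the smaller start and the larger end. This is a minimal sketch, not the code from brendanzab/codespan#149:

```rust
use std::ops::Range;

/// Smallest span covering both inputs. Assumes both spans index the same
/// source; any gap between them (e.g. whitespace) is included.
pub fn merge(a: &Range<usize>, b: &Range<usize>) -> Range<usize> {
    a.start.min(b.start)..a.end.max(b.end)
}
```

For `10 + 20`, the spans are `0..2` (`10`), `3..4` (`+`) and `5..7` (`20`), and merging all three gives `0..7`, the 7 characters mentioned above.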

@epage
Contributor

epage commented Mar 13, 2024

Closing this and #14 in favor of #90, #91, #101, and #102.

If there are things that were missed, I'd recommend creating more specific issues.

@epage closed this as not planned on Mar 13, 2024
epage added a commit to epage/annotate-snippets-rs that referenced this issue Sep 27, 2024
…ion-3.x

chore(deps): update github/codeql-action action to v3

9 participants