Add sourcemap generation to JavaScript codegen target #3675

Open
wants to merge 22 commits into
base: main
from

Conversation

AlisCode

@AlisCode AlisCode commented Oct 6, 2024

Here goes!

The PR is still kind of WIP - mainly, I'd like to talk about how to approach testing (and actually add way more snapshot tests before this gets a chance to be merged).

I also have 2 `expect`s on calls to sourcemap. I'm sure this is not the right way to do it, so I'm wondering what you think is the best approach? Is it fine to have a "sourcemap error" variant in the big Gleam Error enum?

Context

I highly recommend watching the following talk if you're unfamiliar with sourcemaps: https://www.youtube.com/watch?v=6LI0BJIiamg

Long story short, a sourcemap is a file that defines mappings from specific places in a generated file (the generated JavaScript code, which needs to refer to its sourcemap file) to a source file (the Gleam module).

It's used, for example:

  • By the browser, to display the actual line of the Gleam source instead of the generated JS code
  • In tools like Sentry (I'm not affiliated, but they're huge in the industry), to point to the actual source instead of the generated JavaScript code
  • By debuggers such as vscode-js-debug to be able to provide debugging straight from your Gleam sources : this PR enables using a debugger on Gleam, which I'm sure some people are excited about (I am! 🎉).
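
To make the idea of a mapping concrete, here is a minimal sketch in Rust. The `Position`, `Mapping` and `lookup` names are hypothetical and are not part of this PR or of the `sourcemap` crate; real sourcemap files store the same information in an encoded form, and real consumers may resolve positions differently.

```rust
/// A zero-based line and column position in a text file.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Position {
    pub line: u32,
    pub column: u32,
}

/// One entry: a position in the generated JavaScript and the position in
/// the Gleam source it came from. (Hypothetical type for illustration.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Mapping {
    pub generated: Position,
    pub source: Position,
}

/// Returns the source position of the closest mapping that is on the same
/// generated line and starts at or before the queried position.
/// Assumes `mappings` is sorted by generated position.
pub fn lookup(mappings: &[Mapping], generated: Position) -> Option<Position> {
    mappings
        .iter()
        .take_while(|mapping| mapping.generated <= generated)
        .filter(|mapping| mapping.generated.line == generated.line)
        .last()
        .map(|mapping| mapping.source)
}
```

A debugger or error reporter would do something along these lines when translating a stack frame in the generated file back to a Gleam location.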

Content

This PR adds sourcemap generation for the JavaScript codegen target.
Sourcemap generation is disabled by default, and needs to be manually enabled using the following config in gleam.toml:

[javascript]
sourcemaps = true

This is done out of symmetry with other languages and tools - for example TypeScript does not automatically generate them. I imagine as your codebase grows large, it can end up being quite a heavy process. Also I'm assuming if this ever gets released, it should maybe be marked experimental.

Comment on lines 279 to 280
sourcemap.to_writer(&mut output).expect("Failed to write sourcemap to memory. This is a bug in sourcemap, please report.");
let output = String::from_utf8(output).expect("Sourcemap did not generate valid UTF-8. This is a bug in sourcemap, please report.");
Author

Really not a fan of those `expect`s; open to any suggestions here.

Member

Please remove the bit about opening a bug ticket; all panics have this information printed for them.
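
For reference, one way the "sourcemap error variant" idea from the description could look is sketched below. The `Error` enum here is a hypothetical stand-in, not Gleam's actual error type, and `sourcemap_bytes_to_string` is an invented helper name; only `String::from_utf8` is a real standard-library call.

```rust
use std::fmt;

/// Hypothetical stand-in for the compiler's error enum.
#[derive(Debug, PartialEq)]
pub enum Error {
    SourceMap { detail: String },
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::SourceMap { detail } => {
                write!(f, "Failed to generate sourcemap: {detail}")
            }
        }
    }
}

impl std::error::Error for Error {}

/// Converts serialised sourcemap bytes into a `String`, returning an error
/// value instead of panicking when the bytes are not valid UTF-8.
pub fn sourcemap_bytes_to_string(bytes: Vec<u8>) -> Result<String, Error> {
    String::from_utf8(bytes).map_err(|error| Error::SourceMap {
        detail: error.to_string(),
    })
}
```

The caller can then propagate the failure with `?` rather than using `expect`. Whether a dedicated variant is preferable to a panic for a case that should never happen is a judgement call for the maintainers.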

///
/// Used to produce SourceMaps.
#[derive(Debug)]
pub struct CursorPositionWriter<'a, W> {
Author

Or move this to compiler-core/src/io/cursor_position_writer.rs ?
Also not a fan of the name

Member

@lpil lpil left a comment

Thank you! I've left a bunch of notes inline

@@ -662,6 +662,8 @@ pub struct ErlangConfig {
pub struct JavaScriptConfig {
#[serde(default)]
pub typescript_declarations: bool,
#[serde(default)]
pub sourcemap: bool,
Member

Suggested change
- pub sourcemap: bool,
+ pub sourcemaps: bool,

type_reference,
"export {}",
line(),
sourcemap_reference,
Member

How come this is not with the type reference at the top?

Author

The declaration of the SourceMap URL does not have to be at a specific location, it can be anywhere in the JS code. It is something that's separate from the type reference (which is just TS-specific in my understanding), so I think it's best to keep them as separate declaration blocks for symmetry.

I take it from this comment that you'd rather have the sourcemap declaration at the top, so I moved it :)

WithSourceMapLocation {
document: Box<Self>,
start: LineColumn,
},
Member

This module should not know anything about code generation or Gleam, but here it knows about JavaScript specific implementation details. Any changes to this module should not be specific to anything other than pretty printing.

Author

When building a sourcemap, for each mapping we want to add, we need two things:

  • The line+column of the mapping in the generated (JS) file
  • The line+column inside the source (Gleam) file it maps to.

In my implementation, the field start in this enum variant stores the information of where the Document is in the source file (a SrcSpan passed through the LineNumbers util). The only place we know the line+column on the generated JS file is when the pretty printer is writing the Document inside a file - then we know as we're writing "The document is being printed at the line x, column y", and this info is lost afterwards.

I agree that conceptually the pretty module shouldn't know anything about JS, but given that the JS files that Gleam produces are pretty-printed, I don't see how it's possible to generate sourcemaps without hooking into the pretty-printer itself. Any ideas I could explore?
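
The mechanism described in this comment can be sketched as follows. This is a heavily simplified, hypothetical model: the `Document` and `Printer` types below are not the real `pretty` module (there is no width-based layout, grouping or nesting), and the `Located` variant merely stands in for the `WithSourceMapLocation` variant from the diff.

```rust
/// Zero-based line and column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LineColumn {
    pub line: u32,
    pub column: u32,
}

/// A tiny stand-in for a pretty-printer document tree.
pub enum Document {
    /// Text that is assumed to contain no newline.
    Text(String),
    Newline,
    Sequence(Vec<Document>),
    /// A document annotated with where it starts in the source file.
    Located { document: Box<Document>, start: LineColumn },
}

/// Output buffer, current cursor position, and the mappings recorded so far
/// as (generated position, source position) pairs.
#[derive(Default)]
pub struct Printer {
    pub output: String,
    pub line: u32,
    pub column: u32,
    pub mappings: Vec<(LineColumn, LineColumn)>,
}

impl Printer {
    pub fn print(&mut self, document: &Document) {
        match document {
            Document::Text(text) => {
                self.output.push_str(text);
                // Columns are counted in bytes in this sketch.
                self.column += text.len() as u32;
            }
            Document::Newline => {
                self.output.push('\n');
                self.line += 1;
                self.column = 0;
            }
            Document::Sequence(documents) => {
                for document in documents {
                    self.print(document);
                }
            }
            Document::Located { document, start } => {
                // The generated position is only known here, while printing.
                let generated = LineColumn { line: self.line, column: self.column };
                self.mappings.push((generated, *start));
                self.print(document);
            }
        }
    }
}
```

The sketch shows why the generated position has to be captured at print time; it does not settle whether the annotation should live in the pretty-printing module or be expressed in some more generic form.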

match self {
SourceMapEmitter::Null => (),
SourceMapEmitter::Emit(source_map) => {
tracing::debug!("emitting one sourcemap entry");
Member

Remove this please

}

impl SourceMapEmitter {
pub fn add_mapping(&mut self, dst_line: u32, dst_col: u32, src_location: LineColumn) {
Member

No abbreviations please

writer: &mut impl Utf8Writer,
src: &EcoString,
path: &Utf8Path,
source_map_emitter: &mut SourceMapEmitter,
Member

Formatting never uses a source map emitter, there should be no change to this API.


impl<'a, W: Utf8Writer + std::fmt::Write> Utf8Writer for CursorPositionWriter<'a, W> {
/// A wrapper around `fmt::Write` that has Gleam's error handling.
fn str_write(&mut self, str: &str) -> Result<()> {
Member

This iterates the string twice. Can we do it once instead?

The pretty printer also iterates over and counts the characters, can that work be reused?

Author

I've reused LineColumn inside of this now. As long as LineNumber does not change behavior (the TODOs mention graphemes which confuses me a bit), then this should work.

Whenever the behavior of LineColumn changes, the test should break :)
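
As an illustration of the single-pass suggestion, a cursor can be advanced in one loop over the written text. This is a sketch with a hypothetical `Cursor` type; it is not the `CursorPositionWriter` from the PR and does not reuse `LineColumn`.

```rust
/// Tracks a zero-based cursor position across successive writes.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct Cursor {
    pub line: usize,
    pub column: usize,
}

impl Cursor {
    /// Advances the cursor over `text` in a single pass over its bytes.
    /// Columns are counted in bytes here, which is an assumption of this
    /// sketch rather than a statement about what sourcemaps require.
    pub fn advance(&mut self, text: &str) {
        for byte in text.bytes() {
            if byte == b'\n' {
                self.line += 1;
                self.column = 0;
            } else {
                self.column += 1;
            }
        }
    }
}
```

With this shape, a write that ends in a newline leaves the column at 0, and a write without any newline simply extends the current column.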

self.line += newline_count;
if newline_count > 0 {
let lastline = str.lines().last().expect("Should have at least one line");
self.column = lastline.len();
Member

This is the number of bytes and not Unicode graphemes or characters. Is that what sourcemaps expect?

Author

A sourcemap works with byte indices, so in my understanding this is correct. I'll add a test with emojis inside the code to see if that breaks anything.
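
The question only matters for non-ASCII text, where the possible units give different answers. The sketch below uses a hypothetical helper, `column_widths`, to measure the same string in UTF-8 bytes, Unicode scalar values, and UTF-16 code units. It makes no claim about which unit a given sourcemap consumer expects; it only shows that the counts diverge.

```rust
/// Returns the length of `text` measured three ways:
/// (UTF-8 bytes, Unicode scalar values, UTF-16 code units).
pub fn column_widths(text: &str) -> (usize, usize, usize) {
    (
        text.len(),
        text.chars().count(),
        text.encode_utf16().count(),
    )
}
```

For ASCII-only code all three values are equal, which is why a test containing an emoji is a useful way to find out which unit a consumer actually uses.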

assert_sourcemap!(
"fn add_2(x) {
x + 2
}"
Member

Is this single case enough to test all the given functionality? I presume it is not only implemented for addition.

Author

No it isn't. I have tested the sourcemaps with all examples from the Gleam tour on a personal project, I'm going to add those test cases in this PR, but I wanted to get some feedback on my snapshot testing before doing that. Indeed your comment below indicates that it's not clear how to test this for now.

Author

Before I add more test cases, let's agree on how to test this feature

source: compiler-core/src/javascript/tests/sourcemaps.rs
expression: sourcemap_viz
---
note:
Member

This snapshot is rather verbose, to the extent of being hard to read. A more concise format would be very helpful

Author

@AlisCode AlisCode Oct 24, 2024

I completely agree, but discussions with Gleam folks on Discord didn't really lead to a precise conclusion. What exactly would you be happy with?

Maybe each test could be "what is the Gleam location of this specific location in the JS file" - then the snapshot would contain the whole JS + Gleam code on top of each other, with the one position you're trying to assert?

As an example, say you write a test case that asks "What is the position in the Gleam code of character 1, line 1 in the generated JS file?"; the snapshot would look like this:

  ┌─ original.gleam:1:1
  │
1 │ fn add_2(x) {
  │ ^ This code
  │    x + 2
  │  }
  │
  ┌─ generated.js:2:1
  │
2 │ function add_2(x) {
  │ ^ Gets mapped to this
  │     return x + 2;
  │  }
}
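
A snapshot in roughly this style could be produced by a small helper that prints one line of code with a caret under the asserted column. The sketch below is hypothetical (`annotate` is an invented name, not an existing test utility in the compiler) and only renders the annotated line, not the file header or the surrounding lines.

```rust
/// Renders line `line` of `code` with a caret under column `column`.
/// Both are one-based, as in the example snapshot above.
/// Returns `None` if the line does not exist or an index is zero.
pub fn annotate(code: &str, line: usize, column: usize, label: &str) -> Option<String> {
    let text = code.lines().nth(line.checked_sub(1)?)?;
    // Blank gutter as wide as the line number, so the bars line up.
    let gutter = " ".repeat(line.to_string().len());
    let padding = " ".repeat(column.checked_sub(1)?);
    Some(format!("{line} │ {text}\n{gutter} │ {padding}^ {label}"))
}
```

Calling it once for the Gleam source and once for the generated JavaScript gives the two halves of a snapshot for a single asserted position.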

@lpil lpil marked this pull request as draft October 7, 2024 20:14
@lpil
Member

lpil commented Oct 21, 2024

When you are ready for a review please un-draft this PR. Thank you!

@AlisCode AlisCode force-pushed the feat/sourcemap-javascript branch from e76e450 to 2c5e729 Compare October 24, 2024 13:02
@AlisCode AlisCode marked this pull request as ready for review October 24, 2024 13:48
@AlisCode
Author

I think I need your opinion on stuff before I can write a bunch of unit tests or refactor the PR, so I'm marking this as ready.

@AlisCode AlisCode requested a review from lpil October 28, 2024 22:36
@lpil
Member

lpil commented Dec 2, 2024

Hello! Are you waiting on something from me? I can't see anything here but wasn't sure if I missed anything.

@AlisCode
Author

AlisCode commented Dec 3, 2024

Hi @lpil,

this PR got a bit old, so it has a few merge conflicts that I'll resolve today after work.

There are 2 points which need clarification before I proceed to add more unit tests:

#3675 (comment)
About this comment: the approach I took in this PR is to add a variant to the pretty.rs module, so that the line + column information stays relevant even after a prettification pass. Sourcemaps work by referencing lines and columns in the generated code, so if the pretty module is not aware of sourcemaps, I don't see how this is implementable: to the best of my knowledge it's the only place where tracking "where is this code in the generated JS source" makes sense.

#3675 (comment)
About this comment: is this approach good enough? If it is, then I'll add some more test cases :)

@cdaringe

cdaringe commented Dec 7, 2024

I noticed that if I already have a compiled build cache and then switch to this compiler with sourcemaps turned on, maps aren't emitted. I have to run a clean first, presumably to bypass a cache hit that does not consider this flag as an input to the cache?

@AlisCode
Author

@cdaringe, I believe the issue you ran into is orthogonal to this PR? I don't remember touching anything to do with caching. Do we need special edge-case handling here?

I see the branch has conflicts, I'll rebase this :)

@AlisCode AlisCode force-pushed the feat/sourcemap-javascript branch from c747d28 to 56d12a1 Compare December 29, 2024 13:16
@cdaringe

I'm on mobile, so I didn't search the source, but if the build cache algorithm can consider new inputs, I'm suggesting that the config options added here are missing from those inputs. It may or may not be open for extension. I sorta kinda would think that a hash of gleam.toml would invalidate the build cache?
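
The suggestion can be illustrated with a fingerprint that covers both the module source and the options that affect the generated output. This is purely a sketch: how Gleam's build cache actually computes its keys is not shown in this thread, and `JavaScriptOptions` and `cache_fingerprint` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical subset of the options that affect generated JavaScript.
#[derive(Debug, Clone, Hash, PartialEq, Eq)]
pub struct JavaScriptOptions {
    pub typescript_declarations: bool,
    pub sourcemaps: bool,
}

/// Combines the module source and the options into one fingerprint, so that
/// toggling an option produces a different fingerprint for the same source.
/// Note: `DefaultHasher`'s algorithm is not guaranteed to be stable across
/// Rust releases, so a real on-disk cache would need a fixed hash function.
pub fn cache_fingerprint(source: &str, options: &JavaScriptOptions) -> u64 {
    let mut hasher = DefaultHasher::new();
    source.hash(&mut hasher);
    options.hash(&mut hasher);
    hasher.finish()
}
```

Under this scheme, enabling `sourcemaps` would cause a cache miss for every JavaScript module without requiring a manual clean.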

Development

Successfully merging this pull request may close these issues.

Sourcemaps
3 participants