WIP: Output JS-compatible line numbers #5264

helixbass · 2019-12-17T17:32:00Z

@GeoffreyBooth this is an interesting one I came across while running the ESLint plugin against the Coffeescript repo source code:

It was crashing running ESLint against test/regex.coffee because of the literal U+2028/U+2029 (Unicode linebreak) characters in that file

Looking into it, it turns out our location data/line numbers aren't exactly JS-compatible - per eg here, JS considers newline (\n), carriage return (\r), carriage return + newline (\r\n), U+2028 and U+2029 to be line terminators

Whereas we strip carriage returns and then just consider newlines to be line terminators

So what was happening is that in a file with "unusual" line terminators according to JS (eg the U+2028 and U+2029 in test/regex.coffee), ESLint was doing the JS version of counting linebreaks and then getting confused when a certain line didn't have the number of columns reported by our location data

So thus far this PR swaps in the JS-compatible regex for detecting linebreaks in the obvious places. Which did get ESLint working against test/regex.coffee

But I'm probably inclined to not try and get this merged before initial release, as (a) it should only cause problems in files where someone's got "fake linebreak" characters floating around and (b) I think it's worth thinking through more fully

For one thing, I think that if someone had a \r in their source code that wasn't "attached" to a following \n, it'd need to be accounted for - since we strip out \r's, I think we'd need to go a step further than the locationDataCompensations (that we added into the lexer to get accurate location data despite things like \r's having been stripped out) and do something like additionally "remember" the source offsets of any "unattached" \r's and then take those into account when generating location data

But ultimately it seems like it'd be a good idea to be compatible with JS's definition of linebreaks in our location data?

Based on ast-splat-param-location-data, here is just the diff against that branch

helixbass · 2019-12-17T17:32:36Z

src/lexer.coffee

@@ -1314,6 +1314,8 @@ OPERATOR = /// ^ (

 WHITESPACE = /^[^\n\S]+/

+exports.LINEBREAK = LINEBREAK = /\r\n|[\r\n\u2028\u2029]/ug


This regex came from here

GeoffreyBooth · 2019-12-17T18:06:37Z

But ultimately it seems like it'd be a good idea to be compatible with JS's definition of linebreaks in our location data?

Yes, I feel like this might fix a lot of the bugs we get reported about wrong line numbers in source maps and stack traces.

helixbass · 2019-12-30T20:06:41Z

Reopened as #5277

helixbass commented Dec 17, 2019

View reviewed changes

helixbass force-pushed the ast-js-compatible-line-numbers branch 4 times, most recently from e0ce666 to 771ea6e Compare December 22, 2019 20:35

use js-compatible linebreak regex

2c9cc26

helixbass force-pushed the ast-js-compatible-line-numbers branch from 771ea6e to 2c9cc26 Compare December 27, 2019 22:08

GeoffreyBooth closed this Dec 30, 2019

helixbass mentioned this pull request Dec 30, 2019

WIP: Output JS-compatible line numbers #5277

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Output JS-compatible line numbers #5264

WIP: Output JS-compatible line numbers #5264

helixbass commented Dec 17, 2019 •

edited

Loading

helixbass Dec 17, 2019

GeoffreyBooth commented Dec 17, 2019

helixbass commented Dec 30, 2019

		@@ -1314,6 +1314,8 @@ OPERATOR = /// ^ (

		WHITESPACE = /^[^\n\S]+/

		exports.LINEBREAK = LINEBREAK = /\r\n\|[\r\n\u2028\u2029]/ug

WIP: Output JS-compatible line numbers #5264

WIP: Output JS-compatible line numbers #5264

Conversation

helixbass commented Dec 17, 2019 • edited Loading

helixbass Dec 17, 2019

Choose a reason for hiding this comment

GeoffreyBooth commented Dec 17, 2019

helixbass commented Dec 30, 2019

helixbass commented Dec 17, 2019 •

edited

Loading