WIP: Output JS-compatible line numbers #5264
Closed
@GeoffreyBooth this is an interesting one I came across while running the ESLint plugin against the CoffeeScript repo source code:

It was crashing when running ESLint against `test/regex.coffee` because of the literal U+2028/U+2029 (Unicode linebreak) characters in that file.

Looking into it, it turns out our location data/line numbers aren't exactly JS-compatible: per e.g. here, JS considers newline (`\n`), carriage return (`\r`), carriage return + newline (`\r\n`), U+2028 and U+2029 to be line terminators, whereas we strip carriage returns and then just consider newlines to be line terminators.
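To make the difference concrete, here is a minimal sketch in plain JavaScript. It is not the compiler's actual code: `countLinesJs` and `countLinesNewlineOnly` are hypothetical helpers, and the second one only approximates the "strip `\r`, then count `\n`" behavior described above.

```javascript
// JS line terminators as listed above: \r\n, \n, \r, U+2028, U+2029.
// \r\n comes first in the alternation so it counts as ONE terminator.
const JS_LINE_TERMINATOR = /\r\n|[\n\r\u2028\u2029]/g;

// Hypothetical helper: number of lines under JS's definition.
function countLinesJs(source) {
  const matches = source.match(JS_LINE_TERMINATOR);
  return (matches ? matches.length : 0) + 1;
}

// Hypothetical helper: strip carriage returns, then treat only \n as a
// line terminator (an approximation of the behavior described above).
function countLinesNewlineOnly(source) {
  return source.replace(/\r/g, "").split("\n").length;
}

// A literal U+2028 inside a regex, then a \n, then a \r\n.
const sample = "a = /\u2028/\nb = 1\r\nc = 2";

console.log(countLinesJs(sample));          // 4
console.log(countLinesNewlineOnly(sample)); // 3
```

The two counts disagree as soon as a U+2028 or U+2029 shows up, which is the situation in `test/regex.coffee`.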
So what was happening is that in a file with "unusual" line terminators according to JS (e.g. the U+2028 and U+2029 in `test/regex.coffee`), ESLint was doing the JS version of counting linebreaks and then getting confused when a certain line didn't have the number of columns reported by our location data.

So thus far this PR swaps in the JS-compatible regex for detecting linebreaks in the obvious places, which did get ESLint working against `test/regex.coffee`.
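As an illustration of the mismatch (a sketch only, not ESLint's or the lexer's real logic), the hypothetical `locate` helper below maps a source offset to a line/column pair under a given definition of "line break". The same offset lands on different lines and columns depending on which regex is used:

```javascript
// Hypothetical helper: 1-based line and 0-based column for `offset`,
// where `lineBreak` is a global regex defining what ends a line.
function locate(source, offset, lineBreak) {
  let line = 1;
  let lineStart = 0;
  for (const match of source.matchAll(lineBreak)) {
    const end = match.index + match[0].length;
    if (end > offset) break; // this break is at or after the offset
    line += 1;
    lineStart = end;
  }
  return { line, column: offset - lineStart };
}

// A string literal containing a raw U+2028, followed by a second line.
const source = "x = '\u2028'\ny = 2";
const closingQuote = source.lastIndexOf("'");

const newlineOnly = locate(source, closingQuote, /\n/g);
const jsCompatible = locate(source, closingQuote, /\r\n|[\n\r\u2028\u2029]/g);

console.log(newlineOnly);  // line 1, column 6
console.log(jsCompatible); // line 2, column 0
```

A consumer that counts lines the JS way would look for column 6 on line 1, find that line 1 is only five characters long, and fail in the way described above.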
But I'm probably inclined not to try to get this merged before initial release, as (a) it should only cause problems in files where someone's got "fake linebreak" characters floating around and (b) I think it's worth thinking through more fully.

For one thing, I think that if someone had a `\r` in their source code that wasn't "attached" to a following `\n`, it'd need to be accounted for. Since we strip out `\r`s, I think we'd need to go a step further than the `locationDataCompensations` (that we added into the lexer to get accurate location data despite things like `\r`s having been stripped out) and do something like additionally "remember" the source offsets of any "unattached" `\r`s and then take those into account when generating location data.

But ultimately it seems like it'd be a good idea to be compatible with JS's definition of linebreaks in our location data?
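The "remember the offsets of unattached `\r`s" idea could start from something like the sketch below. `findUnattachedCarriageReturns` is a hypothetical name, and how the result would be fed into `locationDataCompensations` or the location-data generation is deliberately left open here:

```javascript
// Hypothetical helper: offsets (in the ORIGINAL source, before any
// stripping) of every \r that is not immediately followed by \n.
function findUnattachedCarriageReturns(source) {
  const offsets = [];
  for (const match of source.matchAll(/\r(?!\n)/g)) {
    offsets.push(match.index);
  }
  return offsets;
}

// One "attached" \r (part of \r\n) and one unattached \r.
const input = "a = 1\r\nb = 2\rc = 3";

console.log(findUnattachedCarriageReturns(input)); // [ 12 ]
```

Since JS treats a lone `\r` as a line terminator, each recorded offset would mark a point where the JS-compatible line number increments even though no `\n` is present.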

Based on `ast-splat-param-location-data`; here is just the diff against that branch.