Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Style/IndentationWidth fails with BOM set by Vim #2703

Closed
fabioxgn opened this issue Jan 22, 2016 · 9 comments
Closed

Style/IndentationWidth fails with BOM set by Vim #2703

fabioxgn opened this issue Jan 22, 2016 · 9 comments
Assignees
Labels

Comments

@fabioxgn
Copy link

if you use :set bomb in vim to add BOM to an utf-8 encoded file, rubocop starts failing.

test.rb:2:2: C: Style/IndentationWidth: Use 2 (not 1) spaces for indentation. (https://github.com/bbatsov/ruby-style-guide#spaces-indentation)
  def method
 ^

Aparently it is detecting the BOM char as an extra space in the first line.

I've attached a file that reproduces this issue:

test.rb.zip

@jonas054
Copy link
Collaborator

Attaching the file was a nice idea. Thanks.

This is probably my fault, so I'll try to fix it.

@jonas054 jonas054 self-assigned this Jan 28, 2016
@jonas054 jonas054 added the bug label Jan 28, 2016
@jonas054
Copy link
Collaborator

@whitequark Is there any chance Parser::Source::Range#column could return 0 for a range that's on the first line of the buffer and is preceded only by a byte order mark?

@alexdowad
Copy link
Contributor

Is there any chance Parser::Source::Range#column could return 0 for a range that's on the first line of the buffer and is preceded only by a byte order mark?

Easiest way to do this would be to eat the BOM in Buffer#source=, after the call to reencode_string. #raw_source= should remain unchanged.

@whitequark
Copy link

The function of BOM is fundamentally the same as the encoding comment. We don't mangle the encoding comment, I don't see a reason to mangle BOM either.

@alexdowad
Copy link
Contributor

I don't see a reason to mangle BOM either.

If you don't want to mangle it, that's fine. However, one important difference is that the BOM is invisible when viewed in a text editor. So when people call range.column, they don't expect the BOM to "count". Intuitively, #column is not a byte offset, but an offset of distance from the left margin when text is displayed.

@whitequark
Copy link

#column returns the number of characters since the beginning of the line. (Yes, in retrospect the name is misleading, but this did not occur to me when I was writing it.) Due to presence of tabs, combining and zero-width characters in Unicode, this already has nothing to do with the column at which text is displayed, and there is no point in special-casing BOM. Moreover, special-casing BOM will confuse those tools which do treat the column in error message as the character number, e.g. Sublime Text.

What you want to display is the number of grapheme clusters since the beginning of the line. Use one of the multitude of gems that implement UCD lookups to calculate the number of grapheme clusters. This is not the job of Parser.

@alexdowad
Copy link
Contributor

What you want to display is grapheme clusters. Use one of the multitude of gems that implement UCD lookups to calculate the number of grapheme clusters. This is not the job of Parser.

Very well.

@alexdowad
Copy link
Contributor

@jonas054 Looks like you will have to special-case this in IndentationWidth.

@jonas054
Copy link
Collaborator

@alexdowad Fair enough. @whitequark Thanks for the thorough explanation.

bbatsov added a commit that referenced this issue Jan 30, 2016
[Fix #2703] Calculate column on first line when BOM is present
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants