Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option AllowHeredoc to Metrics/LineLength #2479

Merged
merged 1 commit into from
Dec 22, 2015

Conversation

fphilipe
Copy link
Contributor

@fphilipe fphilipe commented Dec 7, 2015

This option can either be true or a list of heredoc delimiters. When set to true, long lines in all heredocs are allowed. When set to a list, long lines are only allowed in heredocs delimited by the specified delimiters.

Closes #1407

/cc @jfelchner

@fphilipe fphilipe force-pushed the line-length-allow-heredoc branch from 726a493 to e10a903 Compare December 7, 2015 07:27
@alexdowad
Copy link
Contributor

This is nice! But I think the whitelist feature is not particularly useful.

Personally, I would prefer if Metrics/LineLength simply allowed long lines in heredocs all the time, without the need for an extra config parameter.

@fphilipe fphilipe force-pushed the line-length-allow-heredoc branch 6 times, most recently from 59f7b2b to 1e21798 Compare December 7, 2015 09:27
@fphilipe
Copy link
Contributor Author

fphilipe commented Dec 7, 2015

@alexdowad You can just set it to true. I guess we can also discuss whether the default value of this option should be true.

@alexdowad
Copy link
Contributor

You can just set it to true

Yes, certainly; but in my opinion, it is better not to add gratuitous configuration "knobs". It just complicates things.

But if the config parameter is there, my vote is that it should definitely be set to true by default.

@jfelchner
Copy link
Contributor

Usage-wise I love it, but it's ultimately up to @bbatsov. The reason I like this is because:

a) it can be set to true in the default configuration so that users don't have to set what would probably be a standard default
b) it makes this ripe for a refactoring into a module that can be mixed into any other cop where we'd like to exclude HEREDOCs

heredocs = []
iter = processed_source.tokens.each
loop do
token = iter.next
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be written as

processed_source.tokens.each_with_object([]) do |token, heredocs|
  heredocs.concat(collect_heredoc(iter)) if token.type == :tSTRING_BEG
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, I think this would run into issues with the collect_heredoc method. It does feel like there may be an opportunity to refactor here. Passing iter around feels a bit weird to me. These methods have a lot of similarities.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rrosenblum I actually first tried a similar approach to what you're suggesting. Turns out that makes it really hard to parse. The advantage of the iterator is that you consume it, thus progressing in the token list. This is much easier with nested heredocs. See e.g. this snippet:

$ ruby-parse -E -e "
foo(<<ONE, <<TWO)
Hello #{<<NESTED.upcase}
World
NESTED
ONE
What's your name?
TWO
"
foo(<<ONE, <<TWO)
^~~ tIDENTIFIER "foo"                           expr_cmdarg  [0 <= cond] [0 <= cmdarg]
foo(<<ONE, <<TWO)
   ^ tLPAREN2 "("                               expr_beg     [0 <= cond] [0 <= cmdarg]
foo(<<ONE, <<TWO)
    ^~~~~ tSTRING_BEG "<<\""                    expr_end     [0 <= cond] [0 <= cmdarg]
Hello #{<<NESTED.upcase}
^~~~~~ tSTRING_CONTENT "Hello "                 expr_end     [0 <= cond] [0 <= cmdarg]
Hello #{<<NESTED.upcase}
      ^~ tSTRING_DBEG "\#{"                     expr_end     [0 <= cond] [0 <= cmdarg]
Hello #{<<NESTED.upcase}
        ^~~~~~~~ tSTRING_BEG "<<\""             expr_end     [0 <= cond] [0 <= cmdarg]
World
^ tSTRING_CONTENT "World\n"                     expr_end     [0 <= cond] [0 <= cmdarg]
NESTED
^~~~~~ tSTRING_END "NESTED"                     expr_end     [0 <= cond] [0 <= cmdarg]
Hello #{<<NESTED.upcase}
                ^ tDOT "."                      expr_dot     [0 <= cond] [0 <= cmdarg]
Hello #{<<NESTED.upcase}
                 ^~~~~~ tIDENTIFIER "upcase"    expr_arg     [0 <= cond] [0 <= cmdarg]
Hello #{<<NESTED.upcase}
                       ^ tSTRING_DEND "}"       interp_string [0 <= cond] [0 <= cmdarg]
Hello #{<<NESTED.upcase}
                        ^ tSTRING_CONTENT "\n"  expr_end     [0 <= cond] [0 <= cmdarg]
World
^ tSTRING_CONTENT "World\n"                     expr_end     [0 <= cond] [0 <= cmdarg]
NESTED
^ tSTRING_CONTENT "NESTED\n"                    expr_end     [0 <= cond] [0 <= cmdarg]
ONE
^~~ tSTRING_END "ONE"                           expr_end     [0 <= cond] [0 <= cmdarg]
foo(<<ONE, <<TWO)
         ^ tCOMMA ","                           expr_beg     [0 <= cond] [0 <= cmdarg]
foo(<<ONE, <<TWO)
           ^~~~~ tSTRING_BEG "<<\""             expr_end     [0 <= cond] [0 <= cmdarg]
What's your name?
^ tSTRING_CONTENT "What's your name?\n"         expr_end     [0 <= cond] [0 <= cmdarg]
TWO
^~~ tSTRING_END "TWO"                           expr_end     [0 <= cond] [0 <= cmdarg]
foo(<<ONE, <<TWO)
                ^ tRPAREN ")"                   expr_end     [0 <= cond] [0 <= cmdarg]
foo(<<ONE, <<TWO)
                 ^ tNL nil                      line_begin   [0 <= cond] [0 <= cmdarg]

^ false "$eof"                                  line_begin   [0 <= cond] [0 <= cmdarg]
(send nil :foo
  (dstr
    (str "Hello ")
    (begin
      (send
        (str "World\n") :upcase))
    (str "\n")
    (str "World\n")
    (str "NESTED\n"))
  (str "What's your name?\n"))

Let's cut down the noise here and just look at the tokens:

order token text line
1 tIDENTIFIER foo 1
2 tLPAREN2 ( 1
3 tSTRING_BEG <<\" 1
4 tSTRING_CONTENT Hello 2
5 tSTRING_DBEG \#{ 2
6 tSTRING_BEG <<\" 2
7 tSTRING_CONTENT World\n 3
8 tSTRING_END NESTED 4
9 tDOT . 2
10 tIDENTIFIER upcase 2
11 tSTRING_DEND } 2
12 tSTRING_CONTENT \n 2
13 tSTRING_CONTENT World\n 3
14 tSTRING_CONTENT NESTED\n 4
15 tSTRING_END ONE 5
16 tCOMMA , 1
17 tSTRING_BEG <<\" 1
18 tSTRING_CONTENT What's your name?\n 6
19 tSTRING_END TWO 7
20 tRPAREN ) 1

So every time you encounter a tSTRING_BEG, the next token will be the first line of the heredoc body. You do that until you reach tSTRING_END, which contains the delimiter of the heredoc. The line before that is the end of the heredoc's body. Inside the heredoc you might encounter a further tSTRING_BEG (token 5), i.e. a nested heredoc. There you want to consume the tokens until that nested one reaches its end (token 8) before going back to the parent heredoc.

I feel like a solution using #each and the like would be more complicated since you'd have to pass state around, e.g. how many lines to skip due to a nested heredoc. With the iterator the state is implicit in the iterator. But I'm happy to be proven wrong 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fphilipe the parser gem already creates a Parser::Source::Range object for each AST node which can tell you the beginning and ending lines/columns where the source code for that node appears. Why don't you just find the AST nodes for the heredocs, note the lines/columns where they begin and end, and use that to check whether the end of a long line falls within a heredoc or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexdowad Thanks! I had looked into the AST, but I overlooked that it actually keeps track of the heredoc range. I've pushed another commit, which makes the code significantly easier to follow.

@fphilipe
Copy link
Contributor Author

Thanks to @alexdowad's suggestion, I'm now extracting the information from the AST, which makes the code simpler. I've added a commit on top and haven't rebased, just so one can see the difference. Let me know if I should rebase before merging.

One more thing I'm not sure about is where to store the extracted heredocs. Right now I just create this heredoc variable in #investigate (line). Would it be OK to have that as an ivar or is the instance reused across many files?

@fphilipe fphilipe force-pushed the line-length-allow-heredoc branch from 7dfb837 to b1b341d Compare December 10, 2015 10:45
@rrosenblum
Copy link
Contributor

The new code is much cleaner. Good job. Thank you @alexdowad and @fphilipe.

@jfelchner
Copy link
Contributor

@fphilipe so happy to see this. 😄 Thank you!

allowed_heredoc
end

def allowed_heredoc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need two methods? Isn't allow_heredoc? enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you set the option to a list of allowed heredoc delimiters, you need to check whether the heredoc containing the long line is whitelisted. Thus this method is required. One can argue about #allow_heredoc? above, which actually just returns whether the config is not false. I added #allow_heredoc? so that in #investigate it looks balanced with #allow_uri?. Let me know what the preference is here.

@jonas054
Copy link
Collaborator

You should add LineLength: AllowHeredoc in config/default.yml.

@fphilipe
Copy link
Contributor Author

You should add LineLength: AllowHeredoc in config/default.yml.

@jonas054 With what value? true or false?

@jonas054
Copy link
Collaborator

I'm leaning towards true. 😄

@fphilipe fphilipe force-pushed the line-length-allow-heredoc branch from 0f21667 to 0a9f9c5 Compare December 11, 2015 13:32
@fphilipe
Copy link
Contributor Author

@jonas054 There you go 😊 This caused a bunch of # rubocop:disable Metrics/LineLength to become obsolete.

@bbatsov
Copy link
Collaborator

bbatsov commented Dec 21, 2015

Looks good. Squash all commits together and rebase on top of the current master.

@fphilipe fphilipe force-pushed the line-length-allow-heredoc branch from 0a9f9c5 to 2f9b896 Compare December 21, 2015 16:40
This option can either be `true`, `false`, or a list of heredoc
delimiters. When set to `true`, long lines in all heredocs are allowed.
When set to a list, long lines are only allowed in heredocs delimited by
the delimiters in that list.

The default value of this option is true.

Closes rubocop#1407
@fphilipe fphilipe force-pushed the line-length-allow-heredoc branch from 2f9b896 to 8918f9e Compare December 21, 2015 16:51
@fphilipe
Copy link
Contributor Author

@bbatsov Done!

bbatsov added a commit that referenced this pull request Dec 22, 2015
Add option AllowHeredoc to Metrics/LineLength
@bbatsov bbatsov merged commit ad41516 into rubocop:master Dec 22, 2015
@bbatsov
Copy link
Collaborator

bbatsov commented Dec 22, 2015

👍

@jfelchner
Copy link
Contributor

@fphilipe really happy about this. Thank you so much for looking at it and getting a solution knocked out. 😀

@alexdowad
Copy link
Contributor

Thanks @fphilipe.

@fphilipe
Copy link
Contributor Author

You're welcome! Happy to help 😄

@fphilipe fphilipe deleted the line-length-allow-heredoc branch December 22, 2015 09:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants