New Performance/DoubleStartEndWith cop #2590

Merged (1 commit) on Jan 7, 2016

DNNX (Contributor) commented Jan 6, 2016

This is a continuation of #2535.

This cop checks for double #start_with? or #end_with? calls
separated by ||. In some cases such calls can be replaced
with a single #start_with?/#end_with? call.

  # bad
  str.start_with?("a") || str.start_with?(Some::CONST)
  str.start_with?("a", "b") || str.start_with?("c")
  var1 = ...
  var2 = ...
  str.end_with?(var1) || str.end_with?(var2)

  # good
  str.start_with?("a", Some::CONST)
  str.start_with?("a", "b", "c")
  var1 = ...
  var2 = ...
  str.end_with?(var1, var2)
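
A quick way to check that the bad and good forms above agree, as a minimal sketch: `Some::CONST`, `var1`, and `var2` are elided in the examples, so the values below are hypothetical stand-ins chosen only for illustration.

```ruby
# Hypothetical stand-ins for the values elided in the examples above.
module Some
  CONST = "d"
end
var1 = ".rb"
var2 = ".rake"

samples = ["abc", "bcd", "cde", "dog", "Rakefile.rake", "file.rb", "notes.txt", ""]

samples.each do |str|
  # Two separate calls vs. one multi-argument call, for each example pair.
  two_calls = str.start_with?("a") || str.start_with?(Some::CONST)
  one_call  = str.start_with?("a", Some::CONST)
  raise "start_with? mismatch for #{str.inspect}" unless two_calls == one_call

  two_calls = str.start_with?("a", "b") || str.start_with?("c")
  one_call  = str.start_with?("a", "b", "c")
  raise "start_with? mismatch for #{str.inspect}" unless two_calls == one_call

  two_calls = str.end_with?(var1) || str.end_with?(var2)
  one_call  = str.end_with?(var1, var2)
  raise "end_with? mismatch for #{str.inspect}" unless two_calls == one_call
end

puts "all forms agree"
```

This only holds for a plain String receiver and side-effect-free arguments, which is exactly what the corner cases below are about.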

The corner cases, where merging the calls may change behavior, are when the arguments
to start_with? have side effects, when the receiver itself has side effects, when the
receiver is not a string at all, or when someone has monkey-patched String#start_with?.
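
Two of these corner cases can be reproduced in a few lines. This is an illustrative sketch: `two_call_form`, `one_call_form`, and the `Route` class are hypothetical names, not part of the cop. A receiver expression with side effects (`queue.shift`) is evaluated twice in the two-call form but once in the merged form, and a non-String receiver may simply not accept several arguments.

```ruby
# Corner case 1: the receiver expression has a side effect.
# Each `queue.shift` consumes one element, so the two-call form checks
# two *different* strings while the one-call form checks only one.
def two_call_form(queue)
  queue.shift.start_with?("lib/") || queue.shift.start_with?("test/")
end

def one_call_form(queue)
  queue.shift.start_with?("lib/", "test/")
end

q1 = ["test/a.rb", "lib/b.rb"]
q2 = ["test/a.rb", "lib/b.rb"]
p two_call_form(q1) # => false
p one_call_form(q2) # => true
p [q1.size, q2.size] # => [0, 1]

# Corner case 2: the receiver is not a String.
# Hypothetical class whose start_with? accepts exactly one argument.
class Route
  def initialize(path)
    @path = path
  end

  def start_with?(prefix)
    @path.start_with?(prefix)
  end
end

route = Route.new("/admin/users")
p(route.start_with?("/api") || route.start_with?("/admin")) # => true

begin
  route.start_with?("/api", "/admin")
rescue ArgumentError => e
  p e.class # => ArgumentError
end
```

A static cop cannot see either problem from the syntax alone, which is why these cases are listed as known limitations rather than handled.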

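The benchmark results are not very conclusive because of the high variance, but they show that the single-call approach is at least not slower than the two-call approach (in the runs below, the two-call version is 1.00x, 1.13x, and 1.21x slower, with error margins of roughly ±15% to ±21%).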

require 'benchmark/ips' # provided by the benchmark-ips gem

prefix1 = "lib/"
prefix2 = "test/"
{
  'First prefix matches' => 'lib/file.rb',
  'Second prefix matches' => 'test/file.rb',
  'None matches' => 'app/models/file.rb'
}.each do |description, str|
  Benchmark.ips do |x|
    puts "==== #{description} ===="
    x.report('two calls') { str.start_with?(prefix1) || str.start_with?(prefix2) }
    x.report('one call')  { str.start_with?(prefix1, prefix2) }
    x.compare!
  end
end
==== First prefix matches ====
Calculating -------------------------------------
           two calls    31.010k i/100ms
            one call    35.180k i/100ms
-------------------------------------------------
           two calls      2.035M (±14.6%) i/s -      9.644M
            one call      2.039M (±15.9%) i/s -      9.569M

Comparison:
            one call:  2038728.5 i/s
           two calls:  2034701.2 i/s - 1.00x slower

==== Second prefix matches ====
Calculating -------------------------------------
           two calls    31.139k i/100ms
            one call    34.518k i/100ms
-------------------------------------------------
           two calls      1.515M (±20.7%) i/s -      6.944M
            one call      1.709M (±19.5%) i/s -      7.836M

Comparison:
            one call:  1708570.7 i/s
           two calls:  1514975.3 i/s - 1.13x slower

==== None matches ====
Calculating -------------------------------------
           two calls    33.038k i/100ms
            one call    36.789k i/100ms
-------------------------------------------------
           two calls      1.471M (±18.9%) i/s -      6.773M
            one call      1.778M (±19.9%) i/s -      8.130M

Comparison:
            one call:  1778026.2 i/s
           two calls:  1471409.3 i/s - 1.21x slower

Occurrences in the wild: ManageIQ/manageiq#5558, rails/rails#22374, and the aforementioned #2535.

alexdowad (Contributor) commented:

Nice!! I was thinking of doing this myself; good to see that someone else did it first!

#   str.end_with?(var1, var2)
class DoubleStartEndWith < Cop
  MSG = 'Use `str.%{method}(x, ..., y, ...)` ' \
        'instead of `str.%{method}(x, ...) || str.%{method}(y, ...)`.'

Contributor:

Rather than showing x, ... and y, ..., it would be nicer if the message showed the actual arguments.

Contributor Author:

Good point. I'll fix it.

alexdowad (Contributor) commented:

One other way this could be improved -- think of things like this:

# this will parse as (or (or something str.start_with?('a')) str.start_with?('b'))...
# in other words the 2 nodes you are interested in will not be nested under the same 'or' node
something || str.start_with?('a') || str.start_with?('b')

# another example:
str.start_with?('a') || foo || bar || str.start_with?('b')

Evaluation is from left to right, so you have to make sure that there is no node with unknown side effects between the 2 matching calls.
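
A small illustration of the ordering concern, using a hypothetical `Tracker#foo` operand that counts how often it runs: merging the two `start_with?` calls across `foo` yields the same boolean here, but silently skips `foo`'s side effect.

```ruby
# Records how many times the intervening operand was evaluated.
class Tracker
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Hypothetical operand with a side effect (it bumps a counter).
  def foo
    @calls += 1
    false
  end
end

str = "b"

t1 = Tracker.new
original = str.start_with?("a") || t1.foo || str.start_with?("b")

t2 = Tracker.new
merged = str.start_with?("a", "b") || t2.foo

p [original, t1.calls] # => [true, 1]
p [merged, t2.calls]   # => [true, 0]
```

So a matcher that looks through a longer `||` chain would have to prove every operand between the two calls is side-effect free before suggesting the merge.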

DNNX (Contributor Author) commented Jan 6, 2016

something || str.start_with?('a') || str.start_with?('b'): yeah, I was thinking about this, but found it kind of complex to implement, so I decided to go with a simpler implementation that covers most real-life cases (in my experience). I also saw !str.start_with?(a) && !str.start_with?(b) in one open-source project. It can also be simplified, but again, I decided not to go that far. Do you think it should be done in this PR?
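
The negated pattern collapses the same way by De Morgan's law (`!p && !q` is equivalent to `!(p || q)`). A sketch assuming the same preconditions as the cop (String receiver, pure arguments); `a` and `b` are given illustrative values.

```ruby
# Illustrative prefixes; any side-effect-free String values work.
a = "lib/"
b = "test/"

["lib/x.rb", "test/y.rb", "app/z.rb"].each do |str|
  negated_pair   = !str.start_with?(a) && !str.start_with?(b)
  single_negated = !str.start_with?(a, b) # De Morgan: !(p || q)
  puts "#{str}: #{negated_pair} #{single_negated}"
end
```

Both columns print `false false` for the first two paths and `true true` for `app/z.rb`.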

Another thing we could do is loosen the requirement that the arguments of the first start_with? call be pure. As long as all arguments of the second call are pure, the first call's arguments can have side effects; the semantics stay the same.
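
Why only the second call's arguments matter can be seen with a hypothetical `Prefixes` helper whose methods count their evaluations. The first call's arguments are evaluated exactly once in both forms; the second call's arguments are evaluated only when the first call returns false in the two-call form, but always in the merged form.

```ruby
# Hypothetical prefix source whose methods count how often they run.
class Prefixes
  attr_reader :evaluated

  def initialize
    @evaluated = Hash.new(0)
  end

  def first
    @evaluated[:first] += 1
    "lib/"
  end

  def second
    @evaluated[:second] += 1
    "test/"
  end
end

str = "lib/file.rb"

two = Prefixes.new
two_calls = str.start_with?(two.first) || str.start_with?(two.second)

one = Prefixes.new
one_call = str.start_with?(one.first, one.second)

p [two_calls, two.evaluated[:first], two.evaluated[:second]] # => [true, 1, 0]
p [one_call, one.evaluated[:first], one.evaluated[:second]]  # => [true, 1, 1]
```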

alexdowad (Contributor) commented:

> Do you think it should be done in this PR?

It's up to you. I am just suggesting ways to make the PR better. If you don't want to go that far in this PR, and the maintainers want to accept it as is, that is fine. It is already an improvement over what we have.

DNNX (Contributor Author) commented Jan 6, 2016

Here is a little summary:

  • Implemented a more intelligent matcher as per @alexdowad's suggestion
  • Improved offense message format. No more abstract x, ... y, ...; the message now includes the actual code
  • Dropped the requirement for the arguments of the first start_with? call to be pure
  • something || str.start_with?('a') || str.start_with?('b') - not going to do this in this PR if you guys don't mind
  • str.start_with?('a') || foo || bar || str.start_with?('b') - will leave it as well
  • !str.start_with?(a) && !str.start_with?(b) - not going to do this either (in this PR at least)

If you guys are OK with this, please let me know. I will then squash the commits and add an entry to the changelog.
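
The improved message format could be produced with Ruby's `Kernel#format` and named `%{...}` references. This is a sketch only: the PR shows just the tail of the final template (`'instead of `%{original_code}`.'`), so the first half of `MSG`, the `receiver`/`combined_args` keys, and the `offense_message` helper are hypothetical.

```ruby
# Hypothetical template; only the "instead of `%{original_code}`." half
# appears in the PR.
MSG = 'Use `%{receiver}.%{method}(%{combined_args})` ' \
      'instead of `%{original_code}`.'

# Builds the message from source snippets of the receiver and both
# argument lists (hypothetical helper, not the cop's actual code).
def offense_message(receiver, method, first_args, second_args)
  original = "#{receiver}.#{method}(#{first_args.join(', ')}) || " \
             "#{receiver}.#{method}(#{second_args.join(', ')})"
  format(MSG,
         receiver: receiver,
         method: method,
         combined_args: (first_args + second_args).join(', '),
         original_code: original)
end

puts offense_message('str', 'start_with?', ['"a"'], ['"b"', 'C'])
# Use `str.start_with?("a", "b", C)` instead of `str.start_with?("a") || str.start_with?("b", C)`.
```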

DNNX force-pushed the double-start-end-with branch from 9ab0eaa to 1bb60bb on January 7, 2016, 11:30
DNNX (Contributor Author) commented Jan 7, 2016

Update: this PR is now ready to merge, if everyone is OK with the changes.

jonas054 (Collaborator) commented Jan 7, 2016

👍 The changes look good.

'instead of `%{original_code}`.'

def on_or(node)
  receiver, method, args1, args2 = two_start_end_with_calls(node)

Collaborator:

args1 and args2 are not very descriptive variable names.

Contributor Author:

Fixed

DNNX force-pushed the double-start-end-with branch from 1bb60bb to 2f42e87 on January 7, 2016, 14:50
context 'two #start_with? calls' do
  context 'with the same receiver' do
    context 'all parameters of the second call are pure' do
      let(:source) { 'x.start_with?(a, b) || x.start_with?("c", D)' }

Collaborator:

As someone who does a bit of FP on the side, I think the usage of the term "pure" here is confusing. Basically, you don't want a function call as an argument, right?

Contributor:

No -- "pure" means "pure", as in "has no side effects when evaluated". See the definition in Node.

Contributor Author:

Yes, that's right.

Contributor Author:

Well, strictly speaking, we can't allow local variables and constants either. There might be some pathological cases where .to_str has side effects:

class Evil
  def initialize
    @state = 0
  end

  def to_str
    @state = (@state + 1) % 4
    @state.to_s
  end
end

EVIL = Evil.new

p "1".start_with?(EVIL, EVIL) || "1".start_with?(EVIL)
# => true
p "1".start_with?(EVIL, EVIL, EVIL)
# => false

Collaborator:

pure doesn't mean "has no side effects when evaluated". It means that the same input always maps to the same output, that the only inputs are the parameters (no globals, instance vars, etc.), and that there are no side effects. I'll check the definition.
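
The distinction drawn in this comment can be illustrated with three hypothetical methods (names chosen only for illustration): one pure, one free of side effects but not pure because its result depends on mutable state, and one with a side effect.

```ruby
# Pure: the result depends only on the argument; nothing else is touched.
def double(x)
  x * 2
end

# No side effects, but not pure: the result also depends on instance state.
class Config
  attr_accessor :factor

  def initialize(factor)
    @factor = factor
  end

  def scale(x)
    x * @factor
  end
end

# Side effect: appends to a log that outlives the call.
LOG = []
def double_and_log(x)
  LOG << x
  x * 2
end

config = Config.new(2)
first = config.scale(5)
config.factor = 3
second = config.scale(5)

p [double(5), double(5)] # => [10, 10]
p [first, second]        # => [10, 15]
double_and_log(5)
p LOG                    # => [5]
```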

Contributor Author:

Sounds good to me. @alexdowad what do you think?

Contributor:

Hmm. Seems equally confusing with pure?, to be honest.

Collaborator:

All the nodes you're checking seem to me like value nodes (and the complex nodes also reduce down to values). In general, there are no method invocation checks, and I really think that many people associate the term pure with functions and methods. I'm open to other naming suggestions, but I certainly think we can do better than the current name. I was recently reminded that I didn't like this name when I originally saw it, but the commit was a small part of a huge PR and I was too tired to go into relatively small details. Anyway, I guess this isn't related to the current PR.

Contributor:

Is #no_side_effects? too wordy?

Collaborator:

My main issue is that such names imply we're dealing with something executable. I'd expect this to apply to send nodes, not to nodes in general. Naming is so hard...

bbatsov (Collaborator) commented Jan 7, 2016

Rebase this and I'll have it merged.

This cop checks for double `#start_with?` or `#end_with?` calls
separated by `||`. In some cases such calls can be replaced
with a single `#start_with?`/`#end_with?` call.

```ruby
  # bad
  str.start_with?("a") || str.start_with?(Some::CONST)
  str.start_with?("a", "b") || str.start_with?("c")
  var1 = ...
  var2 = ...
  str.end_with?(var1) || str.end_with?(var2)

  # good
  str.start_with?("a", Some::CONST)
  str.start_with?("a", "b", "c")
  var1 = ...
  var2 = ...
  str.end_with?(var1, var2)
```

The corner cases are when the arguments of the second `start_with?` call
have side effects, when the receiver itself has side effects,
when the receiver is not a string at all, or when someone has
monkey-patched `String#start_with?`.
DNNX force-pushed the double-start-end-with branch from 2f42e87 to 494a434 on January 7, 2016, 18:34
DNNX (Contributor Author) commented Jan 7, 2016

Rebased it. CI green, no conflicts (yet).

bbatsov added a commit that referenced this pull request Jan 7, 2016
New Performance/DoubleStartEndWith cop
bbatsov merged commit fc588a3 into rubocop:master on Jan 7, 2016
4 participants