Skip to content
This repository has been archived by the owner on Nov 30, 2024. It is now read-only.

Support composable matchers #393

Merged
merged 57 commits into from
Jan 2, 2014
Merged

Conversation

myronmarston
Copy link
Member

This is still a bit WIP but it's much further along than the last WIP PR and I wanted to open the PR and start the code review process. I applied the things discussed with @xaviershay and @JonRowe from #388.

Still TODO:

  • The rbx build is failing for some reason. Figure out why.
  • Finish defining all the matcher aliases.
  • Figure out how to easily document the aliases in the YARD docs so it makes it clear they are aliases.
  • Update the README with examples showing how to compose matchers
  • Update the YARD doc comments above the RSpec::Matchers module that discusses the matcher protocol so that it mentions how the composing works (e.g. the use of ===).
  • Add a cuke demonstrating this.
  • Given that many matchers do not support passing matchers as args (e.g. because the semantics would not make sense), should we explicitly document which matchers support it? (Note that all matchers will support being passed as an arg to another matcher -- it's just that some matchers aren't designed to receive a matcher as an arg).
  • Consider updating the match matcher to be composable.
  • Figure out a better algorithm for the match_array matcher to get the pending spec to pass. Edit: I think a stable matching algorithm may work here.
  • Ensure diff output, if used, looks good with matchers (example: failing match(nested_structure))

The last item in particular is stumping me -- I need help on that one! See the pending spec I left in match_array_spec.rb for what I'm talking about.

Even though I have a lot of TODOs left, feedback is welcome whenever anyone wants to give it. I'll be sure to comment here when I everything I'm planning to do is complete.

/cc @xaviershay @JonRowe @soulcutter @samphippen @alindeman

Closes #280.

@xaviershay
Copy link
Member

For the match_array problem, one option would be to brute force permutation and pick the "best" match.

https://coderpad.io/078048

Sorry don't have enough time over the next week to flesh out the idea.

@myronmarston
Copy link
Member Author

Thanks, @xaviershay. I pushed a branch with that implementation (brute-force-match-array) -- see 78bdd38. Unfortunately, it's ridiculously slow, because it's an O(N!) algorithm...which is is pretty much the worse time complexity you can get. Check out this growth (from def7863):

    Top 6 slowest examples (61.67 seconds, 100.0% of total time):
      Composing `match_array` with other matchers expect(...).to match_array([matcher, matcher]) works with arrays of 10 items
        56.41 seconds ./spec/rspec/matchers/built_in/match_array_spec.rb:269
      Composing `match_array` with other matchers expect(...).to match_array([matcher, matcher]) works with arrays of 9 items
        4.66 seconds ./spec/rspec/matchers/built_in/match_array_spec.rb:269
      Composing `match_array` with other matchers expect(...).to match_array([matcher, matcher]) works with arrays of 8 items
        0.53014 seconds ./spec/rspec/matchers/built_in/match_array_spec.rb:269
      Composing `match_array` with other matchers expect(...).to match_array([matcher, matcher]) works with arrays of 7 items
        0.05838 seconds ./spec/rspec/matchers/built_in/match_array_spec.rb:269
      Composing `match_array` with other matchers expect(...).to match_array([matcher, matcher]) works with arrays of 6 items
        0.00974 seconds ./spec/rspec/matchers/built_in/match_array_spec.rb:269
      Composing `match_array` with other matchers expect(...).to match_array([matcher, matcher]) works with arrays of 5 items
        0.00212 seconds ./spec/rspec/matchers/built_in/match_array_spec.rb:269

Compared to what we had before:

    Top 6 slowest examples (0.00324 seconds, 100.0% of total time):
      Composing `match_array` with other matchers expect(...).to match_array([matcher, matcher]) works with arrays of 10 items
        0.00067 seconds ./spec/rspec/matchers/built_in/match_array_spec.rb:269
      Composing `match_array` with other matchers expect(...).to match_array([matcher, matcher]) works with arrays of 9 items
        0.00062 seconds ./spec/rspec/matchers/built_in/match_array_spec.rb:269
      Composing `match_array` with other matchers expect(...).to match_array([matcher, matcher]) works with arrays of 8 items
        0.00056 seconds ./spec/rspec/matchers/built_in/match_array_spec.rb:269
      Composing `match_array` with other matchers expect(...).to match_array([matcher, matcher]) works with arrays of 6 items
        0.00054 seconds ./spec/rspec/matchers/built_in/match_array_spec.rb:269
      Composing `match_array` with other matchers expect(...).to match_array([matcher, matcher]) works with arrays of 7 items
        0.00047 seconds ./spec/rspec/matchers/built_in/match_array_spec.rb:269
      Composing `match_array` with other matchers expect(...).to match_array([matcher, matcher]) works with arrays of 5 items
        0.00038 seconds ./spec/rspec/matchers/built_in/match_array_spec.rb:269

A minute to compare two arrays of 10 elements each is clearly not going to fly. I think I'm going to go back and see if I can get the stable marriage algorithm to work, unless you have any better ideas.

On a side note, even what we had before is slower than I would like: 6 ms to evaluate one matcher expression is going to be the primary bottle neck in a true unit test (most of my unit tests average ~2ms or so). I think that's simply the nature of having to do all the comparisons for match_array to be as flexible as it is, so I'm not sure we can do anything about it, but we should probably add note to the docs, telling people that it's slow compared to expect(array.sort).to eq(%w[ a b c d e ]), so in fast unit tests we recommend using that instead of people can.

@myronmarston
Copy link
Member Author

Actually, I realized I just read that wrong: I said it was taking 6 ms to evaluate the matcher expression but it looks like it's actually .6 ms...which isn't so bad for the awesome flexibility match_array gives you.

@myronmarston
Copy link
Member Author

And, FWIW, here's an implementation of the stable marriage algorithm in ruby:

http://rosettacode.org/wiki/Stable_marriage_problem#Ruby

@xaviershay
Copy link
Member

I expected slow, but not that slow.

@myronmarston
Copy link
Member Author

I expected slow, but not that slow.

Yep, it was definitely slower than I expected. Then again, factorial is the fastest growing function I can think of...

@myronmarston
Copy link
Member Author

So I've made a bit of progress on the matching algorithm. I eventually decided the stable marriage thing was a dead end (I couldn't find a way to model our problem in terms of the requirements of the stable marriage algorithm), but I've come up with an alternate approach that I think will perform quite well:

  1. Compare every expected item against every actual item to get, for a each item, a list of matches from the other list.
  2. Put all all expected items in a remaining_expected_items list, and all actual items in a remaining_actual_items list. This represents the items that still have to be dealt with: they must either be paired, or put in the final extra_items list or put in the final missing_items list.
  3. Resolve any definitive results:
    • Any expected items for which there are no matches get moved from remaining_expected_items to extra_items.
    • Any actual items for which there are no matches get moved from remaining_actual_items to missing_items.
    • Any expected/actual item pairs that are reciprocally matched only to each other get removed from remaining_expected_items and remaining_actual_items.
  4. From here, do a brute force trial & error/backtracking approach:
    • Take any expected item. Choose one of its actual item matches, and try pairing them. Remove them from remaining_expected_items and remaining_actual_items, and update the matches lists for each remaining item to no longer include these paired items (since we are eliminating them as options for any other items to pair with).
    • Resolve any definitive results that come out of this trial, as explained in support diffing #3.
    • Iterate.
  5. As we do this, we keep track of the best solution seen so far. This will give us the extra_items and missing_items used in the failure message. If we get a solution that pairs everything up, we can abort early and consider the matcher to pass.

I feel pretty good about this algorithm, for a few reasons:

  • It's easy to reason about: I don't have a formal proof of its correctness, but it's pretty easy to see that as long as it's implemented correctly, it should always work properly.
  • Besides giving us a pass/fail status, this also gives us the extra_items and missing_items needed for the failure message.
  • It performs quite well (even though it boils down to a brute force approach in the end!). In the common case of every expected item matching against 1 actual item, the brute force part won't even come into play. It'll perform the same as it does now (which is basically O(N^2)). The number of brute force iterations required is a function of how many expected items match against multiple actual items (rather than being a function of array length, as the first attempt was).

I've got some of the plumbing for this solution already in place in a local branch. The part I'm struggling a bit with is the brute force iteration: we need to have a way to try fixing a pair to see where that goes, w/o destroying what we have so far, so that we can later pop the brute force stack and try a different pairing. I think that doing this easily basically comes down to having good abstractions, but I'm having a hard time naming things because it's already so abstract. (Plus, I've had a lot of distractions every time I've tried to work on this).

@xaviershay -- what do you think of that solution? Would you want to pair on it some time?

@xaviershay
Copy link
Member

Algorithm sounds good. I'm not going to be able to pair on it before the new year, have quite a bit of travel and family stuff over the next week.

@myronmarston
Copy link
Member Author

OK, I've implemented the algorithm discussed above. It's working, and it's much faster than the first brute force approach that @xaviershay and I tried. However, it still has some perf issues -- not based on how big the arrays are but based on how many duplicates/multi-matches there are. Also, I'm not super happy with the algorithm implementation -- it could definitely use some refactoring.

Include it in `failure_message` and
`failure_message_when_negated` so the matchers
don't have to override all 3 if they are just
customization the description part of it.
We can leverage the instance state of the matcher
object.
{:a => 1, :b => 2}

...reads much better than:

{:a=>1, :b=>2}
- Leverage instance variable state.
- Leverage the description/failure messages
  provided by BaseMatcher.
- Remove ternaries.
@@ -39,6 +41,98 @@ def or(matcher)
def ===(value)
matches?(value)
end

private unless defined?(::YARD)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haven't seen this before ... because YARD doesn't include private methods? Does it include protected?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

➜  rspec-mocks git:(master) yard doc --help | grep private
        --private                    Show private methods. (default hides private)
        --no-private                 Hide objects with @private tag

Private methods are hidden in the docs by default. This makes a lot of sense: private methods are generally not part of your public API.

Here I wanted these methods to be private (as they are only ever intended to be called from within a matcher on self) but since this is a mixin intended for users to use in their custom matchers, I wanted these methods doc'd. This definitely felt like a hack.

Do you think it's better to just make the methods public? I'm on the fence.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making them ruby public is fine - we've docced them as api private

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, they are doc'd as api public. The problem here is that there are two notions of visibility:

  • Ruby's notion: private vs protected vs public. This affects whether you can call a method with an explicit receiver or not.
  • API notion: private vs public: Things docs as api private are subject to change in minor or patch releases with no deprecation warnings or notices. They are "off limits" for end users to use directly if they don't want their code breaking when they upgrade. API public means we are committed to not breaking it in a minor or patch release as per SemVer.

Here we've got methods we'd like to be ruby private but API public.

@xaviershay
Copy link
Member

it suggests that it would fail if the the yielded arg is not a string, but it doesn't check that

I'm not too worried about this. In my experience most people don't think about explicitly checking types of their objects, and would expect this to pass with a string-like object. (Though it is a bit weird in ruby, string is an array-like object.)

Should we have rspec-expectations overwrite the rspec-mocks argument matchers?

Is there existing code in either library that soft depends on the other like this? If yes, we should keep that direction the same. If no, does pushing some part of the code down into rspec-support make sense?

@myronmarston
Copy link
Member Author

Is there existing code in either library that soft depends on the other like this?

rspec-mocks defines a couple matchers (e.g. have_received), but it's a very soft dependency: it simply implements the same protocol, and it can work on its own w/o rspec-expectations.

If no, does pushing some part of the code down into rspec-support make sense?

Maybe -- I'll think more about that.

@xaviershay
Copy link
Member

I'm ok with a soft-dependency, just as long as we don't have one both ways :)

- The `RSpec::Matchers` API was getting very large with
  all the aliases, which makes it harder to keep all in
  your head.
- It's better to start with fewer and then add more
  in future releases as there is demand. Once they are
  there, we can't remove them from a future minor/patch
  release because of server.
- `RSpec::Matchers.alias_matcher` is provided so users can
  define new aliases.
It was going to be very hard to document how this worked,
and what phrasing forms are supported. Instead, we simply allow
users to define their own aliases using `alias_matcher`.
We want to keep the solution with the fewest unmatched
items, not the most unmatched items.
@myronmarston
Copy link
Member Author

@xaviershay -- I think the perf is still unsatisfactory -- see 09734a7. I'm going to try to see if I can optimize it a bit.

There are still perf problems :(.
When recursing during the brute force depth-first
search, pass along only the indeterminate indexes,
so that it runs on a smaller subproblem. For this to
work, we had to switch to using a hash, so that it
can handle index gaps.

This provides mixed results: it makes some things faster,
some things slower. I think it's an improvement overall.
It's surprising how much difference this makes!
This is much, much faster than the full matching
algorithm, and in common cases (e.g. arrays of numbers
or strings) works just fine.  See the benchmarks
for how much of a difference this makes!
@myronmarston
Copy link
Member Author

@xaviershay -- I found some fairly easy optimizations to make that make the common cases blazing fast, and make the slower cases about 2x faster. It's still slower than I would like it for those slower cases, but I'm at a point where I think I'm OK with it for now and can move on to something else. Let me know what you think of my optimizations -- I may have missed something or there may be further low-hanging fruit that these optimizations suggest.

@xaviershay
Copy link
Member

LGTM

myronmarston added a commit that referenced this pull request Jan 2, 2014
@myronmarston myronmarston merged commit 7736337 into master Jan 2, 2014
@myronmarston myronmarston deleted the support-composable-matchers branch January 2, 2014 22:58
@xaviershay
Copy link
Member

So we can ship 3 now right? :P

@myronmarston
Copy link
Member Author

So we can ship 3 now right? :P

Depends on your definition of "now" :).

@JonRowe
Copy link
Member

JonRowe commented Jan 2, 2014

I'm going to work on the formatter stuff more this weekend, when thats done then we can ship RSpec 3 :P

@myronmarston
Copy link
Member Author

I'm going to work on the formatter stuff more this weekend, when thats done then we can ship RSpec 3 :P

Well, we can ship beta2. Then I plan to do my core/mocks/expectations code audit and see what else (if anything) we want before RC and final.

@JonRowe
Copy link
Member

JonRowe commented Jan 2, 2014

Aye, mostly was just reminding peeps more stuff existed ;)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make all matchers composable for 3.0
3 participants