
[Tooling] Rewrite generate_difference_report(). #678

Merged: 2 commits merged into google:master from LebedevRI:tooling-unbreak-repetitions on Sep 19, 2018

Conversation

@LebedevRI (Collaborator) commented on Sep 16, 2018

My knowledge of Python is not great, so this is kinda horrible.

Two things:

  1. If there were repetitions, for the RHS (i.e. the new value) we were always using the first repetition,
     which naturally results in incorrect change reports for the second and following repetitions.
     And what is even worse, that completely broke the U test. :( (See the sketch after this list.)
  2. Better support for differing repetition counts in the U test was missing.
     It matters if we are to be able to report 'iteration as repetition',
     since it is rather likely that the iteration counts will mismatch.
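
To make the first point concrete, here is a tiny, hypothetical illustration of the pairing bug; the values and names are made up, not taken from the tool:

```python
# Hypothetical repetitions of one benchmark on each side.
lhs_reps = [10.1, 10.2, 10.3]   # old run, 3 repetitions
rhs_reps = [12.1, 12.2]         # new run, 2 repetitions

# Buggy pairing: every lhs repetition was compared against the
# *first* rhs repetition, so the reports for rep1+ were wrong.
buggy_pairs = [(l, rhs_reps[0]) for l in lhs_reps]

# Fixed pairing: repetitions are matched index-by-index; zip()
# naturally diffs only the overlap when the counts differ.
fixed_pairs = list(zip(lhs_reps, rhs_reps))
```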

Now, the rough idea of how this is implemented. I think this is the right solution. (A Python sketch of steps 1-7 follows the list.)

  1. Get all benchmark names (in order) from the lhs benchmark.
  2. While preserving the order, keep only the unique names.
  3. Get all benchmark names (in order) from the rhs benchmark.
  4. While preserving the order, keep only the unique names.
  5. Intersect `2.` and `4.` to get the list of unique benchmark names that exist on both sides.
  6. Now, group (partition) all the benchmarks with the same name:

     ```
     BM_FOO:
         [lhs]: BM_FOO/repetition0 BM_FOO/repetition1
         [rhs]: BM_FOO/repetition0 BM_FOO/repetition1 BM_FOO/repetition2
     ...
     ```

     We also drop mismatches in `time_unit` here.
     _(whose bright idea was it to store arbitrarily scaled timers in json **?!**)_
  7. Iterate over each partition:
     7.1. Conditionally, diff the overlapping repetitions (the repetition counts may differ).
     7.2. Conditionally, do the U test:
          7.2.1. Get **all** the values of the `"real_time"` field from the lhs benchmark.
          7.2.2. Get **all** the values of the `"cpu_time"` field from the lhs benchmark.
          7.2.3. Get **all** the values of the `"real_time"` field from the rhs benchmark.
          7.2.4. Get **all** the values of the `"cpu_time"` field from the rhs benchmark.
          NOTE: the repetition counts may differ, but we want *all* the values!
          7.2.5. Do the rest of the U test computation.
          7.2.6. Print the U test results.
  8. ???
  9. **PROFIT**!
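
For concreteness, here is a minimal, self-contained Python sketch of steps 1-7 above. It is not the actual implementation in tools/gbench/report.py: every function and variable name below is hypothetical, and the exact `time_unit`-mismatch handling is an assumed interpretation.

```python
# A sketch of the partitioning scheme described above. NOT the real
# code in tools/gbench/report.py; all names here are hypothetical.
from scipy.stats import mannwhitneyu


def unique_names_in_order(benchmarks):
    """Steps 1-4: benchmark names, first-seen order, duplicates dropped."""
    seen = set()
    names = []
    for bench in benchmarks:
        if bench['name'] not in seen:
            seen.add(bench['name'])
            names.append(bench['name'])
    return names


def partition_benchmarks(lhs_runs, rhs_runs):
    """Steps 5-6: intersect the two name lists (keeping lhs order) and
    group all same-named runs, dropping time_unit mismatches."""
    rhs_names = set(unique_names_in_order(rhs_runs))
    partitions = []
    for name in unique_names_in_order(lhs_runs):
        if name not in rhs_names:
            continue  # step 5: keep only names present on both sides
        lhs_group = [b for b in lhs_runs if b['name'] == name]
        rhs_group = [b for b in rhs_runs if b['name'] == name]
        # Step 6: drop repetitions whose time_unit differs from the
        # first lhs repetition's unit (assumed interpretation).
        unit = lhs_group[0]['time_unit']
        lhs_group = [b for b in lhs_group if b['time_unit'] == unit]
        rhs_group = [b for b in rhs_group if b['time_unit'] == unit]
        partitions.append((name, lhs_group, rhs_group))
    return partitions


def u_test(lhs_group, rhs_group):
    """Steps 7.2.1-7.2.5: Mann-Whitney U test over *all* repetitions on
    each side; the repetition counts do not have to match."""
    pvalues = {}
    for field in ('real_time', 'cpu_time'):
        lhs_values = [b[field] for b in lhs_group]
        rhs_values = [b[field] for b in rhs_group]
        _, pvalues[field] = mannwhitneyu(lhs_values, rhs_values,
                                         alternative='two-sided')
    return pvalues
```

Step 7.1 would then just `zip()` each partition's `lhs_group` and `rhs_group` to diff the overlapping repetitions, while `u_test()` deliberately consumes every repetition from both sides, even when the counts differ.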

Fixes #677

@coveralls commented on Sep 16, 2018

Coverage Status

Coverage remained the same at 89.022% when pulling d2e9a41 on LebedevRI:tooling-unbreak-repetitions into a5e9c06 on google:master.

@AppVeyorBot

Build benchmark 1440 failed (commit 5f31abde8c by @LebedevRI)

Review comment on tools/compare.py (outdated):

```python
dest='display_aggregates_only',
action="store_true",
help="If there are repetitions, by default, we display everything - the"
" actual runs, and the aggregates computed. Sometimes, it is "
```

A project Member commented:

nit: please put the space before 'actual' on the previous line.
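
The requested change just moves that space to the end of the preceding string fragment. A minimal sketch of the fix; the flag name is assumed from the `dest` above, and the help text is truncated exactly as in the excerpt:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    '--display_aggregates_only',
    dest='display_aggregates_only',
    action="store_true",
    # The space now ends the first fragment instead of starting the
    # second one, as the review asked. Adjacent string literals are
    # concatenated by Python at compile time.
    help="If there are repetitions, by default, we display everything - the "
         "actual runs, and the aggregates computed. Sometimes, it is ")
    # (remainder of the help text elided in the excerpt)
```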

@LebedevRI force-pushed the tooling-unbreak-repetitions branch from 996adfc to 6a3fc63 on September 17, 2018 09:02
@LebedevRI changed the title from "[Do Not Merge][Tooling] Rewrite generate_difference_report()." to "[Tooling] Rewrite generate_difference_report()." on Sep 17, 2018
@AppVeyorBot

Build benchmark 1443 failed (commit 24bab0e192 by @LebedevRI)

@LebedevRI (Collaborator, Author) commented:

Further thoughts here? Moar tests?

@LebedevRI (Collaborator, Author) commented:

Alright, thank you for the review!

@LebedevRI merged commit aad33aa into google:master on Sep 19, 2018
@LebedevRI deleted the tooling-unbreak-repetitions branch on September 19, 2018 12:59
JBakamovic pushed a commit to JBakamovic/benchmark that referenced this pull request Dec 6, 2018