Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Benchmark result depends on the order in which benchmarks were registrated #1469

Closed
bogdan-lab opened this issue Aug 11, 2022 · 4 comments

Comments

@bogdan-lab
Copy link

Describe the bug
I tried to measure different solution of the "merge from multiple sources" problem.
In order to do that I needed to prepare vector of sorted queues which I was going to merge.
I decided to put these queue generation code inside the measuring loop and simply created separate benchmark where I measure time required for queue generation only.
The problem is that in one corner case (1000 queues with 1 element in each) application showed that it is faster to generate data and merge it than ONLY generate.
Other cases, even 1000 queues with 2 elements in each showed adequate results - only generation of the data is faster than generation with its manipulation.
Anyway, in order to discard the possibility for compiler to optimize something I tried to move the function which generates data to the separate file, use benchmark::DoNotOptimize everywhere but result was still meaningless.

The result for the problematic case become adequate if I register benchmark which measures only data generation after at least one another benchmark.
This is, the order of benchmark registration by macros BENCHMARK() has influence on the measured values.

System
OS Ubuntu 20.04
The bug was reproduced on both

  • g++ 9.4.0
  • clang 10.0.0
    Also bug can be reproduced on the quick bench website with clang 13.0.

To reproduce
Here is the link to the quick bench website where I was able to reproduce the bug
https://quick-bench.com/q/ALwB-w76BalgONHH_rypCFOtwH0

Here is the code which reproduces bug, for the case if the link above does not work:

#include <algorithm>
#include <ctime>
#include <random>
#include <deque>
#include <queue>
#include <random>
#include <set>
#include <utility>


std::vector<std::queue<double>> GenerateSources(std::mt19937& rnd, int src_num,
                                                int src_len) {
  std::uniform_real_distribution<double> dist(0.0, 1.0);
  std::vector<std::queue<double>> result;
  result.reserve(src_num);
  while (src_num--) {
    std::vector<double> buff(src_len);
    std::generate(buff.begin(), buff.end(), [&]() { return dist(rnd); });
    std::sort(buff.begin(), buff.end());
    result.emplace_back();
    for (const auto& el : buff) {
      result.back().push(el);
    }
  }
  return result;
}



struct SetComparator {
  bool operator()(const std::queue<double>* lhs,
                  const std::queue<double>* rhs) const {
    return lhs->front() < rhs->front();
  }
};

std::vector<double> SolveUsingSet(std::vector<std::queue<double>>& data) {
  std::vector<double> result;
  result.reserve(data.size() * data[0].size());
  std::set<std::queue<double>*, SetComparator> buff;
  for (auto& el : data) {
    buff.insert(&el);
  }

  while (!buff.empty()) {
    auto* top = *buff.begin();
    buff.erase(buff.begin());
    result.push_back(top->front());
    top->pop();
    if (!top->empty()) {
      buff.insert(top);
    }
  }

  return result;
}

static void GenerateSourceBM(benchmark::State& state) {
  std::mt19937 rnd(std::time(nullptr));
  for (auto _ : state) {
    std::vector<std::queue<double>> sources;
    benchmark::DoNotOptimize(
        sources = GenerateSources(rnd, /*src_num=*/state.range(0),
                                  /*src_len=*/state.range(1)));
  }
}

static void SetSolutionBM(benchmark::State& state) {
  std::mt19937 rnd(std::time(nullptr));
  for (auto _ : state) {
    std::vector<std::queue<double>> sources;
    benchmark::DoNotOptimize(
        sources = GenerateSources(rnd, /*src_num=*/state.range(0),
                                  /*src_len=*/state.range(1)));
    std::vector<double> res;
    benchmark::DoNotOptimize(res = SolveUsingSet(sources));
  }
}

std::vector<double> SolveUsingHeap(std::vector<std::queue<double>>& data) {
  std::vector<double> result;
  result.reserve(data.size() * data[0].size());
  std::vector<std::queue<double>*> buff;
  buff.reserve(data.size());
  for (auto& el : data) {
    buff.push_back(&el);
  }
  std::make_heap(buff.begin(), buff.end());

  auto heap_comp = [](const std::queue<double>* lhs,
                      const std::queue<double>* rhs) {
    return lhs->front() > rhs->front();
  };

  while (!buff.empty()) {
    std::pop_heap(buff.begin(), buff.end(), heap_comp);
    auto* top = buff.back();
    buff.pop_back();
    result.push_back(top->front());
    top->pop();
    if (!top->empty()) {
      buff.push_back(top);
      std::push_heap(buff.begin(), buff.end(), heap_comp);
    }
  }

  return result;
}

static void HeapSolutionBM(benchmark::State& state) {
  std::mt19937 rnd(std::time(nullptr));
  for (auto _ : state) {
    std::vector<std::queue<double>> sources;
    benchmark::DoNotOptimize(
        sources = GenerateSources(rnd, /*src_num=*/state.range(0),
                                  /*src_len=*/state.range(1)));
    std::vector<double> res;
    benchmark::DoNotOptimize(res = SolveUsingHeap(sources));
  }
}

std::vector<double> SolveUsingSortedVector(
    std::vector<std::queue<double>>& data) {
  std::vector<double> result;
  result.reserve(data.size() * data[0].size());
  std::vector<std::queue<double>*> buff;
  buff.reserve(data.size());
  for (auto& el : data) {
    buff.push_back(&el);
  }
  auto comp = [](const std::queue<double>* lhs, const std::queue<double>* rhs) {
    return lhs->front() > rhs->front();
  };
  std::sort(buff.begin(), buff.end(), comp);

  while (!buff.empty()) {
    auto* top = buff.back();
    buff.pop_back();
    result.push_back(top->front());
    top->pop();
    if (!top->empty()) {
      auto it = std::lower_bound(buff.begin(), buff.end(), top, comp);
      buff.insert(it, top);
    }
  }

  return result;
}

static void SortVectorSolutionBM(benchmark::State& state) {
  std::mt19937 rnd(std::time(nullptr));
  for (auto _ : state) {
    std::vector<std::queue<double>> sources;
    benchmark::DoNotOptimize(
        sources = GenerateSources(rnd, /*src_num=*/state.range(0),
                                  /*src_len=*/state.range(1)));
    std::vector<double> res;
    benchmark::DoNotOptimize(res = SolveUsingSortedVector(sources));
  }
}

// Register the function as a benchmark
BENCHMARK(GenerateSourceBM)->ArgsProduct({{1000}, {1}});
BENCHMARK(SetSolutionBM)->ArgsProduct({{1000}, {1}});
BENCHMARK(HeapSolutionBM)->ArgsProduct({{1000}, {1}});
BENCHMARK(SortVectorSolutionBM)->ArgsProduct({{1000}, {1}});

GenerateSourceBM - is the benchmark which measures the duration of only data generation.
Results of this benchmark depends on whether it is registered as the first one or not.

Expected behavior
I expect that benchmark results will not depend on the order of their registration

Screenshots
Results when generation data benchmark is the first in the row
image
Results when generation data benchmark is the last in the row
image

@tekert
Copy link

tekert commented Aug 13, 2022

This seems to be a problem with how the assembly is placed on memory i think, i can get similar problems when statically linking something that does nothing (some assembler files that never get called) and magically 3 of 10 tests get 40-60% worse timings on variable data sizes and iterations.

@bogdan-lab
Copy link
Author

But in my case all benchmarks call the same function GenerateSources. If the position of its assembly code slows down the performance it should slow it down for all benchmarks. I mean the relative results - generating data and manipulating it is slower than just generating data - should still be reproducible, but that is not the case.

Also it is strange that such situation occurs only in the specific case 1000 queues and 1 element in each. I do not have such problem in case with 1000 queues and 2 elements in each.
I have tried to make these case parameters totally "runtime" and put them into the separate txt file and read them from there at execution using BENCHMARK()->Apply(ReadArgumentsFromConfig) syntax. The problem still reproduces...

@dmah42
Copy link
Member

dmah42 commented Aug 18, 2022

can you try running using random interleaving?

see #1051 for the details.

@bogdan-lab
Copy link
Author

Yes, it helped. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants