[BUG] Benchmark result depends on the order in which benchmarks were registrated #1469

bogdan-lab · 2022-08-11T15:09:05Z

Describe the bug
I tried to measure different solution of the "merge from multiple sources" problem.
In order to do that I needed to prepare vector of sorted queues which I was going to merge.
I decided to put these queue generation code inside the measuring loop and simply created separate benchmark where I measure time required for queue generation only.
The problem is that in one corner case (1000 queues with 1 element in each) application showed that it is faster to generate data and merge it than ONLY generate.
Other cases, even 1000 queues with 2 elements in each showed adequate results - only generation of the data is faster than generation with its manipulation.
Anyway, in order to discard the possibility for compiler to optimize something I tried to move the function which generates data to the separate file, use benchmark::DoNotOptimize everywhere but result was still meaningless.

The result for the problematic case become adequate if I register benchmark which measures only data generation after at least one another benchmark.
This is, the order of benchmark registration by macros BENCHMARK() has influence on the measured values.

System
OS Ubuntu 20.04
The bug was reproduced on both

g++ 9.4.0
clang 10.0.0
Also bug can be reproduced on the quick bench website with clang 13.0.

To reproduce
Here is the link to the quick bench website where I was able to reproduce the bug
https://quick-bench.com/q/ALwB-w76BalgONHH_rypCFOtwH0

Here is the code which reproduces bug, for the case if the link above does not work:

#include <algorithm>
#include <ctime>
#include <random>
#include <deque>
#include <queue>
#include <random>
#include <set>
#include <utility>


std::vector<std::queue<double>> GenerateSources(std::mt19937& rnd, int src_num,
                                                int src_len) {
  std::uniform_real_distribution<double> dist(0.0, 1.0);
  std::vector<std::queue<double>> result;
  result.reserve(src_num);
  while (src_num--) {
    std::vector<double> buff(src_len);
    std::generate(buff.begin(), buff.end(), [&]() { return dist(rnd); });
    std::sort(buff.begin(), buff.end());
    result.emplace_back();
    for (const auto& el : buff) {
      result.back().push(el);
    }
  }
  return result;
}



struct SetComparator {
  bool operator()(const std::queue<double>* lhs,
                  const std::queue<double>* rhs) const {
    return lhs->front() < rhs->front();
  }
};

std::vector<double> SolveUsingSet(std::vector<std::queue<double>>& data) {
  std::vector<double> result;
  result.reserve(data.size() * data[0].size());
  std::set<std::queue<double>*, SetComparator> buff;
  for (auto& el : data) {
    buff.insert(&el);
  }

  while (!buff.empty()) {
    auto* top = *buff.begin();
    buff.erase(buff.begin());
    result.push_back(top->front());
    top->pop();
    if (!top->empty()) {
      buff.insert(top);
    }
  }

  return result;
}

static void GenerateSourceBM(benchmark::State& state) {
  std::mt19937 rnd(std::time(nullptr));
  for (auto _ : state) {
    std::vector<std::queue<double>> sources;
    benchmark::DoNotOptimize(
        sources = GenerateSources(rnd, /*src_num=*/state.range(0),
                                  /*src_len=*/state.range(1)));
  }
}

static void SetSolutionBM(benchmark::State& state) {
  std::mt19937 rnd(std::time(nullptr));
  for (auto _ : state) {
    std::vector<std::queue<double>> sources;
    benchmark::DoNotOptimize(
        sources = GenerateSources(rnd, /*src_num=*/state.range(0),
                                  /*src_len=*/state.range(1)));
    std::vector<double> res;
    benchmark::DoNotOptimize(res = SolveUsingSet(sources));
  }
}

std::vector<double> SolveUsingHeap(std::vector<std::queue<double>>& data) {
  std::vector<double> result;
  result.reserve(data.size() * data[0].size());
  std::vector<std::queue<double>*> buff;
  buff.reserve(data.size());
  for (auto& el : data) {
    buff.push_back(&el);
  }
  std::make_heap(buff.begin(), buff.end());

  auto heap_comp = [](const std::queue<double>* lhs,
                      const std::queue<double>* rhs) {
    return lhs->front() > rhs->front();
  };

  while (!buff.empty()) {
    std::pop_heap(buff.begin(), buff.end(), heap_comp);
    auto* top = buff.back();
    buff.pop_back();
    result.push_back(top->front());
    top->pop();
    if (!top->empty()) {
      buff.push_back(top);
      std::push_heap(buff.begin(), buff.end(), heap_comp);
    }
  }

  return result;
}

static void HeapSolutionBM(benchmark::State& state) {
  std::mt19937 rnd(std::time(nullptr));
  for (auto _ : state) {
    std::vector<std::queue<double>> sources;
    benchmark::DoNotOptimize(
        sources = GenerateSources(rnd, /*src_num=*/state.range(0),
                                  /*src_len=*/state.range(1)));
    std::vector<double> res;
    benchmark::DoNotOptimize(res = SolveUsingHeap(sources));
  }
}

std::vector<double> SolveUsingSortedVector(
    std::vector<std::queue<double>>& data) {
  std::vector<double> result;
  result.reserve(data.size() * data[0].size());
  std::vector<std::queue<double>*> buff;
  buff.reserve(data.size());
  for (auto& el : data) {
    buff.push_back(&el);
  }
  auto comp = [](const std::queue<double>* lhs, const std::queue<double>* rhs) {
    return lhs->front() > rhs->front();
  };
  std::sort(buff.begin(), buff.end(), comp);

  while (!buff.empty()) {
    auto* top = buff.back();
    buff.pop_back();
    result.push_back(top->front());
    top->pop();
    if (!top->empty()) {
      auto it = std::lower_bound(buff.begin(), buff.end(), top, comp);
      buff.insert(it, top);
    }
  }

  return result;
}

static void SortVectorSolutionBM(benchmark::State& state) {
  std::mt19937 rnd(std::time(nullptr));
  for (auto _ : state) {
    std::vector<std::queue<double>> sources;
    benchmark::DoNotOptimize(
        sources = GenerateSources(rnd, /*src_num=*/state.range(0),
                                  /*src_len=*/state.range(1)));
    std::vector<double> res;
    benchmark::DoNotOptimize(res = SolveUsingSortedVector(sources));
  }
}

// Register the function as a benchmark
BENCHMARK(GenerateSourceBM)->ArgsProduct({{1000}, {1}});
BENCHMARK(SetSolutionBM)->ArgsProduct({{1000}, {1}});
BENCHMARK(HeapSolutionBM)->ArgsProduct({{1000}, {1}});
BENCHMARK(SortVectorSolutionBM)->ArgsProduct({{1000}, {1}});

GenerateSourceBM - is the benchmark which measures the duration of only data generation.
Results of this benchmark depends on whether it is registered as the first one or not.

Expected behavior
I expect that benchmark results will not depend on the order of their registration

Screenshots
Results when generation data benchmark is the first in the row

Results when generation data benchmark is the last in the row

The text was updated successfully, but these errors were encountered:

tekert · 2022-08-13T02:50:16Z

This seems to be a problem with how the assembly is placed on memory i think, i can get similar problems when statically linking something that does nothing (some assembler files that never get called) and magically 3 of 10 tests get 40-60% worse timings on variable data sizes and iterations.

bogdan-lab · 2022-08-13T06:40:52Z

But in my case all benchmarks call the same function GenerateSources. If the position of its assembly code slows down the performance it should slow it down for all benchmarks. I mean the relative results - generating data and manipulating it is slower than just generating data - should still be reproducible, but that is not the case.

Also it is strange that such situation occurs only in the specific case 1000 queues and 1 element in each. I do not have such problem in case with 1000 queues and 2 elements in each.
I have tried to make these case parameters totally "runtime" and put them into the separate txt file and read them from there at execution using BENCHMARK()->Apply(ReadArgumentsFromConfig) syntax. The problem still reproduces...

dmah42 · 2022-08-18T08:44:41Z

can you try running using random interleaving?

see #1051 for the details.

bogdan-lab · 2022-08-21T14:16:54Z

Yes, it helped. Thank you!

bogdan-lab closed this as completed Aug 21, 2022

ichaer mentioned this issue Jan 24, 2024

[libc++] Speed up set_intersection() by fast-forwarding over ranges of non-matching elements with one-sided binary search. llvm/llvm-project#75230

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Benchmark result depends on the order in which benchmarks were registrated #1469

[BUG] Benchmark result depends on the order in which benchmarks were registrated #1469

bogdan-lab commented Aug 11, 2022

tekert commented Aug 13, 2022

bogdan-lab commented Aug 13, 2022

dmah42 commented Aug 18, 2022

bogdan-lab commented Aug 21, 2022

[BUG] Benchmark result depends on the order in which benchmarks were registrated #1469

[BUG] Benchmark result depends on the order in which benchmarks were registrated #1469

Comments

bogdan-lab commented Aug 11, 2022

tekert commented Aug 13, 2022

bogdan-lab commented Aug 13, 2022

dmah42 commented Aug 18, 2022

bogdan-lab commented Aug 21, 2022