Fix race condition in circular buffer simulation test #931

Merged: 1 commit, merged Jul 31, 2021


ThomsonTan (Contributor)

Fixes #841

Changes

This is a long-standing bug that intermittently fails our CI runs. The intent is that the consumer thread checks whether the exit flag is set and, if it is, makes sure the buffer it shares with the producer thread is empty before exiting. The problem is that it checks a cached Peek() result that was taken before the exit flag was read. When the consumer sees the exit signal, it can only confirm that an old, stale snapshot of the buffer is empty, not the buffer's current contents, so it can exit early while data is still in the buffer.
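The ordering problem can be replayed deterministically, without threads. The sketch below is hypothetical and not code from this PR: a std::deque<uint32_t> stands in for CircularBuffer<uint32_t>, and BuggyShouldExit, FixedShouldExit and WouldLoseData are illustrative names. The producer_step callback models the producer pushing its last value and then setting the exit flag, run at the worst possible moment inside the consumer's exit check.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>

// Hypothetical stand-in for the shared CircularBuffer<uint32_t>.
using Buffer = std::deque<uint32_t>;

// Buggy order: snapshot the buffer first, read the exit flag second.
// `producer_step` runs in between, at the worst possible moment.
bool BuggyShouldExit(Buffer &buf, std::atomic<bool> &exit_flag,
                     const std::function<void()> &producer_step)
{
  bool snapshot_empty = buf.empty();          // analogous to `auto allotment = buffer.Peek();`
  producer_step();                            // producer pushes its last value, then sets the flag
  return exit_flag.load() && snapshot_empty;  // trusts the stale snapshot
}

// Fixed order: read the exit flag first, inspect the buffer second.
bool FixedShouldExit(Buffer &buf, std::atomic<bool> &exit_flag,
                     const std::function<void()> &producer_step)
{
  bool should_stop = exit_flag.load();  // flag read first
  producer_step();                      // producer may still run here...
  return should_stop && buf.empty();    // ...but the buffer is checked afterwards
}

// Replays one exit check and reports whether the consumer would stop
// while a value is still queued (i.e. that value would be lost).
bool WouldLoseData(bool fixed_order)
{
  Buffer buf;
  std::atomic<bool> exit_flag{false};
  auto producer_step = [&] {
    buf.push_back(42);
    exit_flag = true;  // the producer signals exit only after its last push
  };
  bool exits = fixed_order ? FixedShouldExit(buf, exit_flag, producer_step)
                           : BuggyShouldExit(buf, exit_flag, producer_step);
  return exits && !buf.empty();
}
```

Here WouldLoseData(false) is true (the consumer stops with 42 still queued) and WouldLoseData(true) is false. Because the producer sets the flag only after its last push, reading the flag first means that a true value guarantees the later buffer check sees everything the producer wrote; a snapshot taken before the flag read carries no such guarantee.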

The issue is very hard to reproduce on a local devbox, even with many repeated runs, but it reproduces easily when an artificial sleep is injected between taking the snapshot of the buffer and checking the exit flag, as in the snippet below (the line with sleep_for):

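```cpp
void RunNumberConsumer(CircularBuffer<uint32_t> &buffer,
                       std::atomic<bool> &exit,
                       std::vector<uint32_t> &numbers)
{
  while (true)
  {
    // Snapshot of the buffer, taken BEFORE the exit flag is read.
    auto allotment = buffer.Peek();
    // Injected delay: gives the producer time to push more data and set
    // `exit` after the snapshot was taken.
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    // Bug: `allotment` is stale here, so an empty snapshot does not mean the
    // buffer is empty now; the consumer can return with data still queued.
    if (exit && allotment.empty())
    {
      return;
    }
    // ... rest of the loop body (consuming from the buffer) omitted ...
  }
}
```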
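A threaded sketch of the safe ordering, assuming the fix is to read the exit flag before inspecting the buffer (not necessarily the exact change merged here). SimpleBuffer is a hypothetical mutex-protected stand-in for the SDK's CircularBuffer, whose real API differs; RunNumberConsumerFixed and RunSimulation are likewise illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical mutex-protected stand-in for CircularBuffer<uint32_t>.
class SimpleBuffer
{
public:
  void Push(uint32_t value)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    items_.push_back(value);
  }

  // Appends everything currently queued to `out` and returns how many items were taken.
  std::size_t Drain(std::vector<uint32_t> &out)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    std::size_t taken = items_.size();
    out.insert(out.end(), items_.begin(), items_.end());
    items_.clear();
    return taken;
  }

private:
  std::mutex mutex_;
  std::deque<uint32_t> items_;
};

// Consumer with the safe ordering: read the exit flag first, then look at
// the buffer's current contents (never a snapshot taken before the flag read).
void RunNumberConsumerFixed(SimpleBuffer &buffer,
                            std::atomic<bool> &exit,
                            std::vector<uint32_t> &numbers)
{
  while (true)
  {
    bool should_stop = exit.load();             // 1. flag first
    std::size_t taken = buffer.Drain(numbers);  // 2. fresh view of the buffer
    if (should_stop && taken == 0)
    {
      return;  // producer is done and nothing is left
    }
  }
}

// Pushes 0..count-1 from the calling thread and returns what the consumer received.
std::vector<uint32_t> RunSimulation(uint32_t count)
{
  SimpleBuffer buffer;
  std::atomic<bool> exit_flag{false};
  std::vector<uint32_t> numbers;
  std::thread consumer(RunNumberConsumerFixed, std::ref(buffer), std::ref(exit_flag),
                       std::ref(numbers));
  for (uint32_t i = 0; i < count; ++i)
  {
    buffer.Push(i);
  }
  exit_flag = true;  // signal exit only after the last push
  consumer.join();
  return numbers;
}
```

With this ordering, RunSimulation(10000) should return 0 through 9999 in order under any scheduling: a sleep between the flag read and Drain() cannot cause an early exit, because a flag read of true implies every push has already happened.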

For significant contributions, please make sure you have completed the following items:

  • CHANGELOG.md updated for non-trivial changes
  • Unit tests have been added
  • Changes in public API reviewed

ThomsonTan requested a review from a team on July 31, 2021 05:38
codecov bot commented on Jul 31, 2021

Codecov Report

Merging #931 (901e3d0) into main (5414ebe) will decrease coverage by 0.01%.
The diff coverage is 100.00%.


```diff
@@            Coverage Diff             @@
##             main     #931      +/-   ##
==========================================
- Coverage   95.36%   95.36%   -0.00%     
==========================================
  Files         159      159              
  Lines        6776     6775       -1     
==========================================
- Hits         6461     6460       -1     
  Misses        315      315              
```
| Impacted Files | Coverage Δ |
|---|---|
| sdk/test/common/circular_buffer_test.cc | 100.00% <100.00%> (ø) |

lalitb (Member) commented on Jul 31, 2021

Nice. Thanks for fixing it :)

lalitb merged commit a9b2f9f into open-telemetry:main on Jul 31, 2021
Successfully merging this pull request may close these issues.

Intermittent test failure: 179 - trace.CircularBufferTest.Simulation (Failed)
2 participants