Print threads on stuck on flaky test #2636

Closed
marcotc wants to merge 3 commits into master from print-stuck-single-test

Conversation

marcotc
Member

@marcotc marcotc commented Feb 21, 2023

This PR is meant to help debug flaky test runs that get stuck inside the execution of the #work_pending? test.

Here's an example where CircleCI shuts down a container with no output after 10 minutes of inactivity:

Datadog::Core::Workers::IntervalLoop
  when included into a worker
    #loop_wait_time=
      is expected to change `worker.loop_wait_time` from 1 to 0.6986260331927575
    #work_pending?
      when the worker is not running
        is expected to equal false
      when the worker is running
Killed
/usr/local/bin/ruby -I/usr/local/bundle/gems/rspec-core-3.12.0/lib:/usr/local/bundle/gems/rspec-support-3.12.0/lib /usr/local/bundle/gems/rspec-core-3.12.0/exe/rspec --pattern spec/\*\*/\*_spec.rb  --exclude-pattern spec/\*\*/\{contrib,benchmark,redis,opentracer,auto_instrument,opentelemetry\}/\*\*/\*_spec.rb,\ spec/\*\*/\{auto_instrument,opentelemetry\}_spec.rb failed
rake aborted!
Command failed with status (1): [bundle exec appraisal ruby-2.7.6-core-old ...]
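
The approach described above, printing every thread's backtrace once a test stops making progress, could be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the helper name `dump_threads_if_stuck`, its parameters, and the output format are all assumptions. It also only helps while the Ruby VM can still schedule the watchdog thread.

```ruby
# Hypothetical helper, not taken from the PR: runs the given block and, if it
# is still running after `timeout` seconds, writes the backtrace of every live
# thread to `io`.
def dump_threads_if_stuck(timeout, io = $stderr)
  test_thread = Thread.current
  watchdog = Thread.new do
    sleep(timeout)
    io.puts "Still running after #{timeout}s, dumping #{Thread.list.size} threads:"
    Thread.list.each do |thread|
      label = thread == test_thread ? ' (test thread)' : ''
      io.puts "#{thread.inspect}#{label}"
      # Thread#backtrace returns nil for a dead thread.
      (thread.backtrace || ['<no backtrace>']).each { |line| io.puts "    #{line}" }
    end
  end
  yield
ensure
  # Stop the watchdog once the block finishes, whether it passed or raised.
  watchdog&.kill
  watchdog&.join
end
```

In an RSpec suite, a helper like this could be wrapped around selected examples with an `around` hook that calls `example.run` inside the block, which keeps the extra thread limited to the specs under investigation.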

@marcotc marcotc requested a review from a team February 21, 2023 21:20
@github-actions github-actions bot added the dev/testing Involves testing processes (e.g. RSpec) label Feb 21, 2023
@marcotc marcotc force-pushed the print-stuck-single-test branch from 800740e to fe7cc25 Compare February 21, 2023 21:23
@codecov-commenter

codecov-commenter commented Feb 21, 2023

Codecov Report

Merging #2636 (f62009c) into master (f158b71) will decrease coverage by 0.03%.
The diff coverage is 46.66%.

@@            Coverage Diff             @@
##           master    #2636      +/-   ##
==========================================
- Coverage   98.07%   98.04%   -0.03%     
==========================================
  Files        1153     1153              
  Lines       63128    63158      +30     
  Branches     2813     2817       +4     
==========================================
+ Hits        61912    61925      +13     
- Misses       1216     1233      +17     
Impacted Files Coverage Δ
spec/datadog/core/workers/interval_loop_spec.rb 89.40% <46.66%> (-10.60%) ⬇️
...ec/datadog/tracing/contrib/sidekiq/patcher_spec.rb 96.00% <0.00%> (-4.00%) ⬇️
...atadog/tracing/contrib/grpc/support/grpc_helper.rb 98.24% <0.00%> (-1.76%) ⬇️
lib/datadog/core/diagnostics/environment_logger.rb 97.69% <0.00%> (-0.77%) ⬇️
...dog/profiling/collectors/cpu_and_wall_time_spec.rb 97.98% <0.00%> (+0.40%) ⬆️

@ivoanjo
Member

ivoanjo commented Feb 22, 2023

Soo... I strongly suspect the hang in IntervalLoop is #2271 (comment).

I am somewhat wary of creating a new thread for every testcase to debug IntervalLoop; can I convince you to reduce the scope of this PR just to that module's specs, and not to the whole test suite?

@marcotc
Member Author

marcotc commented Feb 22, 2023

Sure, will reduce the scope.

@marcotc
Member Author

marcotc commented Feb 22, 2023

Soo... I strongly suspect the hang in IntervalLoop is #2271 (comment).

The case where it just completely hangs has been around for quite a bit, but I've yet to be able to reproduce it.

@marcotc marcotc changed the title from "Print threads on stuck single tests" to "Print threads on stuck on flaky test" Feb 22, 2023
@marcotc
Member Author

marcotc commented Feb 22, 2023

Two tests in the same file sometimes flake, so I moved the thread printing to just those two.

@marcotc
Member Author

marcotc commented Feb 23, 2023

Hey, the hang happened on this branch, but it didn't print anything :(
https://app.circleci.com/pipelines/github/DataDog/dd-trace-rb/9046/workflows/6dea0b42-9df5-4ba9-82d2-84c764637028/jobs/336536
I'll have to investigate.

@marcotc marcotc marked this pull request as draft February 23, 2023 01:01
@ivoanjo
Member

ivoanjo commented Feb 23, 2023

I guess it may help to sprinkle a few sleep and Thread.pass calls in the involved methods to provoke some of the less-frequent interleavings.
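
To illustrate the suggestion above, here is a minimal sketch of how an injected `Thread.pass` or `sleep` can widen a race window so that a rare interleaving shows up far more often. `RacyCounter` and `run_increments` are hypothetical names invented for this example. They are not part of dd-trace-rb and do not model `IntervalLoop` itself.

```ruby
# Hypothetical class for illustration only: an unsynchronized counter whose
# read-modify-write can be interrupted between the read and the write.
class RacyCounter
  attr_reader :value

  # `pause` is an optional callable invoked inside the race window.
  def initialize(pause: nil)
    @value = 0
    @pause = pause
  end

  def increment
    current = @value
    @pause&.call # invites another thread to run before the write below
    @value = current + 1
  end
end

# Runs `threads` threads that each call `increment` `per_thread` times and
# returns the final counter value.
def run_increments(counter, threads:, per_thread:)
  workers = Array.new(threads) do
    Thread.new { per_thread.times { counter.increment } }
  end
  workers.each(&:join)
  counter.value
end

with_pass = run_increments(RacyCounter.new(pause: -> { Thread.pass }),
                           threads: 4, per_thread: 1_000)
with_sleep = run_increments(RacyCounter.new(pause: -> { sleep(0.001) }),
                            threads: 4, per_thread: 20)
puts "Thread.pass: #{with_pass} of 4000, sleep: #{with_sleep} of 80"
```

Any total below the expected count means an update was lost. The exact numbers vary from run to run, since `Thread.pass` is only a hint to the scheduler, but the injected pause should make the lost update much easier to hit than in the unmodified code.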

@marcotc marcotc closed this Oct 26, 2023
@GustavoCaso GustavoCaso deleted the print-stuck-single-test branch October 27, 2023 12:27
Labels
dev/testing Involves testing processes (e.g. RSpec)
3 participants