K6 Process Hangs After Summary Report #2937

Closed
ichasepucks opened this issue Feb 24, 2023 · 8 comments · Fixed by #2938
@ichasepucks

Brief summary

Starting with 0.43.0, we're noticing that the K6 process does not terminate after the summary report. This does not happen with 0.42.0 and earlier versions.

We are building our own custom Docker image with the k6-sql plugin. However, I've tried with the loadimpact/k6 image as well and can reproduce the issue, so I don't think it's actually plugin-related.

Other configuration details:

  • Using experimental-prometheus-rw (tried without this as well)
  • K6_PROMETHEUS_RW_STALE_MARKERS=true (tried both)
  • K6_PROMETHEUS_RW_TREND_STATS=p(90),p(95),p(99)
  • options --log-format=json --no-color

The execution environment is on a Kubernetes cluster.

Tail of logs from 0.43.0 (simplified):

{"executor":"shared-iterations","gracefulStop":30000000000,"level":"debug","msg":"Regular duration is done, waiting for iterations to gracefully finish","scenario":"xxx","time":"2023-02-24T15:41:22Z"}
{"executor":"xxx","level":"debug","msg":"Executor finished successfully","startTime":0,"time":"2023-02-24T15:41:22Z","type":"shared-iterations"}
{"level":"debug","msg":"Running teardown()...","time":"2023-02-24T15:41:22Z"}
{"level":"debug","msg":"Test finished cleanly","time":"2023-02-24T15:41:22Z"}
{"level":"debug","msg":"Stopping vus and vux_max metrics emission...","phase":"execution-scheduler-init","time":"2023-02-24T15:41:22Z"}
{"level":"debug","msg":"Metrics emission of VUs and VUsMax metrics stopped","time":"2023-02-24T15:41:22Z"}
{"level":"debug","msg":"Releasing signal trap...","time":"2023-02-24T15:41:22Z"}
{"level":"debug","msg":"Sending usage report...","time":"2023-02-24T15:41:22Z"}
{"level":"debug","msg":"Waiting for metric processing to finish...","time":"2023-02-24T15:41:22Z"}
{"level":"debug","msg":"Metrics processing finished!","time":"2023-02-24T15:41:22Z"}
{"level":"debug","msg":"Finalizing thresholds...","time":"2023-02-24T15:41:22Z"}
{"component":"metrics-engine-ingester","level":"debug","msg":"Stopping...","time":"2023-02-24T15:41:22Z"}
{"component":"metrics-engine-ingester","level":"debug","msg":"Stopped!","time":"2023-02-24T15:41:22Z"}
{"component":"metrics-engine","level":"debug","msg":"Running thresholds on 1 metrics...","time":"2023-02-24T15:41:22Z"}
{"level":"debug","msg":"Stopping outputs...","time":"2023-02-24T15:41:22Z"}
{"component":"output-manager","level":"debug","msg":"Stopping 2 outputs...","time":"2023-02-24T15:41:22Z"}
{"level":"debug","msg":"Stopping the output","output":"Prometheus remote write","time":"2023-02-24T15:41:22Z"}
{"level":"debug","msg":"Converted samples to Prometheus TimeSeries","nts":60,"output":"Prometheus remote write","time":"2023-02-24T15:41:22Z"}
{"level":"debug","msg":"Successful flushed time series to remote write endpoint","nts":60,"output":"Prometheus remote write","time":"2023-02-24T15:41:22Z","took":1268409}
{"level":"debug","msg":"Marking time series as stale","output":"Prometheus remote write","staleMarkers":1682,"time":"2023-02-24T15:41:22Z"}
{"level":"debug","msg":"Output stopped","output":"Prometheus remote write","time":"2023-02-24T15:41:22Z"}
{"component":"metrics-engine-ingester","level":"debug","msg":"Stopping...","time":"2023-02-24T15:41:22Z"}
{"component":"metrics-engine-ingester","level":"debug","msg":"Stopped!","time":"2023-02-24T15:41:22Z"}
{"level":"debug","msg":"Generating the end-of-test summary...","time":"2023-02-24T15:41:22Z"}

     ✗ xxx is status 200
      ↳  0% — ✓ 0 / ✗ 7
     ✓ xxx is status 200
     ...
     ✓ xxx is status 200

     █ setup

     █ teardown

   ✓ checks.........................: 98.39% ✓ 430      ✗ 7
     data_received..................: 1.0 GB 5.3 MB/s
     data_sent......................: 90 MB  460 kB/s
     http_req_blocked...............: avg=2.73ms   min=2.36µs  med=5.65µs   max=199.84ms p(90)=18.46µs  p(95)=3.67ms
     http_req_connecting............: avg=2.1ms    min=0s      med=0s       max=199.65ms p(90)=0s       p(95)=2.69ms
     http_req_duration..............: avg=322.44ms min=1.44ms  med=20.67ms  max=20.55s   p(90)=727.12ms p(95)=1.92s
       { expected_response:true }...: avg=319.09ms min=1.44ms  med=20.64ms  max=20.55s   p(90)=714.07ms p(95)=1.87s
     http_req_failed................: 0.36%  ✓ 7        ✗ 1907
     http_req_receiving.............: avg=138.63ms min=26.33µs med=67.56µs  max=18.04s   p(90)=16.11ms  p(95)=390.93ms
     http_req_sending...............: avg=9.58ms   min=13.78µs med=36.62µs  max=3.29s    p(90)=158.16µs p(95)=470.65µs
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s       max=0s       p(90)=0s       p(95)=0s
     http_req_waiting...............: avg=174.21ms min=1.36ms  med=19.91ms  max=5.34s    p(90)=383.5ms  p(95)=900.32ms
     http_reqs......................: 1914   9.770178/s
     iteration_duration.............: avg=5s       min=2.25µs  med=790.24ms max=1m55s    p(90)=6.49s    p(95)=24.82s
     iterations.....................: 410    2.09288/s
     vus............................: 1      min=0      max=123
     vus_max........................: 123    min=123    max=123


running (03m15.9s), 000/123 VUs, 410 complete and 0 interrupted iterations
xxx             ✓ [ 100% ] 3 VUs  00m04.6s/10m0s  10/10 shared iters
...
xxx             ✓ [ 100% ] 3 VUs  00m04.8s/10m0s  10/10 shared iters

Tail of logs from 0.42.0 (simplified):

{"executor":"shared-iterations","gracefulStop":30000000000,"level":"debug","msg":"Regular duration is done, waiting for iterations to gracefully finish","scenario":"xxx","time":"2023-02-24T16:10:07Z"}
{"executor":"xxx","level":"debug","msg":"Executor finished successfully","startTime":0,"time":"2023-02-24T16:10:07Z","type":"shared-iterations"}
{"level":"debug","msg":"Running teardown()...","time":"2023-02-24T16:10:07Z"}
{"level":"debug","msg":"Metrics emission of VUs and VUsMax metrics stopped","time":"2023-02-24T16:10:07Z"}
{"component":"engine","error":null,"level":"debug","msg":"Execution scheduler terminated","time":"2023-02-24T16:10:07Z"}
{"component":"engine","level":"debug","msg":"run: execution scheduler terminated","time":"2023-02-24T16:10:07Z"}
{"component":"engine","level":"debug","msg":"Processing metrics and thresholds after the test run has ended...","time":"2023-02-24T16:10:07Z"}
{"component":"metrics-engine-ingester","level":"debug","msg":"Stopping...","time":"2023-02-24T16:10:07Z"}
{"component":"metrics-engine-ingester","level":"debug","msg":"Stopped!","time":"2023-02-24T16:10:07Z"}
{"component":"metrics-engine","level":"debug","metric_name":"checks","msg":"running thresholds","time":"2023-02-24T16:10:07Z"}
{"level":"debug","msg":"Engine run terminated cleanly","time":"2023-02-24T16:10:07Z"}
{"component":"engine","level":"debug","msg":"Engine: Thresholds terminated","time":"2023-02-24T16:10:07Z"}

running (03m49.7s), 000/123 VUs, 410 complete and 0 interrupted iterations
xxx             ✓ [ 100% ] 3 VUs  00m05.2s/10m0s  10/10 shared iters
...
xxx             ✓ [ 100% ] 3 VUs  00m11.5s/10m0s  10/10 shared iters

     ✓ xxx is status 200
{"level":"debug","msg":"Waiting for engine processes to finish...","time":"2023-02-24T16:10:07Z"}
{"component":"engine","level":"debug","msg":"Metrics processing winding down...","time":"2023-02-24T16:10:07Z"}
{"component":"metrics-engine","level":"debug","metric_name":"checks","msg":"running thresholds","time":"2023-02-24T16:10:07Z"}
{"component":"engine","level":"debug","msg":"Metrics processing finished!","time":"2023-02-24T16:10:07Z"}
     ✓ xxx is status 200
     ...
{"level":"debug","msg":"Everything has finished, exiting k6!","time":"2023-02-24T16:10:07Z"}
     ...
     ✓ xxx response: internal status success

     █ setup

     █ teardown

   ✓ checks.........................: 98.17% ✓ 430       ✗ 8
     data_received..................: 815 MB 3.5 MB/s
     data_sent......................: 78 MB  340 kB/s
     http_req_blocked...............: avg=3.14ms   min=2.03µs  med=5.35µs   max=511.21ms p(90)=11.65µs  p(95)=2.07ms
     http_req_connecting............: avg=2.08ms   min=0s      med=0s       max=105.57ms p(90)=0s       p(95)=1.39ms
     http_req_duration..............: avg=243.26ms min=1.63ms  med=19.85ms  max=13.01s   p(90)=522.37ms p(95)=1.3s
       { expected_response:true }...: avg=241.52ms min=1.63ms  med=19.84ms  max=13.01s   p(90)=507.18ms p(95)=1.3s
     http_req_failed................: 0.33%  ✓ 8         ✗ 2359
     http_req_receiving.............: avg=114.76ms min=20.38µs med=63.04µs  max=12.13s   p(90)=1.38ms   p(95)=460.7ms
     http_req_sending...............: avg=5.91ms   min=11.85µs med=34.78µs  max=2.19s    p(90)=129.25µs p(95)=395.64µs
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s       max=0s       p(90)=0s       p(95)=0s
     http_req_waiting...............: avg=122.57ms min=1.39ms  med=19.33ms  max=6.12s    p(90)=202.54ms p(95)=543.76ms
     http_reqs......................: 2367   10.304584/s
     iteration_duration.............: avg=6.05s    min=2.82µs  med=968.67ms max=2m43s    p(90)=5.32s    p(95)=41.86s
     iterations.....................: 410    1.784909/s
     vus............................: 3      min=0       max=123
     vus_max........................: 123    min=123     max=123

{"level":"debug","msg":"Converted samples to Prometheus TimeSeries","nts":190,"output":"Prometheus remote write","time":"2023-02-24T16:10:07Z"}
{"level":"debug","msg":"Successful flushed time series to remote write endpoint","nts":190,"output":"Prometheus remote write","time":"2023-02-24T16:10:07Z","took":2562849}
{"component":"output-manager","level":"debug","msg":"Stopping 2 outputs...","time":"2023-02-24T16:10:10Z"}
{"level":"debug","msg":"Stopping the output","output":"Prometheus remote write","time":"2023-02-24T16:10:10Z"}
{"level":"debug","msg":"no buffered samples, skip the flushing operation","output":"Prometheus remote write","time":"2023-02-24T16:10:10Z"}
{"level":"debug","msg":"Successful flushed time series to remote write endpoint","nts":0,"output":"Prometheus remote write","time":"2023-02-24T16:10:10Z","took":15117}
{"level":"debug","msg":"Marking time series as stale","output":"Prometheus remote write","staleMarkers":1682,"time":"2023-02-24T16:10:10Z"}
{"level":"debug","msg":"Output stopped","output":"Prometheus remote write","time":"2023-02-24T16:10:10Z"}
{"component":"metrics-engine-ingester","level":"debug","msg":"Stopping...","time":"2023-02-24T16:10:10Z"}
{"component":"metrics-engine-ingester","level":"debug","msg":"Stopped!","time":"2023-02-24T16:10:10Z"}
Shutting down...
Exiting with status 0

Definitely a very different logging profile from 0.43.0.

I don't have a simplified repro script just yet. If the logging is not indicative of the problem, I can work on trying to get repro steps that don't include our code.

k6 version

0.43.0

OS

Arch Linux

Docker version and image (if applicable)

No response

Steps to reproduce the problem

Execute scripts

Expected behaviour

Process terminates after script executions complete

Actual behaviour

K6 process never terminates

@na--
Member

na-- commented Feb 24, 2023

You don't see any logs after Generating the end-of-test summary... on v0.43.0? 😕

@ichasepucks
Author

@na--, correct. The logs posted above are in the order they appeared, up to the very last line.

@mstoykov
Contributor

Instead of Ctrl+C, can you stop it by pressing Ctrl+\? That will also kill the process, but it dumps the stack traces of the current goroutines.

It will likely output quite a lot, so it may be better to upload it to a gist.
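
(As a side note, a minimal sketch, not from this issue, of why Ctrl+\ helps: the Go runtime's handling of SIGQUIT prints the stack traces of the goroutines before exiting, so a hung process reveals what each goroutine is blocked on.)

    package main

    import "time"

    func main() {
        // A goroutine that blocks forever, standing in for whatever part of
        // the process is hanging.
        go func() {
            block := make(chan struct{})
            <-block
        }()

        // While this sleeps, pressing Ctrl+\ (SIGQUIT) makes the Go runtime
        // dump the stack traces of the goroutines and exit, which is the
        // output being requested above.
        time.Sleep(time.Hour)
    }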

@ichasepucks
Author

ichasepucks commented Feb 24, 2023

@mstoykov, see https://gist.github.com/ichasepucks/431a2f30fa7a69124cc31787a85545f0. I had one interrupted iteration on this run, which is unusual. I may try to capture a second stack trace to make sure we don't conflate any issues here.

Here is a second stack trace where I didn't get any interrupted iterations: https://gist.github.com/ichasepucks/f0fe8c12b63a6c9fca6388b1b7ed718e

@kempsterc
Contributor

https://gist.github.com/kempsterc/5dc1f549e254b52a4f57c2e7264d5fe9

Here's one from my run.

@na--
Member

na-- commented Feb 25, 2023

@mstoykov tracked down the issue, and it turns out it's a stupid mistake I made when refactoring cmd/run.go as part of #2815. 😞 The code here, instead of silently cancelling the usage report if it can't be sent in under 3 seconds, causes k6 to hang indefinitely, since nothing is going to read the reportDone channel and it's not buffered... 🤦‍♂️

k6/cmd/run.go, lines 309 to 327 in 2305316:

    if !conf.NoUsageReport.Bool {
        backgroundProcesses.Add(2)
        reportCtx, reportCancel := context.WithCancel(globalCtx)
        reportDone := make(chan error)
        go func() {
            logger.Debug("Sending usage report...")
            reportDone <- reportUsage(reportCtx, execScheduler)
            close(reportDone)
            backgroundProcesses.Done()
        }()
        go func() {
            select {
            case <-reportDone:
            case <-time.After(3 * time.Second):
            }
            reportCancel()
            backgroundProcesses.Done()
        }()
    }

The good news is that the fix is relatively easy (#2938) and we'll release it as k6 v0.43.1 early next week (probably along with a few other minor bugfixes). Until then, the problem can be avoided if the usage reports are disabled by one of these options or if you use the master branch in your xk6 builds instead of v0.43.0.

@na--
Member

na-- commented Feb 27, 2023

This should now be fixed in https://github.com/grafana/k6/releases/tag/v0.43.1

@ichasepucks
Author

Confirmed this fixed it for me. Thank you!
