Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FINISHED printed twice for the same matrix #577

Closed
tir38 opened this issue Jul 19, 2019 · 6 comments · Fixed by #584
Closed

FINISHED printed twice for the same matrix #577

tir38 opened this issue Jul 19, 2019 · 6 comments · Fixed by #584

Comments

@tir38
Copy link

tir38 commented Jul 19, 2019

We've got logs that look like this:

PollMatrices
  0m  0s matrix-3s9zncxxveg2p VALIDATING
  0m 15s matrix-3s9zncxxveg2p PENDING
  0m 45s matrix-3s9zncxxveg2p Starting attempt 1.
  0m 45s matrix-3s9zncxxveg2p RUNNING
  1m 30s matrix-3s9zncxxveg2p Started logcat recording.
  1m 30s matrix-3s9zncxxveg2p Preparing device.
  2m 16s matrix-3s9zncxxveg2p Installing apps.
  2m 16s matrix-3s9zncxxveg2p Retrieving Pre-Test Package Stats information from the device.
  2m 16s matrix-3s9zncxxveg2p Retrieving Performance Environment information from the device.
  2m 16s matrix-3s9zncxxveg2p Started crash detection.
  2m 16s matrix-3s9zncxxveg2p Started crash monitoring.
  2m 16s matrix-3s9zncxxveg2p Started performance monitoring.
  2m 16s matrix-3s9zncxxveg2p Starting instrumentation test.
  7m 47s matrix-3s9zncxxveg2p Completed instrumentation test.
  8m  2s matrix-3s9zncxxveg2p Stopped performance monitoring.
  8m  2s matrix-3s9zncxxveg2p Stopped crash monitoring.
  8m  2s matrix-3s9zncxxveg2p Stopped logcat recording.
  8m  2s matrix-3s9zncxxveg2p Retrieving Post-test Package Stats information from the device.
  8m 18s matrix-3s9zncxxveg2p Done. Test time = 325 (secs)
  8m 18s matrix-3s9zncxxveg2p Starting results processing. Attempt: 1
  8m 33s matrix-3s9zncxxveg2p Completed results processing. Time taken = 5 (secs)
  8m 33s matrix-3s9zncxxveg2p FINISHED
  16m 50s matrix-3s9zncxxveg2p FINISHED
  16m 52s matrix-8hrmo9k0ib8qa FINISHED
  16m 52s matrix-285fsi5r9d3ra FINISHED
  16m 53s matrix-3th7ls1269fhc FINISHED
  1. why am I only seeing "setup" steps for that first matrix?
  2. why does the first matrix "FINISHED" twice? once at 8 mins and then again at 16 mins?

We shard on 5 devices, most take ~5 minutes. One takes ~10 minutes. Is it possible that the polling is hearing the first device complete (at ~8 mins) and then hearing the slow device complete (at ~16 mins)?

@bootstraponline
Copy link
Contributor

For context, the way Flank works is by creating a bunch of matrices in parallel. Starting with the first matrix, the status is output until the matrix is finished. If the matrix is still running, the current status from FTL is printed to the console. If the matrix is already finished, then FINISHED is printed and flank moves onto the next matrix.

  1. why am I only seeing "setup" steps for that first matrix?

The first matrix is in progress, it hasn't finished yet. Flank polls this matrix until it's complete. While the first matrix is busy running, the other matrices are finishing.

2. why does the first matrix "FINISHED" twice? once at 8 mins and then again at 16 mins?

That's weird, I'd expect matrix-3s9zncxxveg2p to finish once.

Is it possible that the polling is hearing the first device complete (at ~8 mins) and then hearing the slow device complete (at ~16 mins)?

All matrices are run in parallel, only the polling is sequential.

The FTL team is aware that the hack of running multiple matrices to enable parallel execution isn't ideal. They're working on moving sharding server side which would greatly simplify Flank as we'd only have to track one matrix.

I'd recommend posting that matrix id matrix-3s9zncxxveg2p in the FTL Slack channel and see if they have any insight into what's going on. All flank is doing is waiting for the finished state to be reached.

@tir38
Copy link
Author

tir38 commented Aug 2, 2019

For context, the way Flank works is by creating a bunch of matrices in parallel. Starting with the first matrix, the status is output until the matrix is finished. If the matrix is still running, the current status from FTL is printed to the console. If the matrix is already finished, then FINISHED is printed and flank moves onto the next matrix.

IIUC when the second matrix starts i should only hear the steps that are remaining. I.e. if the second matrix is only half way done when the first one finishes, then i'll only see statuses for the second half of the second matrix.

If the second matrix is also finished when the first finishes then I'll only see "FINISHED" for the second one... and on to the third, fourth, etc..

If that's correct can you explain why I'm now seeing "all" of the third matrix's logs:

 0m  0s matrix-1dkfbfqj093qz VALIDATING
  0m 15s matrix-1dkfbfqj093qz PENDING
  0m 45s matrix-1dkfbfqj093qz Starting attempt 1.
  0m 45s matrix-1dkfbfqj093qz RUNNING
  1m 45s matrix-1dkfbfqj093qz Started logcat recording.
  1m 45s matrix-1dkfbfqj093qz Preparing device.
  2m 30s matrix-1dkfbfqj093qz Installing apps.
  2m 46s matrix-1dkfbfqj093qz Retrieving Pre-Test Package Stats information from the device.
  2m 46s matrix-1dkfbfqj093qz Retrieving Performance Environment information from the device.
  2m 46s matrix-1dkfbfqj093qz Started crash detection.
  2m 46s matrix-1dkfbfqj093qz Started crash monitoring.
  2m 46s matrix-1dkfbfqj093qz Started performance monitoring.
  2m 46s matrix-1dkfbfqj093qz Starting instrumentation test.
  8m 32s matrix-1dkfbfqj093qz Completed instrumentation test.
  8m 32s matrix-1dkfbfqj093qz Stopped performance monitoring.
  8m 47s matrix-1dkfbfqj093qz Stopped crash monitoring.
  8m 47s matrix-1dkfbfqj093qz Stopped logcat recording.
  8m 47s matrix-1dkfbfqj093qz Retrieving Post-test Package Stats information from the device.
  9m  2s matrix-1dkfbfqj093qz Done. Test time = 344 (secs)
  9m  2s matrix-1dkfbfqj093qz Starting results processing. Attempt: 1
  9m 18s matrix-1dkfbfqj093qz FINISHED /// FIRST ONE FINISHED
  9m 18s matrix-3eh73c80bgxoa FINISHED // SECOND FINISHED (OR WAS ALREADY FINISHED)
  9m 18s matrix-2zb1mmehr40mr Starting attempt 1. // THIRD ONE IS JUST NOW STARTING
  9m 18s matrix-2zb1mmehr40mr Started logcat recording.
  9m 18s matrix-2zb1mmehr40mr Preparing device.
  9m 18s matrix-2zb1mmehr40mr Installing apps.
  9m 18s matrix-2zb1mmehr40mr Retrieving Pre-Test Package Stats information from the device.
  9m 18s matrix-2zb1mmehr40mr Retrieving Performance Environment information from the device.
  9m 18s matrix-2zb1mmehr40mr Started crash detection.
  9m 18s matrix-2zb1mmehr40mr Started performance monitoring.
  9m 18s matrix-2zb1mmehr40mr Started crash monitoring.
  9m 18s matrix-2zb1mmehr40mr Starting instrumentation test.
  9m 18s matrix-2zb1mmehr40mr Completed instrumentation test.
  9m 18s matrix-2zb1mmehr40mr Stopped performance monitoring.
  9m 18s matrix-2zb1mmehr40mr Stopped crash monitoring.
  9m 18s matrix-2zb1mmehr40mr Stopped logcat recording.
  9m 18s matrix-2zb1mmehr40mr Retrieving Post-test Package Stats information from the device.
  9m 18s matrix-2zb1mmehr40mr Done. Test time = 335 (secs)
  9m 18s matrix-2zb1mmehr40mr Starting results processing. Attempt: 1
  9m 18s matrix-2zb1mmehr40mr Completed results processing. Time taken = 7 (secs)
  9m 18s matrix-2zb1mmehr40mr FINISHED
  17m 53s matrix-2zb1mmehr40mr FINISHED // ORIGINAL PROBLEM, WHY DO WE HEAR THIRD ONE FINISH TWICE?
  17m 53s matrix-3exh7555vlyc1 FINISHED

The fact that all the 2zb1mmehr40mr matrix statuses have the same 9m 18s time stamp makes me think this isn't quite right, that Flank is hearing all the statuses at once, not just the last FINISHED

@paulwdavis
Copy link

I'm not familiar with how Flank aggregates progress messages coming from the Testing API, but perhaps the issue a confusion between test matrix finish times and test execution finish times.

I think commonly Flank creates matrices with a single execution, but matrix-3s9zncxxveg2p contains 5 executions (one each for 5 different devices), each with their own FINISHED times:

2019-07-18 17:18:55
2019-07-18 17:19:39
2019-07-18 17:19:41
2019-07-18 17:27:16
2019-07-18 17:19:46

The difference between the first and last execution finish time is 8:21, which is very close to the time difference reported by Flank for the two duplicate finishes (and those have up to 15s error in them due to the polling rate).

@bootstraponline
Copy link
Contributor

Thanks for the technical details. I'll get this fixed in Flank.

@bootstraponline bootstraponline changed the title Decipher log timing FINISHED printed twice for the same matrix Aug 4, 2019
@bootstraponline
Copy link
Contributor

val firstTestStatus = refreshedMatrix.testExecutions.first()

The bug was due to Flank using the first test execution (device) and printing status messages only from that device. I've updated Flank to correctly poll all test executions within a matrix. The device id is now included in the log output so it's possible to differentiate test executions within the same matrix.

#584 has the fix

@tir38
Copy link
Author

tir38 commented Aug 26, 2019

Thanks for fixing this. I can now see that my tests are are running as fast as they should; no confusion about whether they are really running in parallel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants