feat: Concurrent file uploads #53
Conversation
Co-authored-by: Matías Bellone <toote@users.noreply.github.com>
I would have sworn your tests would have failed, but apparently there is a bug in the stubbing library in which concurrent calls to the mock require a single stub for all calls 🤷

That said, I found that you were also missing 2 test cases:
- what happens when one of the curl calls fails
- waiting for uploads to finish (the iterations loop of the code you added)

I wanted to provide a rough implementation for you to look at, but the first one is complicated due to my first comment. And for the latter I stumbled on a bug: apparently, the `-p` option is not universal across all implementations of `ps`.
If I create a test case with 2 concurrent uploads and change the stub for `curl` to be `sleep 25; echo...`, the test still passes unless I also add an `assert_output --partial "Waiting for uploads to finish"`, in which case it fails and the output actually shows an error:

```
output (23 lines):
Uploading '3' files matching '**/*/junit-*.xml'
Uploading './tests/fixtures/junit-1.xml'...
Uploading './tests/fixtures/junit-2.xml'...
ps: unrecognized option: p
BusyBox v1.31.1 () multi-call binary.
```
tests/pre-exit-success.bats (Outdated)

```bash
assert_output --partial "curl success 1"
assert_output --partial "curl success 1"
assert_output --partial "curl success 1"
```
These three lines do not guarantee that the message appears 3 times; if `curl success 1` appears just once, they would all pass.

If you take my previous suggestion to tailor the output according to the file being uploaded, you can check for each upload independently, as in the sketch below.
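For example, a hedged sketch of what those independent checks could look like, assuming the `curl` stub is changed to echo the `data=@...` argument it received (the fixture paths follow the output shown elsewhere in this thread):

```bash
# Each assertion only matches the line for its own fixture file, so all three
# uploads must actually appear in the output for the test to pass.
assert_output --partial 'curl success data=@"./tests/fixtures/junit-1.xml"'
assert_output --partial 'curl success data=@"./tests/fixtures/junit-2.xml"'
assert_output --partial 'curl success data=@"./tests/fixtures/junit-3.xml"'
```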
Another alternative would be to actually count the lines:

```bash
assert_equal "$(echo "$output" | grep -c "curl success")" "3"
```
I implemented your alternative suggestion; I think that sufficiently covers the intent of the test.
To fix the cross-platform compatibility issues I've replaced `ps -p` with `kill -0`; it was the only check I could find that behaves consistently across platforms for testing that a pid exists. `kill -0` can have some surprising behaviour depending on the user's permissions on the pid... but given that all of the pids we are dealing with are children of the script, I think it should be fine.
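For context, a minimal sketch of the `kill -0` liveness check (not the plugin's actual code; the `upload` helper and variable name are illustrative):

```bash
upload "$file" &        # hypothetical upload started in the background
upload_pid="$!"

# kill -0 sends no signal; it only reports whether the pid can be signalled,
# which for our own child processes means "is it still running".
if kill -0 "$upload_pid" 2>/dev/null; then
  echo "Upload '${upload_pid}' is still running"
else
  echo "Upload '${upload_pid}' has finished"
fi
```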
I've been attempting to add the missing tests, but I just seem to keep running into this error:

```
2023-06-19 06:37:02 UTC  ✗ Concurrency waits when the queue is full
2023-06-19 06:37:04 UTC  (from function `unstub' in file /usr/lib/bats/bats-mock/stub.bash, line 70,
2023-06-19 06:37:04 UTC   in test file tests/pre-exit-success.bats, line 89)
2023-06-19 06:37:04 UTC  `unstub curl' failed
```

I'm not very familiar with bats; it seems as though I'm likely configuring it wrong, but bats-mock is hiding the underlying error?
Alrighty, I think I have a working implementation! Unfortunately it looks like I need to check for "error, continuing" rather than the timeout message specifically... but afaict that might be unavoidable as messages outside of the scope of the command group seem to get lost.
Regardless, it still tests that the first group is failing and that we return only one success, which might be close enough for our purposes here.
Actually, my tests suggest that nothing actually happens if the process is killed, so you cannot check for either output:

```
Uploading '3' files matching '**/*/junit-*.xml'
Uploading './tests/fixtures/junit-1.xml'...
Uploading './tests/fixtures/junit-2.xml'...
sleeping for data=@"./tests/fixtures/junit-1.xml"
sleeping for data=@"./tests/fixtures/junit-2.xml"
Upload '2678' has been running for more than '3' seconds, killing it
Upload '2679' has been running for more than '3' seconds, killing it
Uploading './tests/fixtures/junit-3.xml'...
curl success data=@"./tests/fixtures/junit-3.xml"
```

But I believe it is testing that the process does not break explosively :)
Huh, weird, because I was seeing:

```
Uploading '3' files matching '**/*/junit-*.xml'
Uploading './tests/fixtures/junit-1.xml'...
Uploading './tests/fixtures/junit-3.xml'...
sleeping for data=@"./tests/fixtures/junit-1.xml"
curl success data=@"./tests/fixtures/junit-3.xml"
/plugin/hooks/pre-exit: line 180: kill: (83) - No such process
Uploading './tests/fixtures/junit-2.xml'...
Error uploading, will continue
```

I wonder why we're seeing different output 🤔. Probably not a big deal, interesting though.

Edit: The test command I am using is `docker run --rm -ti -e CURL_STUB_DEBUG=/dev/tty -v $PWD:/plugin buildkite/plugin-tester:v4.1.0 bats tests/pre-exit-success.bats -f 'Concurrency gracefully handles command-group timeout'`
When I run it, the first batch is `1` and `2` and the second batch is just `3`, so the first two processes get killed. In your run, the first batch was `1` and `3`; only one of those gets killed, so the second one generates the `No such process` line (which points to the fact that the command needs stderr redirected to `/dev/null`). It may be related to how files are laid out in the filesystem, which affects the order in which `find` reports them back.

Most interestingly: the last one fails because the `curl` stub only has one line (in your scenario it would need two) and that causes the `Error uploading` line; otherwise you should be seeing `sleeping for` on the `2` file as well. Even if the test succeeded on the output, it would fail when unstubbing `curl` :(

Instead of trying to figure out exactly how to work around this non-determinism, I would suggest you add a `| sort` to the `FIND_CMD[@]` call in line 156 so that the order of files is always the same no matter what — see the sketch below.
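A rough sketch of that suggestion (the loop shape and the `upload` helper are illustrative, not the hook's actual code; the key change is piping the `find` output through `sort`):

```bash
# Sort the matched report files so their order is deterministic across
# filesystems before handing them to the upload loop.
while IFS= read -r file; do
  upload "$file"
done < <("${FIND_CMD[@]}" | sort)
```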
Yep, that fixed that issue :)
fix: Concurrency tests
Co-authored-by: Matías Bellone <toote@users.noreply.github.com>
fix: Cross-platform compatibility
* fix: Use timeout var instead of magic numbers for concurrency
* fix: Add missing concurrency tests
fix: Various test fixes and new timeout test
hooks/pre-exit (Outdated)

```bash
if [[ "$ANNOTATE" != "false" ]]; then
  save-report-url "${CURL_RESP_FILE}"
fi
```
I just realized there is a bug here: the file is shared among all `upload` calls, so there is a race condition as to which URL is saved for annotations and which are not. There are two alternatives to fix this:

1- make a note/check that annotation is not compatible with concurrent uploads

2- change the `upload` function to accept another parameter that defines the filename to save the response to, which you can then pass through to `save-report-url` (see the sketch below)

Whichever you prefer, I'll be OK with it.
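For illustration, a hedged sketch of alternative 2 (the function body, the `$UPLOAD_URL` variable, and the argument layout are placeholders, not the plugin's actual code):

```bash
# Each upload writes its curl response to its own file and passes that same
# file on to save-report-url, so concurrent uploads never race on one path.
upload() {
  local report_file="$1"
  local resp_file="$2"

  curl --silent --output "$resp_file" --form "data=@\"${report_file}\"" "$UPLOAD_URL"

  if [[ "$ANNOTATE" != "false" ]]; then
    save-report-url "$resp_file"
  fi
}

upload "./tests/fixtures/junit-1.xml" "$(mktemp)" &
```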
Ah, good catch. I'll investigate this further.
I think this is fixed now: I use `mktemp` to generate a temporary file in the command group if `CURL_RESP_FILE` is unset.
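Roughly, the idiom described here looks like the following (illustrative only, not the exact hook code):

```bash
# Inside the per-upload command group: fall back to a fresh temp file when
# CURL_RESP_FILE is not set, so concurrent uploads never share a response file.
resp_file="${CURL_RESP_FILE:-$(mktemp)}"
```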
> I think this is fixed now: I use `mktemp` to generate a temporary file in the command group if `CURL_RESP_FILE` is unset.

Hmmm it seems there is a failing test, but I'm unable to reproduce. The tests run successfully both locally for me and on our buildkite cluster.
I've been thinking about what can be done about it without actually committing some debugging information, and the best I could come up with would be to change that test's assertion (the one that is failing) to one that is more standard, so that more information is printed when it fails. In particular, do something like:

```bash
assert_line --regexp '^curl success .*junit-1.xml"'
assert_line --regexp '^curl success .*junit-2.xml"'
assert_line --regexp '^curl success .*junit-3.xml"'
```

While this may not cause the test to succeed when it wasn't, it will probably show more information (including the whole output) when it does fail.

Based on the explanation below, I believe we need to pick one of two bad options:

1- skip the failing test (with a message)

This would leave the tests ready to be run when the underlying bats-mock library is fixed (probably with some modifications), but it means that the functionality may break if the tests are never run.

2- ask someone to re-run the test until the race condition is not triggered

This takes advantage of all the hard work you've done and actually tests this feature. On the other hand, not only does it put additional burden on the staff when the test fails, it also reduces the reliability of the tests as an indicator that something is wrong with changes, for every PR and merge going forward.

@pzeballos: do you (or anyone else in the team) have any opinion on the matter?

Note: While writing the suggestion and making sure that it would work, I was able to duplicate the issue. There are actually two scenarios in which the test can fail, but both are due to a race condition in bats-mock not correctly supporting concurrent execution of stubs. In the first scenario (which is easier to explain), the 3rd call runs after the first and second ones run concurrently. Because there is a single stub for…

In the other scenario, which I believe is what happened in the run of your commit, the 3rd stub is executed after the first one finished but before the second one does.
😳 this is definitely a tricky case! bats 😅 And we'll see what we can do to bring the tests back (I manually retried the step until it passed). Thank you very much @js-murph for all your efforts and the amazing contribution you are making to the plugin 🤗 Do you think you can do that last change (add a skip with a message)? 🙏🏻 I believe that then we'll be ready to finally merge it 🤞🏻
Thank you both! I've pushed a skip on the …
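For reference, a minimal sketch of what skipping a test with a message looks like in bats (the test name and message below are placeholders, not necessarily the exact ones that were skipped):

```bash
@test "Concurrency waits when the queue is full" {
  # Skip with an explanatory message; the original body stays in place so the
  # test is ready to re-enable once the underlying bats-mock issue is fixed.
  skip "bats-mock does not reliably support concurrent stub invocations"

  # ... original test body ...
}
```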
Thank you very much @js-murph! 🤗
👏
What?

This adds the ability for the test-collector to upload multiple files in parallel.

Why?

For large projects we are seeing instances where performance is significantly slower than alternatives for displaying test-summary annotations, sometimes on the order of several minutes slower. This adds an option for configuring the concurrency of the uploads, which in my testing so far has helped halve the time spent performing this work. A rough sketch of the general pattern is below.
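A minimal sketch of the concurrency pattern described here (the option name, `upload` helper, and `report_files` array are illustrative placeholders, not the plugin's actual code):

```bash
MAX_CONCURRENCY="${UPLOAD_CONCURRENCY:-1}"  # option name is illustrative

for file in "${report_files[@]}"; do
  upload "$file" &  # each upload runs as a background job

  # If we've hit the concurrency limit, wait until a slot frees up.
  while [[ "$(jobs -rp | wc -l)" -ge "$MAX_CONCURRENCY" ]]; do
    echo "Waiting for uploads to finish"
    sleep 1
  done
done

wait  # block until any remaining uploads complete before exiting
```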