Enable multiple k6-browser instances to run concurrent tests against a single browser #848

Merged
merged 17 commits into from
Apr 6, 2023

@inancgumus inancgumus commented Apr 5, 2023

Problem

When we run multi-VU/instance tests, one of the test runs can close a page of its own. Since each test run attaches to all the browser pages (this is another area we might want to discuss), this sometimes creates a race between the test runs. Here is a step-by-step explanation.

There are two instances, each with its own sessions. However, both instances attach to and detach from the same pages because they run against the same browser. instance2 panics while sending CDP messages over a closed WebSocket connection: it detaches from page1 (because instance1 closed it) while its own attach to page1 is still in progress (the race).

| Instance | Notes | Instance attachments | Browser pages |
|---|---|---|---|
| 🔴 i1 | newPage | | page1 |
| 🔴 i1 | attach to page 1 | page1 | page1 |
| 🟢 i2 | newPage | page2 | page2, page1 |
| 🔴 i1 | attach to page 2 | page2, page1 (!) | page2, page1 |
| 🟢 i2 | attach to page 1. 🚨 ...note that this operation continues concurrently... | page2, page1... | page2, page1 |
| 🔴 i1 | page1.Close() | page2, page1 | page2 |
| 🔴 i1 | detach from page 1. Closes the page1 session for instance1. These sessions are not shared with other instances; they are specific to this instance. However, the pages on the browser are shared between instances. | page2 | page2 |
| 🟢 i2 | detach from page 1. Closes the page1 (targetID) session for instance2. | page2 | page2 |
| 🟢 i2 | attach to page 1 still continues... 🚨 panic: session does not exist. There was a panic because the instance2 attach to page 1 operation was still going on before instance1 closed page1. | page2, page1... | page2, page1 |
sequenceDiagram
    participant instance1
    participant instance2
    participant Browser pages
    instance1->>Browser pages: newPage
    instance1->>page1: attach to page 1
    instance2->>Browser pages: newPage
    instance1->>page2: attach to page 2
    instance2->>page1: attach to page 1: did not end yet.
    instance1->>page1: page1.Close()
    instance1->>+page1: detach from page 1
    instance2->>+page1: detach from page 1
    instance2->>+page1: attach to page 1 🚨 panic

Fix

The fix adds a check that lets a test run continue instead of panicking when another test run's page detachment closes a session underneath it. This allows running the k6 browser under high concurrency with multiple VUs and instances. Note that errors are still possible, because we do not yet correctly handle the order in which CDP messages are sent and received. This can cause timeouts and other sorts of errors.

Testing

Here's the test script I used to test this:

import { check } from 'k6';
import { chromium } from 'k6/x/browser';

export default async function () {
  const URL = "ws://127.0.0.1:9222/devtools/browser/035490d7-d3d2-4426-9e38-450adcc7cd74";
  
  console.log('Connecting to browser... VU:', __VU, 'ITER:', __ITER);  
  const browser = chromium.connect(URL);

  console.log('Creating new context... VU:', __VU, 'ITER:', __ITER);
  const context = browser.newContext();

  console.log('Opening new page... VU:', __VU, 'ITER:', __ITER);
  const page = context.newPage();

  console.log('Navigating to website... VU:', __VU, 'ITER:', __ITER);
  await page.goto('https://test.k6.io/', { waitUntil: 'networkidle' });

  console.log('Getting page title... VU:', __VU, 'ITER:', __ITER);
  console.log('page title:' + page.title(), 'VU:', __VU, 'ITER:', __ITER);
  check(page, {
     'title': p => p.title() === 'Demo website for load testing',
  });
  
  console.log('Closing page... VU:', __VU, 'ITER:', __ITER);
  page.close();
  console.log('Closing context... VU:', __VU, 'ITER:', __ITER);
  context.close();
  console.log('Disconnecting from browser... VU:', __VU, 'ITER:', __ITER);
  browser.close();

  console.log('❌ connected : ', isConnected(browser));
  console.log('Test completed.');
}

function isConnected(browser) {
  return browser.isConnected() ? '✅' : '❌';
}

Here's the bash script (multik6b.sh) that can run multiple tests (a courtesy of @ankur22 🙇):

#!/bin/bash

NUM_K6_INSTANCES=$1
TEST_ITERATION=$2
TEST_FILE=$3
LOG_TRACE='info'

if [[ ! -f .last_run ]];
then
    echo "Last run    : never"
else
    echo "Last run    : $(<.last_run)"
fi
echo "Current time: $(date '+%Y-%m-%d %H:%M:%S')"
echo

# Delete log files from previous runs
rm -f log_*.log

# Check if any files have changed since the last run
if [[ ! -f .last_run || $(fd --type f --changed-within "$(<.last_run)" --exclude 'k6' | wc -l) -gt 0 ]]; then
    echo "✅ Files have changed since the last run, rebuilding k6-browser"
    echo "------------------------------------------------------"
    # Rebuild k6-browser if any files have changed
    xk6 build --with github.com/grafana/xk6-browser=.
fi

date '+%Y-%m-%d %H:%M:%S' > .last_run

echo
echo "------------------------------------------------------"
echo

run_test(){
    # Keep the index local so concurrent background runs do not share it.
    local index=$1
    if XK6_BROWSER_LOG="$LOG_TRACE" ./k6 run -q --vus 1 -i "$TEST_ITERATION" "$TEST_FILE" > "log_$index.log" 2>&1; then
        echo "✅ test run $index succeeded"
    else
        echo "❌ test run $index failed"
    fi
}

i=1
while [[ $i -le $NUM_K6_INSTANCES ]]
do
    run_test $i &
    ((i = i + 1))
done

wait
echo .
echo "All instances exited"

Here's an example command for testing:

$ ./multik6b.sh 10 2 script.js
Last run    : 2023-04-05 10:59:40
Current time: 2023-04-05 11:00:02

------------------------------------------------------

✅ test run 5 succeeded
✅ test run 6 succeeded
✅ test run 7 succeeded
✅ test run 2 succeeded
✅ test run 4 succeeded
✅ test run 3 succeeded
✅ test run 1 succeeded
✅ test run 10 succeeded
✅ test run 8 succeeded
✅ test run 9 succeeded

The PR also refactors the page attachment logic for better maintainability and readability, and fixes the linter warnings. The refactoring helped me find the problem since it made the code easier to understand. I preferred to keep it as a commit here rather than put it in another PR. I believe "make it better than you found it" is a nice approach for reducing technical debt :)

@inancgumus inancgumus force-pushed the fix/k6c1096-multi-vu-sessions branch from 0bd5c65 to 86b9217 Compare April 5, 2023 07:44
@inancgumus inancgumus added the bug (Something isn't working), refactor, and remote (remote browser related) labels Apr 5, 2023
@inancgumus inancgumus self-assigned this Apr 5, 2023
@inancgumus inancgumus added this to the v0.9.0 milestone Apr 5, 2023
@inancgumus inancgumus marked this pull request as ready for review April 5, 2023 08:13
@inancgumus inancgumus requested review from ankur22 and ka3de April 5, 2023 08:13
@inancgumus inancgumus changed the title from "Fix/k6c1096 multi vu sessions" to "Fix multiple k6 instances can connect to one browser instance and run tests concurrently" Apr 5, 2023
@ankur22 ankur22 left a comment

Great bit of debugging to catch this and resolve it! Nice work 👏

I'm wondering if you could help me a bit by splitting the last commit into two: one for the new changes to onAttachedToTarget and another for splitting/refactoring the method into smaller methods?

This makes it easy to check whether a session is closed outside a select
statement, for example in a switch or an if.
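
As an illustration of what such a check can look like, here is a small sketch. The `session` type and the `shutdown` and `isClosed` names are assumptions made for this example and are not taken from common/session.go. It wraps a non-blocking select on a done channel in a method that returns a bool, which is what makes the check usable in an if or a switch.

```go
package main

import "fmt"

// session is a hypothetical type; only the done channel matters here.
type session struct {
	done chan struct{}
}

func newSession() *session {
	return &session{done: make(chan struct{})}
}

// shutdown marks the session as closed. In this sketch it must be called
// at most once, because closing a closed channel panics.
func (s *session) shutdown() {
	close(s.done)
}

// isClosed reports whether the session has been shut down. The default
// case keeps the select from blocking while the session is still open.
func (s *session) isClosed() bool {
	select {
	case <-s.done:
		return true
	default:
		return false
	}
}

func main() {
	s := newSession()
	fmt.Println("closed before shutdown:", s.isClosed())

	s.shutdown()
	if s.isClosed() {
		fmt.Println("session closed, skipping")
	}
}
```

Callers that already sit in a select loop can keep receiving from the done channel directly; the bool method is only a convenience for straight-line code.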
When we run multi-VU/instance tests, one of the test runs can close a
page of its own. Since each test run attaches to all the browser pages
(this is another area we might want to discuss), this sometimes
creates a race between the test runs. This check allows the test run
to continue instead of panicking because of another test run's page
detachments.

This fix allows running the k6 browser in high concurrency with
multiple VUs and instances. Note that it is still possible to get
errors since we need to correctly handle the CDP messages while
sending and receiving them (the order of them). This can cause
timeout and other sorts of errors.

Explanation:
There are two instances and different sessions. However, both
instances attach/detach from the same pages since we run in the same browser.

instance1 ---> newPage                    page1
  attaches to: page1
instance2
  attaches to: page1
instance2 ---> newPage                    page1, page2
  attaches to: page2
instance1
  attaches to: page2
instance1 --> page1.Close()               page2
instance1 <-- detachedFromTarget(page1)
  closes page1 session.

This is the racy part that this PR fixes:

instance2 <-- detachedFromTarget(page1)
  closes page1 session.
instance2 ---> attachToTarget(page1)
  panic: session does not exist.

instance2 panics while trying to send CDP messages to a closed web socket
connection since it detaches from it and tries to attach to it
concurrently.
This is for connecting to an existing browser over a WebSocket URL.

This test would panic if the fix in this PR were not applied. Since the
panic occurs in a different goroutine, we can't catch it, so we leave
this test bare (without using require.Panics).
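
The limitation mentioned above comes from how Go handles panics: recover only stops a panic when it is called from a deferred function in the goroutine that is panicking. The sketch below shows this with a hypothetical helper, `runAndRecover`, which is not part of this PR or of the test suite. The recover call has to sit inside the spawned goroutine. A recover deferred by the caller would never see the panic, which is why an assertion that wraps only the calling goroutine cannot observe it.

```go
package main

import "fmt"

// runAndRecover is a hypothetical helper. It runs fn in a new goroutine and
// returns the value fn panicked with, or nil if fn returned normally.
func runAndRecover(fn func()) interface{} {
	result := make(chan interface{}, 1)
	go func() {
		// recover works here because this deferred call runs in the
		// same goroutine that panics.
		defer func() { result <- recover() }()
		fn()
	}()
	return <-result
}

func main() {
	// A recover deferred in main would not intercept this panic, because
	// the panic is raised in the goroutine started by runAndRecover.
	got := runAndRecover(func() { panic("session does not exist") })
	fmt.Println("recovered:", got)
}
```

When the goroutine is started deep inside library code, as with the attach handling here, the test has no place to install such a deferred recover, so an unrecovered panic simply crashes the test binary.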
@inancgumus inancgumus force-pushed the fix/k6c1096-multi-vu-sessions branch from 86b9217 to 406d2c6 Compare April 5, 2023 12:20
@inancgumus inancgumus requested a review from ankur22 April 5, 2023 12:24
@inancgumus inancgumus added the team/k6browser (To distinguish the issue on project boards.) label Apr 5, 2023
@ankur22 ankur22 left a comment

👏 Thanks for splitting the changes into smaller commits, it was a lot easier to follow. I only have some very minor suggestions.

LGTM 🎉

Also move one log statement out of the locked section.

Co-authored-by: ankur22 <ankur.agarwal@grafana.com>
@ka3de ka3de left a comment

Great work @inancgumus ! 🎉
LGTM.

@inancgumus inancgumus merged commit 49b79d4 into main Apr 6, 2023
@inancgumus inancgumus deleted the fix/k6c1096-multi-vu-sessions branch April 6, 2023 07:04
@inancgumus inancgumus changed the title from "Fix multiple k6 instances can connect to one browser instance and run tests concurrently" to "Fix allow multiple k6 instances to connect to one browser to run concurrent tests" Apr 6, 2023
inancgumus added a commit that referenced this pull request Apr 6, 2023
Rationale:

I suggested this earlier on to detect nil sessions. But now, this
warning is outdated because when users run multiple instance/VU
tests, they will see dozens or hundreds of lines of warnings.

The core reason we receive a lot of these warnings is:

We need to correctly handle the CDP messages while sending and receiving
them (the order of them).
#848
@inancgumus inancgumus changed the title from "Fix allow multiple k6 instances to connect to one browser to run concurrent tests" to "Allow multiple k6 instances to connect to one browser to run concurrent tests" Apr 6, 2023
inancgumus added a commit that referenced this pull request May 8, 2023
@inancgumus inancgumus changed the title from "Allow multiple k6 instances to connect to one browser to run concurrent tests" to "Enable multiple k6-browser instances to run concurrent tests against a single browser" May 25, 2023