Enable multiple k6-browser instances to run concurrent tests against a single browser #848

inancgumus · 2023-04-05T07:37:28Z

Problem

When we run multi-VU/instance tests, one of the test runs can close a page of its own. Since each test run attaches to all the browser pages (this is another area we might want to discuss), this sometimes creates a race between the test runs. Here is a step-by-step explanation.

There are two instances, each with its own sessions. However, both instances attach to and detach from the same pages because they run against the same browser. instance2 panics while trying to send CDP messages to a closed WebSocket connection: it detaches from page1 (because instance1 closed it) while its own attach to page1 is still in progress (race).

Instance	Notes	Instance Attachments	Browser pages
🔴 i1	newPage		page1
🔴 i1	attach to page 1	page1	page1
🟢 i2	newPage	page2	page2, page1
🔴 i1	attach to page 2	page2, page1 (!)	page2, page1
🟢 i2	attach to page 1 🚨 ...note that this operation continues concurrently...	page2, page1...	page2, page1
🔴 i1	page1.Close()	page2, page1	page2
🔴 i1	detach from page 1 closes the page1 session for instance1. These sessions are not shared with other instances; they are specific to this instance. However, the pages on the browser are shared between instances.	page2	page2
🟢 i2	detach from page 1 closes the page1 (targetID) session for instance2.	page2	page2
🟢 i2	attach to page 1 still continues... 🚨 panic: session does not exist. The panic happens because instance2's attach to page 1 was already in progress when instance1 closed page1.	page2, page1...	page2, page1

sequenceDiagram
    participant instance1
    participant instance2
    participant Browser pages
    instance1->>Browser pages: newPage
    instance1->>page1: attach to page 1
    instance2->>Browser pages: newPage
    instance1->>page2: attach to page 2
    instance2->>page1: attach to page 1: did not end yet.
    instance1->>page1: page1.Close()
    instance1->>+page1: detach from page 1
    instance2->>+page1: detach from page 1
    instance2->>+page1: attach to page 1 🚨 panic

Fix

The fix adds a check that lets a test run continue instead of panicking when another test run's page detachment closes a session in the middle of an attachment. This allows running the k6 browser with high concurrency across multiple VUs and instances. Note that errors are still possible because we still need to correctly handle the order of CDP messages while sending and receiving them. This can cause timeouts and other kinds of errors.
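The idea behind the check can be sketched in Go as follows. This is a minimal illustration, not the code in this PR: the `instance` and `session` types, `errSessionClosed`, and `isIgnorable` are hypothetical names made up for the example. It only shows the pattern of returning an error that the caller can recognize and skip, instead of panicking, when a session disappears while an attachment is still in flight.

```go
package main

import (
	"errors"
	"fmt"
)

// errSessionClosed is a hypothetical sentinel error used only in this sketch.
var errSessionClosed = errors.New("session closed")

// session is a simplified stand-in for a per-instance CDP session.
type session struct {
	targetID string
	closed   bool
}

// instance holds the sessions owned by one k6 instance.
// Browser pages are shared between instances; sessions are not.
type instance struct {
	sessions map[string]*session
}

// detach drops this instance's session for a target, as happens when the
// browser reports that the page went away.
func (in *instance) detach(targetID string) {
	if s, ok := in.sessions[targetID]; ok {
		s.closed = true
		delete(in.sessions, targetID)
	}
}

// attach finishes attaching to a target. It returns an error instead of
// panicking when the session vanished while the attachment was in flight.
func (in *instance) attach(targetID string) error {
	s, ok := in.sessions[targetID]
	if !ok || s.closed {
		return fmt.Errorf("attaching to %s: %w", targetID, errSessionClosed)
	}
	return nil
}

// isIgnorable reports whether an attachment error only means that the
// page was closed (for example, by another instance) before we finished.
func isIgnorable(err error) bool {
	return errors.Is(err, errSessionClosed)
}

func main() {
	i2 := &instance{sessions: map[string]*session{
		"page1": {targetID: "page1"},
		"page2": {targetID: "page2"},
	}}

	// instance1 closes page1, so instance2 receives a detach for page1
	// while its own attach to page1 has not finished yet.
	i2.detach("page1")

	for _, id := range []string{"page1", "page2"} {
		err := i2.attach(id)
		switch {
		case err == nil:
			fmt.Println(id, "attached")
		case isIgnorable(err):
			fmt.Println(id, "skipped:", err)
		default:
			panic(err)
		}
	}
}
```

In this sketch the test run keeps going after the page1 attachment fails, which mirrors the behavior described above. Real code would also need locking, since attach and detach run concurrently.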

Testing

Here's the test script I used to test this:

import { check } from 'k6';
import { chromium } from 'k6/x/browser';

export default async function () {
  const URL = "ws://127.0.0.1:9222/devtools/browser/035490d7-d3d2-4426-9e38-450adcc7cd74";
  
  console.log('Connecting to browser... VU:', __VU, 'ITER:', __ITER);  
  const browser = chromium.connect(URL);

  console.log('Creating new context... VU:', __VU, 'ITER:', __ITER);
  const context = browser.newContext();

  console.log('Opening new page... VU:', __VU, 'ITER:', __ITER);
  const page = context.newPage();

  console.log('Navigating to website... VU:', __VU, 'ITER:', __ITER);
  await page.goto('https://test.k6.io/', { waitUntil: 'networkidle' });

  console.log('Getting page title... VU:', __VU, 'ITER:', __ITER);
  console.log('page title:', page.title(), 'VU:', __VU, 'ITER:', __ITER);
  check(page, {
    'title': p => p.title() === 'Demo website for load testing',
  });

  console.log('Closing page... VU:', __VU, 'ITER:', __ITER);
  page.close();
  console.log('Closing context... VU:', __VU, 'ITER:', __ITER);
  context.close();
  console.log('Disconnecting from browser... VU:', __VU, 'ITER:', __ITER);
  browser.close();

  // Report whether the connection is still open after browser.close().
  console.log('Connected:', isConnected(browser));
  console.log('Test completed.');
}

function isConnected(browser) {
  return browser.isConnected() ? '✅' : '❌';
}

Here's the bash script (multik6b.sh) that can run multiple tests (a courtesy of @ankur22 🙇):

#!/bin/bash

NUM_K6_INSTANCES=$1
TEST_ITERATION=$2
TEST_FILE=$3
LOG_TRACE='info'

if [[ ! -f .last_run ]];
then
    echo "Last run    : never"
else
    echo "Last run    : $(<.last_run)"
fi
echo "Current time: $(date '+%Y-%m-%d %H:%M:%S')"
echo

# Delete log files from previous runs
rm -f log_*.log

# Check if any files have changed since the last run
if [[ ! -f .last_run || $(fd --type f --changed-within "$(<.last_run)" --exclude 'k6' | wc -l) -gt 0 ]]; then
    echo "✅ Files have changed since the last run, rebuilding k6-browser"
    echo "------------------------------------------------------"
    # Rebuild k6-browser if any files have changed
    xk6 build --with github.com/grafana/xk6-browser=.
fi

date '+%Y-%m-%d %H:%M:%S' > .last_run

echo
echo "------------------------------------------------------"
echo

run_test(){
    local index=$1
    # Quote the variables so file names with spaces do not split into multiple arguments
    if XK6_BROWSER_LOG=$LOG_TRACE ./k6 run -q --vus 1 -i "$TEST_ITERATION" "$TEST_FILE" > "log_$index.log" 2>&1; then
        echo "✅ test run $index succeeded"
    else
        echo "❌ test run $index failed"
    fi
}

i=1
while [[ $i -le $NUM_K6_INSTANCES ]]
do
    run_test $i &
    ((i = i + 1))
done

wait
echo .
echo "All instances exited"

Here's an example command for testing:

$ ./multik6b.sh 10 2 script.js
Last run    : 2023-04-05 10:59:40
Current time: 2023-04-05 11:00:02

------------------------------------------------------

✅ test run 5 succeeded
✅ test run 6 succeeded
✅ test run 7 succeeded
✅ test run 2 succeeded
✅ test run 4 succeeded
✅ test run 3 succeeded
✅ test run 1 succeeded
✅ test run 10 succeeded
✅ test run 8 succeeded
✅ test run 9 succeeded

The PR also refactors the page attachment logic for better maintainability and readability, and fixes the linter warnings. The refactoring helped me find the problem since it made the code easier to understand. I preferred to include it as a commit here rather than put it in another PR. I believe "make it better than you found it" is a nice approach for reducing technical debt :)

ankur22

Great bit of debugging to catch this and resolve it! Nice work 👏

I'm wondering if you could help me a bit by splitting the last commit into two: one for the new changes to onAttachedToTarget and another for refactoring the method into smaller methods?


When we run multi-VU/instance tests, one of the test runs can close a page of its own. Since each test run attaches to all the browser pages (this is another area we might want to discuss), this sometimes creates a race between the test runs. This check allows the test run to continue instead of panicking because of another test run's page detachments.

This fix allows running the k6 browser in high concurrency with multiple VUs and instances. Note that it is still possible to get errors since we need to correctly handle the CDP messages while sending and receiving them (the order of them). This can cause timeout and other sorts of errors.

Explanation: There are two instances and different sessions. However, both instances attach/detach from the same pages since we run in the same browser.

instance1 ---> newPage (browser pages: page1)
instance1 attaches to: page1
instance2 attaches to: page1
instance2 ---> newPage (browser pages: page1, page2)
instance2 attaches to: page2
instance1 attaches to: page2
instance1 ---> page1.Close() (browser pages: page2)
instance1 <--- detachedFromTarget(page1) closes page1 session.

This is the racy part that this PR fixes:

instance2 <--- detachedFromTarget(page1) closes page1 session.
instance2 ---> attachToTarget(page1) panic: session does not exist.

instance2 panics while trying to send CDP messages to a closed web socket connection since it detaches from it and tries to attach to it concurrently.


ankur22

👏 Thanks for splitting the changes into smaller commits, it was a lot easier to follow. I only have some very minor suggestions.

LGTM 🎉


ka3de

Great work @inancgumus ! 🎉
LGTM.

Rationale: I suggested this earlier on to detect nil sessions. But now this warning is outdated: when users run multi-instance/VU tests, they will see dozens or hundreds of lines of warnings. The core reason we receive so many of these warnings is that we still need to correctly handle the order of CDP messages while sending and receiving them. #848

inancgumus force-pushed the fix/k6c1096-multi-vu-sessions branch from 0bd5c65 to 86b9217 Compare April 5, 2023 07:44

inancgumus added the bug, refactor, and remote labels Apr 5, 2023

inancgumus self-assigned this Apr 5, 2023

inancgumus added this to the v0.9.0 milestone Apr 5, 2023

inancgumus marked this pull request as ready for review April 5, 2023 08:13

inancgumus requested review from ankur22 and ka3de April 5, 2023 08:13

inancgumus changed the title ~~Fix/k6c1096 multi vu sessions~~ Fix multiple k6 instances can connect to one browser instance and run tests concurrently Apr 5, 2023

ankur22 requested changes Apr 5, 2023


inancgumus added 16 commits April 5, 2023 14:58

Rename evti to targetPage in onAttachedToTarget

2377503

Add session.Close

c348332

This makes it easy to check if a session is closed from a non-select statement, like a switch or if.
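One common Go way to get such a check is to wrap a non-blocking `select` on a done channel inside a small method, so callers can use the result in an `if` or `switch`. The sketch below assumes the session signals closure through a channel; the `session` type and the `markClosed` and `isClosed` names are hypothetical and are not taken from this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// session is a simplified stand-in that signals closure via a channel.
type session struct {
	done      chan struct{}
	closeOnce sync.Once
}

func newSession() *session {
	return &session{done: make(chan struct{})}
}

// markClosed closes the done channel exactly once, so calling it
// repeatedly is safe.
func (s *session) markClosed() {
	s.closeOnce.Do(func() { close(s.done) })
}

// isClosed hides the non-blocking select behind a boolean, so callers
// do not need their own select statement.
func (s *session) isClosed() bool {
	select {
	case <-s.done:
		return true
	default:
		return false
	}
}

func main() {
	s := newSession()

	// Usable from a plain if statement.
	if !s.isClosed() {
		fmt.Println("session is open")
	}

	s.markClosed()

	// Usable from a switch statement as well.
	switch {
	case s.isClosed():
		fmt.Println("session is closed")
	default:
		fmt.Println("session is open")
	}
}
```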

Add testBrowser.browserType field

1bef29e

This is for connecting to an existing browser over a WebSocket URL.

Add TestMultiConnectToSingleBrowser

47567fc

This will panic if the fix in this PR did not get applied. Since the panic occurs in a different routine, we can't catch the panic, and leave this test as naked (without using require.Panics).
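The reason the panic cannot be caught from the test is that Go's `recover` only takes effect in a function deferred by the goroutine that panicked. A minimal sketch of that rule follows; `runRecovered` is a hypothetical helper written for this illustration and is not part of the PR or its test.

```go
package main

import "fmt"

// runRecovered runs f in a new goroutine and sends the recovered panic
// value (or nil if f did not panic) on the returned channel. The deferred
// function must live inside the new goroutine: a recover deferred by the
// caller would never see a panic raised over there.
func runRecovered(f func()) <-chan any {
	out := make(chan any, 1)
	go func() {
		defer func() { out <- recover() }()
		f()
	}()
	return out
}

func main() {
	// The panic is raised and recovered within the same goroutine.
	v := <-runRecovered(func() { panic("session does not exist") })
	fmt.Println("recovered:", v)

	// No panic: recover returns nil.
	v = <-runRecovered(func() {})
	fmt.Println("recovered:", v)
}
```

A test has no such hook into goroutines started by the code under test, which is why an unfixed build would crash the whole test binary rather than fail an assertion.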

Add Browser.getDefaultBrowserContextOrByID

1a167e7

Refactor to getDefaultBrowserContextOrByID

d649616

Simplify var declaration in onAttachedToTarget

ed2a3d2

Also move the log to top.

Add Browser.isAttachedPageValid

34a0be1

Refactor to isAttachedPageValid

3f43f23

Move isPage check onAttachedTarget

abe212d

Add Browser.isPageAttachmentErrorIgnorable

3a3c000

Refactor to isPageAttachmentErrorIgnorable

b5b5204

Add Browser.attachNewPage

5b2ae23

Refactor to attachNewPage

1b2a2fc

Refactor to simplify targetPage type attachment

406d2c6

inancgumus force-pushed the fix/k6c1096-multi-vu-sessions branch from 86b9217 to 406d2c6 Compare April 5, 2023 12:20

inancgumus requested a review from ankur22 April 5, 2023 12:24

inancgumus added the team/k6browser label Apr 5, 2023

ankur22 approved these changes Apr 5, 2023


Fix log categories in page attachment

745cf54

Also move one logging out of locking. Co-authored-by: ankur22 <ankur.agarwal@grafana.com>

ka3de approved these changes Apr 6, 2023


inancgumus merged commit 49b79d4 into main Apr 6, 2023

inancgumus deleted the fix/k6c1096-multi-vu-sessions branch April 6, 2023 07:04

inancgumus changed the title ~~Fix multiple k6 instances can connect to one browser instance and run tests concurrently~~ Fix allow multiple k6 instances to connect to one browser to run concurrent tests Apr 6, 2023

inancgumus mentioned this pull request Apr 6, 2023

Update nil session log warn to debug in page attach #849

Merged

inancgumus changed the title ~~Fix allow multiple k6 instances to connect to one browser to run concurrent tests~~ Allow multiple k6 instances to connect to one browser to run concurrent tests Apr 6, 2023

inancgumus changed the title ~~Allow multiple k6 instances to connect to one browser to run concurrent tests~~ Enable multiple k6-browser instances to run concurrent tests against a single browser May 25, 2023

inancgumus mentioned this pull request Jun 13, 2023

Limit browser implementation to hold a single browserContext #929

Merged

inancgumus mentioned this pull request Nov 27, 2023

Run 1000+ VUs: Isolate browser contexts #1112

Closed
