-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable multiple k6-browser instances to run concurrent tests against a single browser #848
Conversation
0bd5c65
to
86b9217
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great bit of debugging to catch this and resolve it! Nice work 👏
I'm wondering if you could help me a bit by splitting the last commit into two, one for the new changes to onAttachedToTarget
and another for the splitting/refactoring the method into smaller methods?
This makes it easy to check if a session is closed from a non-select statement, like a switch or if.
When we run multi-VU/instance tests, one of the test runs can close a page of its own. Since each test run attaches to all the browser pages (this is another area we might want to discuss), this sometimes creates a race between the test runs. This check allows the test run to continue instead of panicking because of another test run's page detachments. This fix allows running the k6 browser in high concurrency with multiple VUs and instances. Note that it is still possible to get errors since we need to correctly handle the CDP messages while sending and receiving them (the order of them). This can cause timeout and other sorts of errors. Explanation: There are two instances and different sessions. However, both instances attach/detach from the same pages since we run in the same browser. instance1 ---> newPage page1 attaches to: page1 instance2 attaches to: page1 instance2 ---> newPage page1, page2 attaches to: page2 instance1 attaches to: page2 instance1 --> page1.Close() page2 instance1 <-- detachedFromTarget(page1) closes page1 session. This is the racy part that this PR fixes: instance2 <-- detachedFromTarget(page1) closes page1 session. instance2 ---> attachToTarget(page1) panic: session does not exist. instance2 panics while trying to send CDP messages to a closed web socket connection since it detaches from it and tries to attach to it concurrently.
This is for connecting to an existing browser over a WebSocket URL.
This will panic if the fix in this PR did not get applied. Since the panic occurs in a different routine, we can't catch the panic, and leave this test as naked (without using require.Panics).
Also move the log to top.
86b9217
to
406d2c6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏 Thanks for splitting the changes into smaller commits, it was a lot easier to follow. I only have some very minor suggestions.
LGTM 🎉
Also move one logging out of locking. Co-authored-by: ankur22 <ankur.agarwal@grafana.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @inancgumus ! 🎉
LGTM.
Rationale: I suggested this earlier on to detect nil sessions. But now, this warning is outdated because when users running multiple instance/VU tests, they will see dozens/hundreds lines of warnings. The core reason we receive a lot of these warnings is: We need to correctly handle the CDP messages while sending and receiving them (the order of them). #848
Rationale: I suggested this earlier on to detect nil sessions. But now, this warning is outdated because when users running multiple instance/VU tests, they will see dozens/hundreds lines of warnings. The core reason we receive a lot of these warnings is: We need to correctly handle the CDP messages while sending and receiving them (the order of them). #848
Rationale: I suggested this earlier on to detect nil sessions. But now, this warning is outdated because when users running multiple instance/VU tests, they will see dozens/hundreds lines of warnings. The core reason we receive a lot of these warnings is: We need to correctly handle the CDP messages while sending and receiving them (the order of them). #848
Problem
When we run multi-VU/instance tests, one of the test runs can close a page of its own. Since each test run attaches to all the browser pages (this is another area we might want to discuss), this sometimes creates a race between the test runs. Here is a step-by-step explanation.
There are two instances and different sessions. However, both instances attach/detach from the same pages since we run in the same browser.
instance2
panics while trying to send CDP messages to a closed web socket connection since it detaches from it (becauseinstance1
closes it) and tries to attach to it concurrently (race).🚨 ...note that this operation continues concurrently...
closes page1 session for instance1. these sessions are not shared with other instances. they are specific to this instance. however, the pages on the browser are shared between instances.
closes the page1 (targetID) session for instance2.
🚨panic: session does not exist.
There was a panic because instance2.attach to page 1 operation was going on before instance1 closed page1.
Fix
This check allows the test run to continue instead of panicking because of another test run's page detachments. The fix allows running the k6 browser in high concurrency with multiple VUs and instances. Note that it is still possible to get errors since we need to correctly handle the CDP messages while sending and receiving them (the order of them). This can cause timeout and other sorts of errors.
Testing
Here's the test script I used to test this:
Here's the bash script (
multik6b.sh
) that can run multiple tests (a courtesy of @ankur22 🙇):Here's an example command for testing:
The PR also refactors the page attachment logic for better maintenance and readability—also fixes the linter warnings. This helped me find the problem since it made it easier to understand the code. I didn't prefer to put it in another PR but rather as a commit here. I believe "make it better than you found it" is a nice approach for reducing technical debt :)