Fix Multi VU deadlock by isolating browser contexts #1219

Merged 19 commits on Feb 22, 2024

@inancgumus inancgumus commented Feb 20, 2024

What?

With this fix, the browser module can run many VUs and iterations without deadlocking; the practical limits are now the machine's resources and the browser itself. As explained in #1112, this PR isolates browser contexts so that each iteration handles only its own pages and ignores pages created by other iterations. This brings:

  • Efficiency (at least 10X depending on the number of VUs)
  • Better debugging (less log noise)
  • Possibly fewer deadlocks (none seen so far in our tests)

Tested with the script in #971 and saw no deadlocks.

Note: This PR aims to apply this fix with minimal code changes to avoid disrupting how the module currently works.

Why?

Please take a look at #1112 and #971 for more details.

We're adding a separate onAttachedToTarget hook that runs before the module's event system dispatches the onAttachedToTarget event. This lets us control Connection's attachment decision at the moment the attachment happens. It also prevents related concurrency issues between Browser's and Connection's event management.
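
To make the idea concrete, here is a minimal sketch of a veto-style hook. It is not the module's actual code: `attachedEvent`, `connection`, `newConnection`, and `handleAttached` are hypothetical stand-ins, and the only behavior taken from this PR is that the hook runs before any session bookkeeping and can stop the attachment.

```go
package main

import "fmt"

// attachedEvent is a hypothetical stand-in for the "attached to target" event.
type attachedEvent struct {
	SessionID        string
	BrowserContextID string
}

// connection is a simplified, hypothetical stand-in for Connection.
type connection struct {
	sessions map[string]struct{}
	// onAttached is the hook. Returning false vetoes the attachment.
	// A nil hook means every target is attached.
	onAttached func(attachedEvent) bool
}

func newConnection(hook func(attachedEvent) bool) *connection {
	return &connection{sessions: make(map[string]struct{}), onAttached: hook}
}

// handleAttached consults the hook first, before any session is stored
// and before any event would be emitted to listeners.
func (c *connection) handleAttached(ev attachedEvent) bool {
	if c.onAttached != nil && !c.onAttached(ev) {
		return false // vetoed: no session is created for this target
	}
	c.sessions[ev.SessionID] = struct{}{}
	return true
}

func main() {
	const ownContext = "ctx-1"
	conn := newConnection(func(ev attachedEvent) bool {
		return ev.BrowserContextID == ownContext
	})

	fmt.Println(conn.handleAttached(attachedEvent{SessionID: "s1", BrowserContextID: "ctx-1"})) // true
	fmt.Println(conn.handleAttached(attachedEvent{SessionID: "s2", BrowserContextID: "ctx-2"})) // false
	fmt.Println(len(conn.sessions))                                                             // 1
}
```

Because the decision is made synchronously inside the handler, there is no window in which a session for a foreign target exists and has to be cleaned up later.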

In an ideal world, session management would live outside of Connection, and Browser would manage sessions itself. For now, this is the cleanest way (IMO) to do this.

The effects of this PR on other open issues

#971: Deadlock: Multi-VU many iterations

There are no longer deadlocks, even with 1000 VUs. See test run ID: 167467.

#966: Creating new page in browser context sometimes times out

Uncaught (in promise) GoError: creating new page in browser context: timed out after 30s
	at github.com/grafana/xk6-browser/browser.mapBrowserContext.func4 (native)
	at file:///tmp/TgCEDu/script.js:24:15(8)
 executor=shared-iterations scenario=ui

We no longer see these errors. See test run ID: 167467. However, there were plenty in test run ID 167336 (before this fix was applied).

#970: Uncaught (in promise) when navigating due to time out

Uncaught (in promise) navigating frame to "https://test.k6.io/": navigating to "https://test.k6.io/": timed out after 30s executor=shared-iterations scenario=ui

This still occurs, but it is caused by the server(s) under test being unable to keep up with the load. When I manually loaded the page in my browser, it returned 503/502 many times. Here's the log from test run ID 167467:

2024-02-20 17:58:07.832 Failed to load resource: the server responded with a status of 502 (Bad Gateway) browser_source=network line_number=0 stacktrace=<nil> url=https://test.k6.io/ 
2024-02-20 17:58:07.835 Failed to load resource: the server responded with a status of 502 (Bad Gateway) browser_source=network line_number=0 stacktrace=<nil> url=https://test.k6.io/
2024-02-20 17:58:07.841 Failed to load resource: the server responded with a status of 502 (Bad Gateway) browser_source=network line_number=0 stacktrace=<nil> url=https://test.k6.io/static/css/site.css
2024-02-20 17:58:07.853 Failed to load resource: the server responded with a status of 502 (Bad Gateway) browser_source=network line_number=0 stacktrace=<nil> url=https://test.k6.io/static/js/prisms.js
2024-02-20 17:58:07.856 Failed to load resource: the server responded with a status of 502 (Bad Gateway) browser_source=network line_number=0 stacktrace=<nil> url=https://test.k6.io/static/css/site.css
2024-02-20 17:58:39.547 Uncaught (in promise) clicking on "a[href=\"/my_messages.php\"]": timed out after 30s executor=per-vu-iterations scenario=browser
2024-02-20 17:58:39.600 Uncaught (in promise) clicking on "a[href=\"/my_messages.php\"]": timed out after 30s executor=per-vu-iterations scenario=browser
2024-02-20 17:58:40.597 Uncaught (in promise) navigating frame to "https://test.k6.io/": navigating to "https://test.k6.io/": timed out after 30s executor=per-vu-iterations scenario=browser

Failing here looks normal to me.

Checklist

  • I have performed a self-review of my code
  • I have added tests for my changes.
  • I have commented on my code, particularly in hard-to-understand areas

Related PR(s)/Issue(s)

Updates #1112, #971, and #966 (and maybe even #970).

@inancgumus added the bug, optimization, and remote labels Feb 20, 2024
@inancgumus self-assigned this Feb 20, 2024
@inancgumus force-pushed the fix/multi-vu-deadlock branch 2 times, most recently from 87aa32f to 518806c, Feb 20, 2024 16:25
@inancgumus marked this pull request as ready for review Feb 20, 2024 17:05
@inancgumus marked this pull request as draft Feb 21, 2024 07:09
@inancgumus force-pushed the fix/multi-vu-deadlock branch 3 times, most recently from d2ec0ff to 570bd88, Feb 21, 2024 08:12
@inancgumus added the browser context, internal, and performance labels Feb 21, 2024
@inancgumus marked this pull request as ready for review Feb 21, 2024 08:29
@inancgumus requested a review from ankur22 Feb 21, 2024 08:29

@inancgumus inancgumus left a comment

Self-review.

@ankur22 ankur22 left a comment

What an amazing outcome from this PR! 🎉 🥇

Since it touches some areas that are heavily asynchronous and quite complex I've asked some follow up questions to help me understand some of the changes and how they help improve the runtime stability.

inancgumus commented Feb 22, 2024

> What an amazing outcome from this PR! 🎉 🥇

Thank you for the enthusiastic feedback! 🙏 ❤️ As detailed in the discussion, the results of this PR were intentionally planned and are a direct outcome of those design choices. Seeing the team's efforts pushing our project forward is great. 😊 🥳

@inancgumus force-pushed the fix/multi-vu-deadlock branch from 365d98e to 01a1d06, Feb 22, 2024 07:02
@inancgumus requested a review from ankur22 Feb 22, 2024 10:04
ankur22 previously approved these changes Feb 22, 2024

@ankur22 ankur22 left a comment

LGTM 🚀 🌕

This allows us to detect whether a session that we're trying to close
was in the connection's sessions. This way, we can skip. See the next
commit for details.

The connection now skips doing session read and browser module event
generation and management. This way, we prevent unnecessary
communication overhead.

This callback will be called to manage the connection session handling
from outside. Since Browser is the creator of Connection, we're adding
this callback to let Browser hook into target attachment and tell
Connection to stop attaching the target if Browser doesn't want it.

This is a downside of the current design. There is a symbiotic
relationship between Connection and Browser. Perhaps they should be
together, not separate this way. But for the current design, this is the
best we can do to improve Browser's session management.

This way, we can listen for target attachment events from Connection, and
veto Connection's decision of adding a Session to its internal list.
This isn't yet functional until we add other mechanics in later commits.
For now, it allows Connection to attach a target if it's in the same
BrowserContext (or in the default context).

We don't use `defer` for the lock because we don't want to introduce any
lock overhead. Local tests showed us that holding the lock for the
entire function is prone to unexpected deadlock issues. This is actually
for a future code addition that can easily miss this fact.
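
The locking pattern described above can be sketched as follows. This is an illustration under assumptions, not the module's code: the `browser` type, its fields, and `shouldAttach` are hypothetical names. It shows a short critical section that copies what it needs and unlocks explicitly, combined with the "same BrowserContext or default context" check.

```go
package main

import (
	"fmt"
	"sync"
)

// browser is a hypothetical, stripped-down stand-in for Browser.
type browser struct {
	mu               sync.RWMutex
	ownContextID     string // context created for this iteration (assumed field)
	defaultContextID string // the browser's default context (assumed field)
}

// shouldAttach reports whether a target from targetContextID should be attached.
func (b *browser) shouldAttach(targetContextID string) bool {
	// Copy the guarded fields and unlock right away instead of using defer,
	// so the lock is never held for the rest of the function.
	b.mu.RLock()
	own, def := b.ownContextID, b.defaultContextID
	b.mu.RUnlock()

	// No lock is held from here on. Code added later (for example, a call
	// that itself needs b.mu) cannot deadlock against this function.
	return targetContextID == own || targetContextID == def
}

func main() {
	b := &browser{ownContextID: "ctx-1", defaultContextID: "default"}
	fmt.Println(b.shouldAttach("ctx-1"))   // true
	fmt.Println(b.shouldAttach("default")) // true
	fmt.Println(b.shouldAttach("ctx-2"))   // false
}
```

The trade-off is that the copied values may be stale by the time they are compared, which is acceptable only when the decision does not need to be atomic with later work.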

This and the previous commit allow us to stop sharing browser contexts.

This method tells the browser to stop waiting for the attached target.
Otherwise, the browser would indefinitely wait for the target that we
don't want to be attached. This way, we prevent resource overuse.

When Browser doesn't want a target to be attached, Connection tells the
browser to stop waiting for debugger. This prevents the browser from
indefinitely waiting for a target (that we don't want) to be attached.
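
For readers unfamiliar with the protocol side: the Chrome DevTools Protocol has a `Runtime.runIfWaitingForDebugger` command that resumes a target paused while waiting for a debugger. The sketch below only builds such a message as JSON. It assumes flat session mode (a top-level `sessionId`), and the `cdpMessage` type and `resumeMessage` helper are hypothetical; how the module actually sends this command is not shown in this PR text.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cdpMessage is a hypothetical, minimal CDP command envelope.
type cdpMessage struct {
	ID        int64  `json:"id"`
	SessionID string `json:"sessionId,omitempty"`
	Method    string `json:"method"`
}

// resumeMessage builds the command that tells a paused target to continue.
func resumeMessage(id int64, sessionID string) ([]byte, error) {
	return json.Marshal(cdpMessage{
		ID:        id,
		SessionID: sessionID,
		Method:    "Runtime.runIfWaitingForDebugger",
	})
}

func main() {
	msg, err := resumeMessage(1, "S1")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(msg))
	// {"id":1,"sessionId":"S1","method":"Runtime.runIfWaitingForDebugger"}
}
```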

This will be useful to register the onTargetAttachedToTarget event. See
the next commit for the reason.

NewConnection may start recv/send loop before the
onTargetAttachedToTarget callback is attached. Extracting start()
goroutine loops from NewConnection and calling it explicitly after
registering the callback prevents this problem.
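
The ordering problem described above can be sketched like this. The `conn` type, `newConn`, `start`, and `run` are hypothetical simplifications; the point is only that the callback is assigned before the receive loop goroutine is launched, so the loop can never observe a missing callback.

```go
package main

import "fmt"

// conn is a hypothetical, simplified connection with one receive loop.
type conn struct {
	incoming chan string
	onAttach func(targetID string) bool // must be set before start()
	attached []string
	done     chan struct{}
}

// newConn only builds the value. It does not start any goroutine.
func newConn(incoming chan string) *conn {
	return &conn{incoming: incoming, done: make(chan struct{})}
}

// start launches the receive loop. Everything assigned before this call
// is visible to the goroutine.
func (c *conn) start() {
	go func() {
		defer close(c.done)
		for id := range c.incoming {
			if c.onAttach != nil && !c.onAttach(id) {
				continue // vetoed by the callback
			}
			c.attached = append(c.attached, id)
		}
	}()
}

// run registers the callback first and starts the loop second.
func run() []string {
	incoming := make(chan string, 2)
	incoming <- "own"
	incoming <- "foreign"
	close(incoming)

	c := newConn(incoming)
	c.onAttach = func(id string) bool { return id == "own" }
	c.start()
	<-c.done // wait for the loop to drain the channel
	return c.attached
}

func main() {
	fmt.Println(run()) // [own]
}
```

If the loop were started inside the constructor instead, the two queued events could be processed before `onAttach` is assigned, and both targets would be attached.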

This method was returning +1 pages. This fix is necessary for the
test in the following commit to work.

This test ensures that we will no longer share pages across browser
contexts.

- Removes start()
- Moves onTargetAttachedToTarget to the constructor.
- Moves implicit goroutine run to the constructor.
- Updates tests to work with a nil hook.
@inancgumus force-pushed the fix/multi-vu-deadlock branch from c8416b3 to a6a8cb9, Feb 22, 2024 10:23
Labels: browser context, bug, internal, optimization, performance, remote