Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic (NPD): browser.mapPage -> Page.GetKeyboard #4128

Closed
yevk opened this issue Dec 17, 2024 · 18 comments · Fixed by #4132
Closed

Panic (NPD): browser.mapPage -> Page.GetKeyboard #4128

yevk opened this issue Dec 17, 2024 · 18 comments · Fixed by #4132

Comments

@yevk
Copy link

yevk commented Dec 17, 2024

Brief summary

k6 panics during a long running test with many VUs, each VU creating multiple pages.

Each VU creates a single (non-default) browser context, which is then used to create all pages for that VU.

k6 version in the stack trace is a patch from grafana/xk6-browser#1502. Same issue happens with GA 0.55.0.

Chrome version: Google Chrome 131.0.6778.139 (stable)

panic: runtime error: invalid memory address or nil pointer dereference

[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x13b838e]

goroutine 2448570

go.k6.io/k6/js/modules/k6/browser/common.(*Page).GetKeyboard(...)
        /go/pkg/mod/go.k6.io/k6@v0.55.1-0.20241213083140-745f0d48d305/js/modules/k6/browser/common/page.go:1025
go.k6.io/k6/js/modules/k6/browser/browser.mapPage({{0x1f48e70, 0xc0032b8340}, 0xc0006134d0, 0xc003c1f200, 0xc0037f5600, {0x1f32ae0, 0x2f4c040}, {0x0, 0x0}}, 0x0)
        /go/pkg/mod/go.k6.io/k6@v0.55.1-0.20241213083140-745f0d48d305/js/modules/k6/browser/browser/page_mapping.go:203 +0x1c6e
go.k6.io/k6/js/modules/k6/browser/browser.mapBrowserContext.func14.1()
        /go/pkg/mod/go.k6.io/k6@v0.55.1-0.20241213083140-745f0d48d305/js/modules/k6/browser/browser/browser_context_mapping.go:183 +0x6a
go.k6.io/k6/js/modules/k6/browser/k6ext.promise.func1()
        /go/pkg/mod/go.k6.io/k6@v0.55.1-0.20241213083140-745f0d48d305/js/modules/k6/browser/k6ext/promise.go:24 +0x2c
created by go.k6.io/k6/js/modules/k6/browser/k6ext.promise in goroutine 844
        /go/pkg/mod/go.k6.io/k6@v0.55.1-0.20241213083140-745f0d48d305/js/modules/k6/browser/k6ext/promise.go:23 +0x9a

I couldn't reproduce the panic using a simpler script and I can't share the internal scripts I use.

The machine running the test has many CPUs and a lot of RAM (both are not fully consumed during tests). I also optimized the OS using the K6 optimization guide.

k6 version

0.55.0

OS

Ubuntu 22.04

Docker version and image (if applicable)

No response

Steps to reproduce the problem

I tried reducing the number of VUs (from 150 to 70) and it did help, around 70 VUs did not reproduce the panic. Although I would consider this a weak signal.

I traced the code where the panic is coming from.

The panic is at:

go.k6.io/k6/js/modules/k6/browser/common.(*Page).GetKeyboard(...)
        /go/pkg/mod/go.k6.io/k6@v0.55.1-0.20241213083140-745f0d48d305/js/modules/k6/browser/common/page.go:1025

But I think the problem begins earlier, in:

go.k6.io/k6/js/modules/k6/browser/browser.mapPage({{0x1f48e70, 0xc0032b8340}, 0xc0006134d0, 0xc003c1f200, 0xc0037f5600, {0x1f32ae0, 0x2f4c040}, {0x0, 0x0}}, 0x0)
        /go/pkg/mod/go.k6.io/k6@v0.55.1-0.20241213083140-745f0d48d305/js/modules/k6/browser/browser/page_mapping.go:203 +0x1c6e

In mapPage, most of the values defined in mapping are functions, except for keyboard, mouse and a few others, which access p while mapping is being constructed.

It panics when "kyboard" is evaluated. p.GetKeyboard() is called while p == nil. Looks like nothing else in mapPage is accessing p prior to "keyboard".

Further down the stack:

go.k6.io/k6/js/modules/k6/browser/browser.mapBrowserContext.func14.1()
        /go/pkg/mod/go.k6.io/k6@v0.55.1-0.20241213083140-745f0d48d305/js/modules/k6/browser/browser/browser_context_mapping.go:183 +0x6a

The call to mapPage is coming from mapBrowserContext, from within the promise for newPage. This is where I lost track of what leads to page being nil.


Here are some of the last log entries from k6 --verbose (the last log is always cut off at different length on each run):

time="2024-12-17T14:01:47Z" level=debug msg="sid:68B7485AA3075F75A5F32B41D6393A2A tid:AC5FF2FFEEE5E5A3B42104F1DB105545 method:\"Runtime.callFunctionOn\"" category="Session:Execute" elapsed="12 ms" iteration_id=513fecb4babdb46b source=browser
time="2024-12-17T14:01:47Z" level=debug msg="sid:BA5DE1F0D513516B5A7ADA5702DB6BD8 tid:46290C073BB8FBDB809EEC708260162D method:\"Page.getLayoutMetrics\"" category="Session:Execute:s.conn.send" elapsed="0 ms" iteration_id=a13a9b4a64ac4298 source=browser
time="2024-12-17T14:01:47Z" level=debug msg="fmid:2583 fid:61A07308EA5B2E61830C95076C8954CD" category="FrameManager:frameLoadingStarted" elapsed="0 ms" iteration_id=b2ebcc354aba5551 source=browser
time="2024-12-17T14:01:47Z" level=debug msg="fid:801BE2E2A66CEBB01B3AB8386CA57DAB furl:\"about:blank\" ectxid:1 world:main, world set" category="Frame:setContext" elapsed="0 ms" iteration_id=d25c28ecdbbb7d30 source=browser
time="2024-12-17T14:01:47Z" level=debug msg="sid:794040631269D8FD337237324B73A6BB stid:B3B96DEEE3C3571A41832FAB30963E88 fid:B3B96DEEE3C3571A41832FAB30963E88 ectxid:3 efurl:<redacted>" category="ExecutionContext:getInjectedScript" elapsed="1 ms" iteration_id=1c2dd5ee3d8c8cb5 source=browser
time="2024-12-17T14:01:47Z" level=debug msg="sid:C9D4DD6864A5B0BD5BB67C53B27A5F3A tid:55AD2E057F57B8E9B4210898B07A3B8D method:\"Runtime.callFunctionOn\"" category="Session:Execute:s.conn.send" elapsed="2 ms" iteration_id=250201deb861942f source=browser
time="2024-12-17T14:01:47Z" level=debug msg="sid:B86D0963E08C565169498787019DCE4D tid:BD04B061E136E26600CDF4DBB10C7C78 method:\"Runtime.callFunctionOn\"" category="Session:Execute:s.conn.send" elapsed="1 ms" iteration_id=d25c28ecdbbb7d30 source=browser
time="2024-12-17T14:01:47Z" level=debug msg="sid:68B7485AA3075F75A5F32B41D6393A2A tid:AC5FF2FFEEE5E5A3B42104F1DB105545 method:\"Runtime.callFunctionOn\"" category="Session:Execute:s.conn.send" elapsed="3 ms" iteration_id=513fecb4babdb46b source=browser
time="2024-12-17T14:01:47Z" level=debug msg="sid:23EC2B734B43D53AC6480F1166767BE3 tid:801BE2E2A66CEBB01B3AB8386CA57DAB method:\"Target.setAutoAttach\"" category="Session:Execute:s.conn.send" elapsed="0 ms" iteration_id=d25c28ecdbbb7d30 source=browser
time="2024-12-17T14:01:47Z" level=debug msg="fmid:2197 rurl:<redacted> pdoc:nil" category="FrameManager:requestStarted" elapsed="1 ms" iteration_id=250201deb861942f source=browser
time="2024-12-17T14:01:47Z" level=debug msg="fid:9EB328303BFAC9D8900F2FE191603843 furl:\"<redacted>\" sel:\"div[data-id=marker-1]\" opts:[object Object]" category="Locator:WaitFor" elapsed="3 ms" iteration_id=660962042e75cc53 source=browser
time="2024-12-17T14:01:47Z" level=debug msg="sid:794040631269D8FD337237324B73A6BB stid:B3B96DEEE3C3571A41832FAB30963E88 fid:B3B96DEEE3C3571A41832FAB30963E88 ectxid:3 furl:\"<redacted>\" forceCallable:true returnByValue:true" category="ExecutionContext:eval" elapsed="1 ms" iteration_id=1c2dd5ee3d8c8cb5 source=browser
time="2024-12-17T14:01:47Z" level=debug msg="sid:794040631269D8FD337237324B73A6BB tid:B3B96DEEE3C3571A41832FAB30963E88 method:\"Runtime.callFunctionOn\"" category="Session:Execute" elapsed="0 ms" iteration_id=1c2dd5ee3d8c8cb5 source=browser
time="2024-12-17T14:01:47Z" level=debug msg="sid:4D7455D4B0395EB5DDD8A777C2A277B6 tid:B18AD3BE14936FEDBA7ED582E9BC6D88 pageType:page" category="Browser:attachNewPage:addTarget" elapsed="1 ms" iteration_id=39d0b74beaf3992 source=browser
time="2024-12-17T14:01:47Z" level=debug msg="sid:4D7455D4B0395EB5DDD8A777C2A277B6 tid:B18AD3BE14936FEDBA7ED582E9BC6D88 pageType:page" category="Browser:attachNewPage:addSession" elapsed="0 ms" iteration_id=39d0b74beaf3992 source=browser
time="2024-12-17T14:01:47Z" level=debug msg="sid:794040631269D8FD337237324B73A6BB tid:B3B96DEEE3C3571A41832FAB30963E88 method:\"Runtime.callFunctionOn\"" category="Session:Execute:s.conn.send" elapsed="0 ms" iteration_id=1c2dd5ee3d8c8cb5 source=browser
time="2024-12-17T14:01:47Z" level=debug msg="sid:D3D7831871D0FAE7438AB88ECD838C22 tid:AE86570BABAA8FFE41D3CF41DEF12E19 fid:AE86570BABAA8FFE41D3CF41DEF12E19" category="FrameSession:onPageNavigatedWithinDocument" elapsed="2 ms" iteration_id=efc99fe336f37a8e source=browser
time="2024-12-17T14:01:47Z" level=debug msg="fmid:2635 rurl:<redacted> pdoc:nil" category="FrameManager:requestStarted" elapsed="18 ms" iteration_id=f6377669c7647800 source=browser
time="2024-12-17T14:01:47Z" level=debug msg="fmid:2577 fid:AE86570BABAA8FFE41D3CF41DEF12E19 url:<redacted>" category="FrameManager:frameNavigatedWithinDocument" elapsed="0 ms" iteration_id=efc99fe336f37a8e source=browser
time="2024-12-17T14:01:47Z" level=debug msg="sid:44793C667E8B04A25562BA3C18CB9659 tid:210C1F424E4049D09A446E88E3B42EAB fid:210C1F424E4049D09A446E88E3B42EAB" category="FrameSession:onPageNavigatedWithinDocument" elapsed="0 ms" iteration_id=50655300dc67189 source=browser
time="2024-12-17T14:01:47Z" level=debug msg="sid:3CA4DB727AF2CC01AB43A7CEEB86C7DF stid:71BB872E247094688D072CBD428E0E54 fid:71BB872E247094688D072CBD428E0E54 ectxid:3 furl:\"<redacted>\" forceCallable:true returnByValue:false" category="ExecutionContext:eval" elapsed="0 ms" iteration_id=fdb9afc39e2a2b95 source=browser
time="2024-12-17T14:01:47Z" level=debug msg="sid:D4AB1B4A696641F0ADA3FE3EB7E0E610 tid:278C20E93ED9B7F332D1886CD33E94A2 fid:278C20E93ED9B7F332D1886CD33E94A2" category="FrameSession:onFrameStoppedLoading" elapsed="1 ms" iteration_id=29070fe0c8d2e815 source=browser
time="2024-12-17T14:01:47Z" level=debug msg="sid:946596F5ACF8C549FA790E3B83CFB466" category="Browser:initEvents:onDetachedFromTarget" elapsed="0 ms" iteration_id=67014c60fc38df11 source=browser
time="2024-12-17T14:01:47Z" level=debug msg="sid:3CA4DB727AF2CC01AB43A7CEEB86C7DF tid:71BB872E247094688D072CBD428E0E54 method:\"Runtime.callFunctionOn\"" category="Session:Execute" elapsed="6 ms" iteration_id=fdb9afc39e2a2b95 source=browser
time="2024-12-17T14:01:47Z" level=debug msg="fid:BB628E562BC1F84076672856480918D0 furl:\"<redacted>\" rid:173688.3" category="Frame:deleteRequest" elapsed="0 ms" iteration_id=f546e2ee3b6f7d46 source=browser
time="2024-12-17T14:01:47Z" level=debug msg="sid:FB2C181E1A0E449250D59C81C18B40DA stid:B1C602CA54A6B9A8E43F0997A689D500 fid:B1C602CA54A6B9A8E43F0997A689D500 ectxid:3 efurl:<redacted>" category="ExecutionContext:getInjectedScript" elapsed="1 ms" iteration_id=23db4172f2a56cf7 source=browser
time="2024-12-17T14:01:47Z" level=debug msg="fmid:2573 fid:9BE97571B314EB5810FD025B85C61D33" category="FrameManager:frameLoadingStopped" elapsed="0 ms" iteration_id=abe61a71a51ffdad source=browser
time="2024-12-17T14:01:47Z" level=debug msg="sid:6E7D66EC89AA8D5682500A527BC43922 tid:B1D6933E2976DFA47623A3FE2EDE166C esid:79B982FBD7A4D6163258969E1ACA1B5E etid:697B3F9A248C0B2E7743F7

Expected behaviour

No panic - worst case should abort with error.

Actual behaviour

Panic

@olegbespalov
Copy link
Contributor

Hi @yevk !

Thanks for the report! Out of the curiosity, does the issue persist in k6 v0.53, like the same way as you found that screenshoter started panicing in k6 v0.54?

@yevk
Copy link
Author

yevk commented Dec 17, 2024

Hey @olegbespalov, I'll test it tomorrow and get back to you. Should I also test 0.54?

@olegbespalov
Copy link
Contributor

olegbespalov commented Dec 17, 2024

@yevk thanks!

Should I also test 0.54?

If possible, yes, please.

Just for more context, since we have no minimal script (yet), I'm wondering to prove (and maybe find out) if perhaps some change between v0.53...v0.54 causes these nil ponter panics.

cc: @ankur22

@yevk
Copy link
Author

yevk commented Dec 18, 2024

@olegbespalov, same panic with 0.53:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x156fcce]

goroutine 4524824

github.com/grafana/xk6-browser/common.(*Page).GetKeyboard(...)
        github.com/grafana/xk6-browser@v1.7.0/common/page.go:834
github.com/grafana/xk6-browser/browser.mapPage({{0x1edf570, 0xc003b91cc0}, 0xc0004b1ef0, 0xc0045a5260, 0xc0045d0960, {0x1ec7ee0, 0x2ed59c0}, {0x0, 0x0}}, 0x0)
        github.com/grafana/xk6-browser@v1.7.0/browser/page_mapping.go:193 +0x1cae
github.com/grafana/xk6-browser/browser.mapBrowserContext.func14.1()
        github.com/grafana/xk6-browser@v1.7.0/browser/browser_context_mapping.go:169 +0x6a
github.com/grafana/xk6-browser/k6ext.promise.func1()
        github.com/grafana/xk6-browser@v1.7.0/k6ext/promise.go:41 +0x5a
created by github.com/grafana/xk6-browser/k6ext.promise in goroutine 1072
        github.com/grafana/xk6-browser@v1.7.0/k6ext/promise.go:40 +0xe6

@olegbespalov
Copy link
Contributor

@yevk I see 😢 thanks for checking 🙇 we will investigate and fix it then!

@olegbespalov olegbespalov removed their assignment Dec 18, 2024
@yevk
Copy link
Author

yevk commented Dec 18, 2024

@olegbespalov, I also tested with 0.54, same panic:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x15bd26e]

goroutine 6720274

github.com/grafana/xk6-browser/common.(*Page).GetKeyboard(...)
        github.com/grafana/xk6-browser@v1.8.5/common/page.go:846
github.com/grafana/xk6-browser/browser.mapPage({{0x1f27070, 0xc0037b3840}, 0xc00025df50, 0xc00454f680, 0xc0048bd8c0, {0x1f0eee0, 0x2f19480}, {0x0, 0x0}}, 0x0)
        github.com/grafana/xk6-browser@v1.8.5/browser/page_mapping.go:193 +0x1cae
github.com/grafana/xk6-browser/browser.mapBrowserContext.func14.1()
        github.com/grafana/xk6-browser@v1.8.5/browser/browser_context_mapping.go:169 +0x6a
github.com/grafana/xk6-browser/k6ext.promise.func1()
        github.com/grafana/xk6-browser@v1.8.5/k6ext/promise.go:41 +0x5a
created by github.com/grafana/xk6-browser/k6ext.promise in goroutine 1091
        github.com/grafana/xk6-browser@v1.8.5/k6ext/promise.go:40 +0xe6

@yevk
Copy link
Author

yevk commented Dec 18, 2024

@olegbespalov, please let me know if you want me to test anything or provide more information.

I'm currently blocked by this, trying to find a temporary workaround.

@ankur22
Copy link
Contributor

ankur22 commented Dec 18, 2024

Thanks for the quick feedback @yevk. I'll take a look to see how much effort it is to fix the issue.

@ankur22
Copy link
Contributor

ankur22 commented Dec 18, 2024

@yevk could you share your test script?

Initial Investigation

It looks like mapPage is called with a nil page object:

go.k6.io/k6/js/modules/k6/browser/browser.mapPage({{0x1f48e70, 0xc0032b8340}, 0xc0006134d0, 0xc003c1f200, 0xc0037f5600, {0x1f32ae0, 0x2f4c040}, {0x0, 0x0}}, 0x0)
        /go/pkg/mod/go.k6.io/k6@v0.55.1-0.20241213083140-745f0d48d305/js/modules/k6/browser/browser/page_mapping.go:203 +0x1c6e

mapPage takes two parameters, where the second parameter is page. In the stack trace the last parameter is 0x0.

I took a look at the code, and there's a chance that we're getting a nil page. The only way i can see this happening is with the following scenario:

  1. The newPage API is called;
  2. CDP request is sent to chrome to create the page;
  3. The browser module receives target.EventAttachedToTarget and it stores the page in memory;
  4. Usually we would be able to retrieve the page from memory and return it to the caller.
  5. However, what i think is happening is that we're getting a target.EventDetachedFromTarget, which deletes the page from memory;
  6. Now the browser ends up reading nil page from the map.

I can only see this race condition occurring when the load generator (where k6 is running) and or chrome instance(s) are under heavy load, which feels like it loosely correlates the issue description. Why we're getting these CDP events is kind of a mystery. It could be due to timeouts occurring and the iteration failing, which would close any pages which are opened, and the reason for the timeouts is again due to the heavy load on the load generator and or chrome instance(s).

@yevk could you check the resources of the load generator and or chrome instance(s) when you see this failure?

@yevk
Copy link
Author

yevk commented Dec 18, 2024

@ankur22, unfortunately I can't share the script. I'm trying to create a simple, clean script to reproduce and share with you. I haven't been able to achieve this yet, I will update if I can.

About checking the resources load, could you provide me with more specific directions for what you'd like me to collect?

@ankur22
Copy link
Contributor

ankur22 commented Dec 18, 2024

About checking the resources load, could you provide me with more specific directions for what you'd like me to collect?

@yevk, if you could get graphs of the CPU, RAM, network and disk usage for the machine(s) running k6 and chrome while the test is running, that would help identify whether the compute resource is the issue that could be causing this test ending/race condition.

Here's some links that might help you:

However, i just reread the original issue description and just spotted this:

"The machine running the test has many CPUs and a lot of RAM (both are not fully consumed during tests). I also optimized the OS using the K6 optimization guide."

A simple script that replicates this would be useful 🤞 🙇

@olegbespalov
Copy link
Contributor

@yevk after the checking the code, we got an assumption that the possible thing what happen is that we're reaching a default timeout for fetching the page, which is I'm not mistaken is just 30 secs.

So we adjusted the code and it should explicitly error #4132

Could you please verify if the version from that branch (you could install it by go install go.k6.io/k6@707cd2d34aa1b3c7d824aee9cbf6ace4a9960600).

If it starts errors about the timeout, that means that we were right and the way to go is just increase the timeout by passing an environment variable like K6_BROWSER_TIMEOUT with something like '1m' for k6.

Please let us know how it goes and if it helps!

@olegbespalov
Copy link
Contributor

@yevk just FYI, I've updated the original branch so the latest k6's install instruction is go install go.k6.io/k6@92039a175616b581f2bfa5dc032f35b47d5b38a3

@yevk
Copy link
Author

yevk commented Dec 20, 2024

@olegbespalov thanks for the update!

I ran a few tests, so far no panics. I want to do a few more in different variations, then I'll update you with the results.

I don't feel well right now, so I'll probably do it on Sunday.

@olegbespalov
Copy link
Contributor

olegbespalov commented Dec 20, 2024

@yevk thanks for the update and no rush with checking! 🙇

P.S. get well soon!

P.P.S. we'll be probably curious to learn what helped, just fix or fix + configure timeout

@yevk
Copy link
Author

yevk commented Dec 23, 2024

Hey @olegbespalov, I have results for the original fix version.

Test Version Settings Result
control 0.55.0 panic - original
1 0.55.0+fix panic - different
2 0.55.0+fix K6_BROWSER_TIMEOUT=1m panic - different
3 0.55.0 K6_BROWSER_TIMEOUT=1m panic - original

With the fix, the original panic is gone, even without setting the timeout.
The timeout alone does not solve the panic.

The above is using your original fix version. Let me know if you want to me to repeat with the updated fix.


With the fix, now there's a different panic (mostly consistent). I think it's unrelated to the fix as it also happened with the official 0.55.0 version + timeout:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x1579376]

goroutine 5847450

github.com/grafana/xk6-browser/common.(*ElementHandle).clickablePoint(0xc02e0306e0)
        github.com/grafana/xk6-browser@v1.9.1/common/element_handle.go:168 +0x2b6
github.com/grafana/xk6-browser/common.(*ElementHandle).Click.(*ElementHandle).newPointerAction.func2({0x1f99918, 0xc035b18460}, 0x0?)
        github.com/grafana/xk6-browser@v1.9.1/common/element_handle.go:1619 +0x377
github.com/grafana/xk6-browser/common.retryPointerAction({0x1f99918, 0xc035b18460}, 0xc025301b20, 0xc01a1272c0)
        github.com/grafana/xk6-browser@v1.9.1/common/element_handle.go:1672 +0x3c
github.com/grafana/xk6-browser/common.(*ElementHandle).Click.(*ElementHandle).newPointerAction.func3({0x1f99918, 0xc035b18460}, 0xc03ec83730, 0xc03ec83810)
        github.com/grafana/xk6-browser@v1.9.1/common/element_handle.go:1654 +0x59
created by github.com/grafana/xk6-browser/common.call in goroutine 5847434
        github.com/grafana/xk6-browser@v1.9.1/common/helpers.go:103 +0x14c

@olegbespalov
Copy link
Contributor

@yevk Thanks for checking, then we'll merge the fix #4132

Yep, it sounds unrelated, we have an open issue #4123 with exactly same stack trace (keeping in mind moving browser into k6 core)

@yevk
Copy link
Author

yevk commented Dec 23, 2024

@olegbespalov, thanks for the quick response and fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants