Intermittent testbed failure in CI on Linux #2648

freakboy3742 · 2024-06-13T23:23:01Z

Describe the bug

We have an intermittent build failure on the GTK testbed test, with a segfault from the PyGObject layer.

Steps to reproduce

Run the Linux testbed test. The failure isn't especially reproducible; re-running the test suite almost always passes. This is the most recent example.

Expected behavior

Test suite should pass without error.

Screenshots

No response

Environment

Operating System: Ubuntu 22.04
Python version: 3.10
Software versions:
- Toga: 0.4.5+

Logs

tests/widgets/test_selection.py::test_flex_horizontal_widget_size <- tests/widgets/properties.py PASSED [ 64%]
tests/widgets/test_selection.py::test_font <- tests/widgets/properties.py PASSED [ 64%]
tests/widgets/test_selection.py::test_font_attrs <- tests/widgets/properties.py PASSED [ 65%]
tests/widgets/test_selection.py::test_item_titles Fatal Python error: Aborted

Current thread 0x00007f2d05dfb640 (most recent call first):
  Garbage-collecting
  File "/usr/lib/python3.10/inspect.py", line 2969 in __init__
  File "/usr/lib/python3.10/inspect.py", line 2370 in _signature_from_function
  File "/usr/lib/python3.10/inspect.py", line 2463 in _signature_from_callable
  File "/usr/lib/python3.10/inspect.py", line 3002 in from_callable
  File "/usr/lib/python3.10/inspect.py", line 3254 in signature
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pytest_asyncio/plugin.py", line 240 in _add_kwargs
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pytest_asyncio/plugin.py", line 278 in _asyncgen_fixture_wrapper
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/fixtures.py", line 907 in call_fixture_func
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/fixtures.py", line 1128 in pytest_fixture_setup
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/fixtures.py", line 1074 in execute
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/fixtures.py", line 676 in _compute_fixture_value
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/fixtures.py", line 590 in _get_active_fixturedef
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/fixtures.py", line 568 in getfixturevalue
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/fixtures.py", line 549 in _fillfixtures
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/python.py", line 1792 in setup
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/runner.py", line 492 in setup
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/runner.py", line 155 in pytest_runtest_setup
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/runner.py", line 260 in <lambda>
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/runner.py", line 339 in from_call
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/runner.py", line 259 in call_runtest_hook
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/runner.py", line 220 in call_and_report
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/runner.py", line 125 in runtestprotocol
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/runner.py", line 112 in pytest_runtest_protocol
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/main.py", line 349 in pytest_runtestloop
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/main.py", line 324 in _main
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/main.py", line 270 in wrap_session
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/main.py", line 317 in pytest_cmdline_main
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/config/__init__.py", line 167 in main
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app/tests/testbed.py", line 29 in run_tests
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007f2d17eb5480 (most recent call first):
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/gi/overrides/Gio.py", line 42 in run
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/gbulb/glib_events.py", line 839 in run
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/gbulb/gtk.py", line 39 in run
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/gbulb/glib_events.py", line 886 in run_forever
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/toga_gtk/app.py", line 198 in main_loop
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/toga/app.py", line 632 in main_loop
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app/tests/testbed.py", line 163 in <module>
  File "/usr/lib/python3.10/runpy.py", line 86 in _run_code
  File "/usr/lib/python3.10/runpy.py", line 196 in _run_module_as_main

Extension modules: gi._gi, cairo._cairo, gi._gi_cairo, PIL._imaging, PIL._imagingft (total: 5)

Test suite didn't report a result.

Additional context

No response

rmartin16 · 2024-06-14T19:47:22Z

I was finally able to get a stacktrace for this:

#10 0x00007ba90fcd93fd in WTFCrashWithInfo () at /usr/src/webkit2gtk-2.44.2-0ubuntu0.24.04.1/build-soup3/WTF/Headers/wtf/Assertions.h:780
#11 WebKit::WebProcessPool::pageEndUsingWebsiteDataStore () at /usr/src/webkit2gtk-2.44.2-0ubuntu0.24.04.1/Source/WebKit/UIProcess/WebProcessPool.cpp:1314
#12 0x00007ba9101fe050 in WebKit::WebProcessProxy::removeWebPage () at /usr/src/webkit2gtk-2.44.2-0ubuntu0.24.04.1/Source/WebKit/UIProcess/WebProcessProxy.cpp:843
#13 0x00007ba910200555 in WebKit::WebPageProxy::close () at /usr/src/webkit2gtk-2.44.2-0ubuntu0.24.04.1/Source/WebKit/UIProcess/WebPageProxy.cpp:1558
#14 0x00007ba9102dea7b in webkitWebViewBaseDispose () at /usr/src/webkit2gtk-2.44.2-0ubuntu0.24.04.1/Source/WebKit/UIProcess/API/gtk/WebKitWebViewBase.cpp:858
#15 0x00007ba94c2593fe in g_object_unref (_object=0x58ca47818e30) at ../../../gobject/gobject.c:4381

The first assert is apparently failing...

void WebProcessPool::pageEndUsingWebsiteDataStore(WebPageProxy& page, WebsiteDataStore& dataStore)
{
    RELEASE_ASSERT(RunLoop::isMain());
    auto sessionID = dataStore.sessionID();
    RELEASE_ASSERT(m_sessionToPageIDsMap.isValidKey(dataStore.sessionID()));
    auto iterator = m_sessionToPageIDsMap.find(sessionID);
    RELEASE_ASSERT(iterator != m_sessionToPageIDsMap.end());
    ...

https://github.com/WebKit/WebKit/blob/f736325e66bfa8e85f85387299448476f3e1fb3c/Source/WebKit/UIProcess/WebProcessPool.cpp#L1312-L1318

sooo....it's not running on the main thread....but only gets upset sometimes....
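To make the failure mode concrete, here is a minimal Python analogue of that first assertion. It is only a sketch: `dispose_widget` and `run_in_worker` are hypothetical names, and a Python exception stands in for WebKit's hard abort.

```python
import threading


def dispose_widget():
    """Hypothetical teardown that, like RELEASE_ASSERT(RunLoop::isMain()),
    refuses to run anywhere but the main thread."""
    if threading.current_thread() is not threading.main_thread():
        raise RuntimeError("dispose_widget called off the main thread")
    return "disposed"


def run_in_worker(func):
    """Run func in a new thread and report its result or error message."""
    outcome = {}

    def target():
        try:
            outcome["value"] = func()
        except RuntimeError as exc:
            outcome["error"] = str(exc)

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return outcome


# Calling the teardown from a worker thread trips the check.
outcome = run_in_worker(dispose_widget)
print(outcome)
```

In the real crash there is nothing to catch: the assertion aborts the whole process, which is why the test suite never reports a result.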

rmartin16 · 2024-06-14T22:40:32Z

fwiw, finally got a stacktrace on ubuntu 22.04 as well to confirm they match:

#10 0x000070097faf421b in WTFCrashWithInfo(int, char const*, char const*, int) () at WTF/Headers/wtf/Assertions.h:780
#11 WebKit::WebProcessPool::pageEndUsingWebsiteDataStore(WebKit::WebPageProxy&, WebKit::WebsiteDataStore&) ()
    at ./Source/WebKit/UIProcess/WebProcessPool.cpp:1314
#12 0x000070098003d0ae in WebKit::WebProcessProxy::removeWebPage(WebKit::WebPageProxy&, WebKit::WebProcessProxy::EndsUsingDataStore) () at ./Source/WebKit/UIProcess/WebProcessProxy.cpp:843
#13 0x000070098003f68c in WebKit::WebPageProxy::close() () at ./Source/WebKit/UIProcess/WebPageProxy.cpp:1558
#14 0x0000700980120c95 in webkitWebViewBaseDispose() () at ./Source/WebKit/UIProcess/API/gtk/WebKitWebViewBase.cpp:858
#15 0x00007009cbcd9ed1 in g_object_unref (_object=<optimized out>) at ../../../gobject/gobject.c:3648

freakboy3742 · 2024-06-15T00:44:12Z

FWIW: There's an existing hack in the testbed's webkit tests to work around something very similar to this. The issue seems to be that Webkit starts its own threads, and the lifecycle of those threads interferes with pytest's process of creating and destroying widgets. It's likely only an issue because the testbed is rapidly creating and destroying widgets; I don't know if the existing workaround can be patched, or if we need to revisit this entirely.

freakboy3742 · 2024-06-15T00:44:55Z

Also - of interest: you're seeing the crash in Webkit (which we've at least seen before); but the CI failure was occurring in test_selection.

rmartin16 · 2024-06-15T00:45:56Z

> Also - of interest: you're seeing the crash in Webkit (which we've at least seen before); but the CI failure was occurring in test_selection.

AFAICT, the precise point at which the crash occurs is indeterminate because it happens in the garbage collector.

rmartin16 · 2024-06-15T00:48:47Z

I've been playing around with this, though, and can more or less reliably invoke the crash.

The WebView is created in the thread for the testbed app....but if I call gc.collect often from the main thread, it'll trigger the assertion failure because the thread trying to dispose of the WebView is not the app thread. That's my theory, though....I've never thought much about how GC works in threads.
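For what it's worth, the underlying CPython behaviour can be shown without GTK at all. The sketch below uses a hypothetical `Resource` class as a stand-in for a widget: it is created on one thread, left in a reference cycle so that only the cyclic collector can free it, and then finalized by whichever thread happens to run `gc.collect()`.

```python
import gc
import threading

# (thread that created the object, thread that ran its finalizer)
finalized_in = []


class Resource:
    """Stand-in for a native widget whose teardown is thread-sensitive."""

    def __init__(self):
        self.created_in = threading.get_ident()
        self.cycle = self  # reference cycle: only the cyclic GC can free this

    def __del__(self):
        finalized_in.append((self.created_in, threading.get_ident()))


def demo():
    gc.disable()  # keep automatic collection from racing the demo
    try:
        Resource()  # created, then orphaned, on the calling thread
        worker = threading.Thread(target=gc.collect)
        worker.start()
        worker.join()
    finally:
        gc.enable()
    return finalized_in[-1]


created, finalized = demo()
print("finalizer ran on a different thread:", created != finalized)
```

The finalizer runs on the thread that triggered the collection, not the thread that created the object. That is consistent with the theory above, although this sketch says nothing about what PyGObject or WebKit do internally.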

rmartin16 · 2024-06-15T00:53:59Z

Added a few print statements and a pytest autouse fixture to call gc.collect():

> briefcase dev -r --test -- -s -qqq -rP tests/widgets/test_webview.py 

[testbed] Running test suite in dev environment...
===========================================================================
App thread id: 0x7d7617b3fb80
Waiting for app to be ready for testing... ready.
Running gc in thread ID: 0x7d76092006c0
Created WebView in thread ID: 0x7d7617b3fb80
Running gc in thread ID: 0x7d76092006c0
Fatal Python error: Aborted

Current thread 0x00007d76092006c0 (most recent call first):
  Garbage-collecting
  File "/home/user/github/beeware/toga/testbed/tests/conftest.py", line 24 in gc_collect
  ...

Thread 0x00007d7617b3fb80 (most recent call first):
  File "/home/user/.pyenv/versions/briefcase-3.12/lib/python3.12/site-packages/gbulb/glib_events.py", line 175 in __callback__
  File "/home/user/.pyenv/versions/briefcase-3.12/lib/python3.12/site-packages/gi/overrides/Gio.py", line 42 in run
  ...

freakboy3742 · 2024-06-15T03:30:04Z

Another example of a crash; this time in splitcontainer. That indicates it's at least partially a Heisenbug; but also of interest is that the failure is before the WebView tests.

rmartin16 · 2024-06-15T03:30:56Z

Yeah; that's because garbage collecting MapView can cause the error as well.

rmartin16 · 2024-06-15T16:17:31Z

I had apparently overlooked this, but the testbed tests already contain one mitigation for this crash, introduced in 6c8877c:

toga/testbed/tests/widgets/test_webview.py, lines 116 to 121 in 1060842:

    if toga.platform.current_platform == "linux":
        # On Gtk, ensure that the WebView is garbage collection before the next test
        # case. This prevents a segfault at GC time likely coming from the test suite
        # running in a thread and Gtk WebViews sharing resources between instances.
        del widget
        gc.collect()

The same exists in the MapView tests.

This must be decreasing the likelihood of the issue occurring....but the WebKit2 WebView must still be escaping garbage collection until later, allowing the crash to still happen.

Another approach altogether here may be to instead keep the WebViews referenced somewhere so garbage collection doesn't try to do anything with them....

[edit] I now see I was being directed to this earlier in the conversation :)

freakboy3742 added the bug and linux labels on Jun 13, 2024

rmartin16 mentioned this issue on Jun 14, 2024: Use latest dependencies for Testbed testing #2652 (merged)

rmartin16 mentioned this issue on Jun 15, 2024: CI linux testbed crash testing #2655 (closed)

This was referenced on Jun 15, 2024:
- Prevent garbage collection of WebView during testbed testing #2658 (merged)
- Gtk crash set_button_image_get_info_cb using file dialog #2659 (open)

freakboy3742 closed this as completed in #2658 on Jun 16, 2024
