Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash with operator() symbol during long Hulu / YouTube video playbacks #696

Closed
varumugam123 opened this issue Feb 4, 2021 · 6 comments
Closed
Assignees

Comments

@varumugam123
Copy link

varumugam123 commented Feb 4, 2021

There are reports of crashes with below stack trace during long playbacks from Hulu & YouTube.

Crash thread #0 
Frame	Signature	Module	Source
#0	operator()	libWPEWebKit-0.1.so.2.2.1	/usr/src/debug/wpe-webkit/2.22+gitAUTOINC+686cd2f7df-r0/git/Source/JavaScriptCore/heap/SlotVisitor.cpp:384 (0x0)
#1	drain	libWPEWebKit-0.1.so.2.2.1	/usr/src/debug/wpe-webkit/2.22+gitAUTOINC+686cd2f7df-r0/git/Source/JavaScriptCore/heap/GCSegmentedArrayInlines.h:215 (0x1c)
#2	drainFromShared	libWPEWebKit-0.1.so.2.2.1	/usr/src/debug/wpe-webkit/2.22+gitAUTOINC+686cd2f7df-r0/git/Source/JavaScriptCore/heap/SlotVisitor.cpp:671 (0x8)
#3	runFixpointPhase	libWPEWebKit-0.1.so.2.2.1	/usr/src/debug/wpe-webkit/2.22+gitAUTOINC+686cd2f7df-r0/git/Source/JavaScriptCore/heap/Heap.cpp:1360 (0x24)
#4	runCurrentPhase	libWPEWebKit-0.1.so.2.2.1	/usr/src/debug/wpe-webkit/2.22+gitAUTOINC+686cd2f7df-r0/git/Source/JavaScriptCore/heap/Heap.cpp:1175 (0x4)
#5	implFunction	libWPEWebKit-0.1.so.2.2.1	/usr/src/debug/wpe-webkit/2.22+gitAUTOINC+686cd2f7df-r0/git/Source/JavaScriptCore/heap/Heap.cpp:1791 (0xc)
#6	callWithCurrentThreadState	libWPEWebKit-0.1.so.2.2.1	/usr/src/debug/wpe-webkit/2.22+gitAUTOINC+686cd2f7df-r0/build/DerivedSources/ForwardingHeaders/wtf/ScopedLambda.h:59 (0x0)
#7	collectInMutatorThread	libWPEWebKit-0.1.so.2.2.1	/usr/src/debug/wpe-webkit/2.22+gitAUTOINC+686cd2f7df-r0/git/Source/JavaScriptCore/heap/Heap.cpp:1803 (0x0)
#8	stopIfNecessarySlow	libWPEWebKit-0.1.so.2.2.1	/usr/src/debug/wpe-webkit/2.22+gitAUTOINC+686cd2f7df-r0/git/Source/JavaScriptCore/heap/Heap.cpp:1772 (0x0)
#9	stopIfNecessarySlow	libWPEWebKit-0.1.so.2.2.1	/usr/src/debug/wpe-webkit/2.22+gitAUTOINC+686cd2f7df-r0/git/Source/JavaScriptCore/heap/Heap.cpp:1746 (0x0)
#10	collectIfNecessaryOrDefer	libWPEWebKit-0.1.so.2.2.1	/usr/src/debug/wpe-webkit/2.22+gitAUTOINC+686cd2f7df-r0/git/Source/JavaScriptCore/heap/HeapInlines.h:266 (0x0)
#11	allocateSlowCase	libWPEWebKit-0.1.so.2.2.1	/usr/src/debug/wpe-webkit/2.22+gitAUTOINC+686cd2f7df-r0/git/Source/JavaScriptCore/heap/LocalAllocator.cpp:124 (0x4)
#12	allocateCell<JSC::JSLexicalEnvironment>	libWPEWebKit-0.1.so.2.2.1	/usr/src/debug/wpe-webkit/2.22+gitAUTOINC+686cd2f7df-r0/git/Source/JavaScriptCore/heap/LocalAllocatorInlines.h:37 (0xc)
#13	create	libWPEWebKit-0.1.so.2.2.1	/usr/src/debug/wpe-webkit/2.22+gitAUTOINC+686cd2f7df-r0/git/Source/JavaScriptCore/runtime/JSLexicalEnvironment.h:96 (0x8)
#14	slow_path_create_lexical_environment	libWPEWebKit-0.1.so.2.2.1	/usr/src/debug/wpe-webkit/2.22+gitAUTOINC+686cd2f7df-r0/git/Source/JavaScriptCore/runtime/JSLexicalEnvironment.h:105 (0x10)
#15	libWPEWebKit-0.1.so.2.2.1@0x22fcac4	libWPEWebKit-0.1.so.2.2.1	
#16	llint_slow_path_call	libWPEWebKit-0.1.so.2.2.1	/usr/src/debug/wpe-webkit/2.22+gitAUTOINC+686cd2f7df-r0/git/Source/JavaScriptCore/llint/LLIntSlowPaths.cpp:1543 (0x18)

This was reported with Arris XG1v1 & Pace XG1v3 both are MIPS devices. I will let you know if there are any instances reported on ARM devices.

WPE version : 2.22 (686cd2f)

As of now, we don't have any simplified reproduction step, but will try and let you know if we find any.

@Gavriliuk
Copy link

Callstack points to "git/Source/JavaScriptCore/heap/SlotVisitor.cpp : 384"
There is a known GC issue marked as FIXME on the line 384:

ALWAYS_INLINE void SlotVisitor::visitChildren(const JSCell* cell)
{
    ASSERT(Heap::isMarked(cell));
...
    switch (cell->type()) {
    case StringType:
        JSString::visitChildren(const_cast<JSCell*>(cell), *this);
        break;

    case FinalObjectType:
        JSFinalObject::visitChildren(const_cast<JSCell*>(cell), *this);
        break;

    case ArrayType:
        JSArray::visitChildren(const_cast<JSCell*>(cell), *this);
        break;

    default:
        // FIXME: This could be so much better.
        // https://bugs.webkit.org/show_bug.cgi?id=162462
/// Line 384:
        cell->methodTable(vm())->visitChildren(const_cast<JSCell*>(cell), *this);
        break;
    }

Thus the bug https://bugs.webkit.org/show_bug.cgi?id=162462 (opened 2016-09-22) describes the same issue

@guijemont
Copy link

Thus the bug https://bugs.webkit.org/show_bug.cgi?id=162462 (opened 2016-09-22) describes the same issue

This bug, and the FIXME comment seem to refer to the potential for an optimization, I doubt that would be related to this crash.

@guijemont
Copy link

@varumugam123 What kind of crash is it? A segmentation fault, or something else? Assuming it's a segmentation fault, seeing where it's happening (when the GC visits the heap), it's highly likely that it is related to a memory corruption that happens much earlier. It might not be possible to get to any conclusion without a way to reproduce.
Though failing that, a full backtrace (thr apply all bt full in gdb) and/or a core file with all the build files (preferably in debug, but alternatively at least with symbol files) may give us some hints, and/or tell us which memory access is failing (assuming a segfault), as there are several on that line.

@Gavriliuk
Copy link

Hi @guijemont
Yes, we talk about SIGSEGV crashes
And it seems we don't have steps to reproduce the issue because all we have are crashreports
Callstacks may differ but there is a function drainFromShared always presenting in callstack, can this help?
We understand that the root cause is a memory corruption
The referenced bug https://bugs.webkit.org/show_bug.cgi?id=162462 gives us a hope that this is a known issue
Can it be fixed with a check for the cell type (in switch operator) to at least prevent invalid values from default processing?
Thanks

guijemont added a commit that referenced this issue Feb 26, 2021
@guijemont
Copy link

Hi @guijemont
Yes, we talk about SIGSEGV crashes
And it seems we don't have steps to reproduce the issue because all we have are crashreports
Callstacks may differ but there is a function drainFromShared always presenting in callstack, can this help?
We understand that the root cause is a memory corruption
The referenced bug https://bugs.webkit.org/show_bug.cgi?id=162462 gives us a hope that this is a known issue
Can it be fixed with a check for the cell type (in switch operator) to at least prevent invalid values from default processing?

I attempted a workaround like that, also checking that some values are non-null. It's here. I quickly tested it on arm with stress tests and it doesn't bring regressions.
It might make garbage collection marginally slower, so you might not want to deploy that to all users without testing for that first.
And most importantly: the log messages I added could help diagnose the issue a little more, but the chances of this patch preventing the crash altogether are pretty slim since:

  • we only test for a few reasons that could trigger the segfault, but there are other ones possible, some of them are hard to test (e.g. if a pointer we dereference points to a non-null but invalid address).
  • even if we test the right things, if some data in memory is corrupted, we will likely crash somewhere else.

Let me know if this helps.

@calvaris
Copy link
Member

calvaris commented Jul 3, 2023

@varumugam123 @Gavriliuk please, test Guillaume's patch and let us know if helps. For now, I'm closing the issue.

@calvaris calvaris closed this as completed Jul 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

4 participants