Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRASH (unit_tests new alloc memset to zero on unwritable page) #1743

Open
derekbruening opened this issue Jul 30, 2015 · 2 comments
Open

CRASH (unit_tests new alloc memset to zero on unwritable page) #1743

derekbruening opened this issue Jul 30, 2015 · 2 comments

Comments

@derekbruening
Copy link
Contributor

Hit on Chromium bots: https://code.google.com/p/chromium/issues/detail?id=514921

Crash info:

[----------] 1 test from TaskManagerTest
[ RUN      ] TaskManagerTest.Resources
<Application E:\b\build\slave\chromium-dbg-win-drmemory-full-3\build\src\out\Release\unit_tests.exe (2604).  Dr. Memory internal crash at PC 0x7384b5e1.  Please report this at http://drmemory.org/issues.  Program aborted.
0xc0000005 0x00000000 0x7384b5e1 0x7384b5e1 0x00000001 0x54130000
Base: 0x66510000
Registers: eax=0x00000004 ebx=0x00000000 ecx=0x54130000 edx=0x00000060
    esi=0x0001ffe8 edi=0x54110040 esp=0x236aef10 ebp=0x236aef5c
    eflags=0x000
1.8.16561-105909314-(May  6 2015 08:37:57) win61
-no_dynamic_options -disasm_mask 8 -logdir 'C:\Users\chrome-bot\AppData\LocalLow\vg_logs_zz8m5k\dynamorio' -client_lib 'E:\b\build\slave\chromium-dbg-win-drmemory-full-3\build\src\third_party\drmemory\unpacked\bin\release\drmemorylib.dll;0;-suppress `E:\b\build\slave\chromium-dbg-win-drmemory-full-3\build\src\tools\valgr
0x236aef5c 0x7382f8fb
0x236aefd8 0x665d914b
0x003ae854 0x003ae9e4
0x23669ac0 0xfffddfca
E:\b\build\slave\chromium-dbg-win-drmemory-full-3\build\src\third_party\drmemory\unpacked\bin\release\drmemorylib.dll=0x73800000
E:\b\build\slave\chromium-dbg-win-drmemory-full-3\build\src\third_party\drmemory\unpacked\bin\release/dbghelp.dll=0x663c0000
C:\Windows/system32/msvcrt.dll=0x06d20000
C:\Windows/system32/KERNEL32.dll=0x06e40000
C:\Windows/system32/KERNELBASE.dll=0x06920000>
~~Dr.M~~ WARNING: application exited with abnormal code 0xffffffff

Also crashed on TaskManagerTest.RefreshCalled:

[----------] 1 test from TaskManagerTest
[ RUN      ] TaskManagerTest.RefreshCalled
<Application E:\b\build\slave\chromium-dbg-win-drmemory-full-3\build\src\out\Release\unit_tests.exe (3424).  Dr. Memory internal crash at PC 0x06dfb5e1.  Please report this at http://drmemory.org/issues.  Program aborted.
0xc0000005 0x00000000 0x06dfb5e1 0x06dfb5e1 0x00000001 0x543e0000
Base: 0x66830000
Registers: eax=0x00000004 ebx=0x00000000 ecx=0x543e0000 edx=0x00000060
    esi=0x0001ffe8 edi=0x543c0040 esp=0x1830ef10 ebp=0x1830ef5c
    eflags=0x000
1.8.16561-105909314-(May  6 2015 08:37:57) win61
-no_dynamic_options -disasm_mask 8 -logdir 'C:\Users\chrome-bot\AppData\LocalLow\vg_logs_wnt0dv\dynamorio' -client_lib 'E:\b\build\slave\chromium-dbg-win-drmemory-full-3\build\src\third_party\drmemory\unpacked\bin\release\drmemorylib.dll;0;-suppress `E:\b\build\slave\chromium-dbg-win-drmemory-full-3\build\src\tools\valgr
0x1830ef5c 0x06ddf8fb
0x1830efd8 0x668f914b
0x0030e9d4 0x0030eb64
0x182c7740 0xfffddfca
E:\b\build\slave\chromium-dbg-win-drmemory-full-3\build\src\third_party\drmemory\unpacked\bin\release\drmemorylib.dll=0x06db0000
E:\b\build\slave\chromium-dbg-win-drmemory-full-3\build\src\third_party\drmemory\unpacked\bin\release/dbghelp.dll=0x666e0000
C:\Windows/system32/msvcrt.dll=0x06c70000
C:\Windows/system32/KERNEL32.dll=0x06f90000
C:\Windows/system32/KERNELBASE.dll=0x06790000>
~~Dr.M~~ WARNING: application exited with abnormal code 0xffffffff

These tests themselves are not new and have not been changed recently.

This crash is non-deterministic: went away and came back on bot #3, where builds 7377-73780 are green (and the other 2 bots that shard unit_tests don't have the crash then).

This bot has been purple a lot. I actually see this crash further back:
builds 7330 and 7332 on bot #3 from July 23. The crashes could go back
even further than that.

Logging in to the bot and running just this test or all 3 TaskManagerTest.* using the same args as the scripts: the tests run just fine with no crash.


Is it symbol cache corruption (#1465) which can cause weird crashes? On the bot in the AppData/LocalLow/drmemory.symcache directory:
$ grep 157db *
msvcrt.dll.txt:_CrtDbgReport,0x157db
msvcrt.dll.txt:_CrtDbgReportW,0x157db
msvcrt.dll.txt:_CrtDbgReportV,0x157db
msvcrt.dll.txt:_CrtDbgReportWV,0x157db
msvcrt.dll.txt:_CrtSetDbgFlag,0x157db
msvcrt.dll.txt:_crtDbgFlag,0x157db

So doesn't look like it. I made a copy of the original symcache dir on the bot (bug_514921/) and cleared out the old one just in case.


Symbolizing the crash call stack:

$ bin/symquery.exe -e bin/release/drmemorylib.dll -f -a 0x4b5e1 0x2f8fb
_VEC_memzero+0x60
??:0
replace_realloc_common+0x29b
e:\b\build\slave\win-builder\drmemory\common\alloc_replace.c:2008+0x0
$ bin/symquery.exe -e dynamorio/lib32/release/dynamorio.dll -f -a 0xc914b
dr_call_on_clean_stack+0x60
E:\b\build\slave\win-builder\build\build_drmemory-release-32\dynamorio\core\x86.asm_core.s:1531+0x0

drmemorylib!_VEC_memzero+0x60:
7384b5e1 660f7f01        movdqa  xmmword ptr [ecx],xmm0
7384b5e5 8d4910          lea     ecx,[ecx+10h]
7384b5e8 48              dec     eax
7384b5e9 75f6            jne     drmemorylib!_VEC_memzero+0x60 (7384b5e1)

0:000> U 0x7382f8fb-5
drmemorylib!replace_realloc_common+0x296 [e:\b\build\slave\win-builder\drmemory\common\alloc_replace.c @ 2007]:
7382f8f6 e825f6ffff      call    drmemorylib!replace_alloc_common (7382ef20)

drmemorylib!replace_alloc_common+0x274 [e:\b\build\slave\win-builder\drmemory\common\alloc_replace.c @ 1698]:
 1698 7382f194 8b55f4          mov     edx,dword ptr [ebp-0Ch]
 1698 7382f197 56              push    esi
 1698 7382f198 6a00            push    0
 1698 7382f19a 52              push    edx
 1698 7382f19b e880b60100      call    drmemorylib!memset (7384a820)
 1698 7382f1a0 83c40c          add     esp,0Ch

memset calls _VEC_memzero if size!=0, val=0, size<0x80, and sse2 is avail:

0:000> Uf drmemorylib!memset
drmemorylib!memset [f:\dd\vctools\crt_bld\SELF_X86\crt\src\INTEL\memset.asm @ 59]:
   59 7384a820 8b54240c        mov     edx,dword ptr [esp+0Ch]
   69 7384a824 8b4c2404        mov     ecx,dword ptr [esp+4]
   71 7384a828 85d2            test    edx,edx
   72 7384a82a 7469            je      drmemorylib!memset+0x75 (7384a895)
drmemorylib!memset+0xc [f:\dd\vctools\crt_bld\SELF_X86\crt\src\INTEL\memset.asm @ 74]:
   74 7384a82c 33c0            xor     eax,eax
   75 7384a82e 8a442408        mov     al,byte ptr [esp+8]
   78 7384a832 84c0            test    al,al
   79 7384a834 7516            jne     drmemorylib!memset+0x2c (7384a84c)
drmemorylib!memset+0x16 [f:\dd\vctools\crt_bld\SELF_X86\crt\src\INTEL\memset.asm @ 80]:
   80 7384a836 81fa80000000    cmp     edx,80h
   81 7384a83c 720e            jb      drmemorylib!memset+0x2c (7384a84c)
drmemorylib!memset+0x1e [f:\dd\vctools\crt_bld\SELF_X86\crt\src\INTEL\memset.asm @ 82]:
   82 7384a83e 833d94ed9c7300  cmp     dword ptr [drmemorylib!__sse2_available (739ced94)],0
   83 7384a845 7405            je      drmemorylib!memset+0x2c (7384a84c)
drmemorylib!memset+0x27 [f:\dd\vctools\crt_bld\SELF_X86\crt\src\INTEL\memset.asm @ 85]:
   85 7384a847 e9350d0000      jmp     drmemorylib!_VEC_memzero (7384b581)

Passes ptr in ecx, size in edx.

So it's this line:

    if (TEST(ALLOC_ZERO, flags))
        memset(res, 0, request_size);

Failed to write to a new page (0x00000001 0x54130000). Still has eax=4 *
16 = 64 bytes (+ more if non-16-aligned) left to zero.
Original size edx=0x60? but that's <0x80. edx could be modified before
crash if orig ptr not aligned to 16.

Very strange: if the allocator really messes up this badly and has an unwritable page in the middle of a new alloc, wouldn't we see a lot more problems? Is there really some crazy free list corner case that's this rare? We haven't updated DrMem in a while so there was some change in Cr that changed its alloc pattern to suddenly trigger this weird bug?


I'm still trying to reproduce locally: running individual tests in a loop had no success so I am now running this same set of unit_tests subtests sharded in a loop. No repro so far.

@derekbruening
Copy link
Contributor Author

Locally, running the whole shard 10x in a loop => no repro. Running individual tests 28x => no repro. On the bot, running individual tests 20x => no repro.

Lowering priority to medium b/c we can't reproduce it.

@derekbruening
Copy link
Contributor Author

Could DR's cache cons have made this page readonly? Xref #1354

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant