Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rizin disassembler confused by function which re-uses frame pointer (EBP) [verified for x86-32-bit PE/Windows executable] #4608

Open
aktau opened this issue Sep 4, 2024 · 2 comments

Comments

@aktau
Copy link

aktau commented Sep 4, 2024

Work environment

cc @XVilka from IRC

I'm using https://github.com/rizinorg/cutter/releases/tag/v2.3.4 (AppImage) to work

Questions Answers
OS/arch/bits (mandatory) Debian Trixie x86-64
File format of the file you reverse (mandatory) PE (Windows)
Architecture/bits of the file (mandatory) x86/32
rizin -v full output, not truncated (mandatory) https://github.com/rizinorg/cutter/releases/tag/v2.3.4 (AppImage)

Notes:

  • I'm a beginner at using Rizin, I don't exclude that I made a mistake. I'm using Cutter because it's more intuitive than the Rizin text interface, and I only have limited vacation time to try and RE anything at all.
  • The application in question is a Windows binary from around the year 2000, likely compiled by MSVC. It's meant to extract archive files (called SDF files) for a specific game (Ground Control).

Expected behavior

I expected Rizin/Cutter to resolve references to the stack and non-stack correctly, and not confuse the two, or assign multiple variables for the same address.

Actual behavior

The function that starts at 0x004014a0 re-uses the frame pointer (mov ebp, ecx). We can see by the fact that it uses ecx directly, that this is very likely a MSVC thiscall style function.

Important: I used the edit function option to change the calling convention to cdecl-thiscall-ms.

$ objdump -M intel -d SDFextract.exe --start-address=0x004014a0 --stop-address=0x004016d6 | head -20
SDFextract.exe:     file format pei-i386

004014a0 <.text+0x4a0>:
  4014a0:       83 ec 18                sub    esp,0x18
  4014a3:       53                      push   ebx
  4014a4:       55                      push   ebp
  4014a5:       56                      push   esi
  4014a6:       57                      push   edi
  4014a7:       8b e9                   mov    ebp,ecx           ; Here!
  4014a9:       e8 72 fe ff ff          call   0x401320
  4014ae:       8b 44 24 2c             mov    eax,DWORD PTR [esp+0x2c]
  4014b2:       33 db                   xor    ebx,ebx
  4014b4:       53                      push   ebx
  4014b5:       53                      push   ebx
  4014b6:       6a 03                   push   0x3
  4014b8:       53                      push   ebx
...

Rizin is confused by this, and thinks references to EBP are references to the stack. One example from this function:

undefined4::OpenFile(LPCSTR lpFileName, LPVOID content, int32_t arg_ch, int32_t arg_10h, int32_t arg_14h, int32_t arg_15h, int32_t arg_18h);
; ...
; var uint32_t var_20h @ stack - 0x20
; arg LPCSTR lpFileName @ stack + 0x4
; arg LPVOID content @ stack + 0x8
; arg int32_t arg_ch @ stack + 0xc
; arg int32_t arg_10h @ stack + 0x10
; arg int32_t arg_14h @ stack + 0x14
; ...
0x00401616      call    dword [CreateFileMappingA] ; 0x411010 ; calls a subroutine, push eip into the stack (esp)
0x0040161c      mov     esi, eax                  
0x0040161e      cmp     esi, ebx                  
0x00401620      jne     0x401628                  
0x00401622      mov     byte [arg_14h], 1         ; HERE
0x00401626      jmp     0x401657                  

I was confused a lot by this, until I saw the real objdump:

$ objdump -M intel -d SDFextract.exe --start-address=0x004014a0 --stop-address=0x004016d6
...
  401616:       ff 15 10 10 41 00       call   DWORD PTR ds:0x411010
  40161c:       8b f0                   mov    esi,eax
  40161e:       3b f3                   cmp    esi,ebx
  401620:       75 06                   jne    0x401628
  401622:       c6 45 14 01             mov    BYTE PTR [ebp+0x14],0x1 ; HERE
  401626:       eb 2f                   jmp    0x401657

This is just the latest one I encountered in this function before throwing my hands up and filing an issue. There's more stuff.

Steps to reproduce the behavior

I think I used the normal analysis (aaa) and pointer depth = 5. Then just navigate to the function. I've been reversing this thing for a week now so I don't exactly remember. Perhaps the rzdb contains this information?

@aktau
Copy link
Author

aktau commented Sep 5, 2024

It's not just EBP references either, here's a pure ESP example for the same function. Sometimes rizin gets it right, like here.

Dissasembly (good):

0x00401a5b      lea     ecx, [upperFileName]      ; load effective address
0x00401a62      push    ecx                       ; push word, doubleword or quadword onto the stack; LPCSTR lpFileName
0x00401a63      call    win32_private_toupper     ; win32_private_toupper ; calls a subroutine, push eip into the stack (esp) ; void win32_private_toupper(LPCSTR lpFileName)
0x00401a68      mov     cl, byte [upperFileName]  ; moves data from src to dst
0x00401a6f      add     esp, 4                    ; adds src and dst, stores result on dst

Raw (objdump):

  401a5b:       8d 8c 24 b4 00 00 00    lea    ecx,[esp+0xb4] ; &upperFileName[0]
  401a62:       51                      push   ecx
  401a63:       e8 d7 6d 00 00          call   0x40883f
  401a68:       8a 8c 24 b8 00 00 00    mov    cl,BYTE PTR [esp+0xb8] ; &upperFileName[0] again (stack isn't restored).
  401a6f:       83 c4 04                add    esp,0x4

In the case above, Rizin correctly detected that win32_private_toupper has not restored the stack, so it's offset by +0x4 versus the previous reference.

But sometimes it's wrong. Here's an example close to the start of some function. Rizin dissassembly, from the start until the instruction with the "wrong" variable, where I'd expect to see upperfileName but instead see arg_a8h:

undefined4::MoreReadFile(LPCSTR lpFileName, int32_t arg_8h, int32_t arg_a8h);
; var int32_t var_2dch @ stack - 0x2dc
; ...
; var LPCSTR upperFileName @ stack - 0x200
; ...
; var LPVOID var_4h @ stack - 0x4
; arg LPCSTR lpFileName @ stack + 0x4
; arg char **outFileName @ stack + 0x8 // Aside: why is `arg_8h` not updated? It should be the same thing, I renamed/retyped it.
; arg int32_t arg_a8h @ stack + 0xa8
0x00401a20      sub     esp, 0x2a4            
0x00401a26      push    ebx                      
0x00401a27      push    ebp                 
0x00401a28      push    esi                 
0x00401a29      mov     ebx, ecx               
0x00401a2b      push    edi                  
0x00401a2c      mov     edi, dword [lpFileName] 
0x00401a33      or      ecx, 0xffffffff          
0x00401a36      xor     eax, eax              
0x00401a38      repne   scasb al, byte es:[edi]  
0x00401a3a      not     ecx                    ; uVar9 = strlen(argv_4h). This reverse loop is a very strange way to write strlen, but I verified that it matches.
0x00401a3c      sub     edi, ecx                 
0x00401a3e      lea     edx, [upperFileName]      
0x00401a45      mov     eax, ecx              
0x00401a47      mov     esi, edi                  
0x00401a49      mov     edi, edx           
0x00401a4b      mov     dword [var_298h], ebx  
0x00401a4f      shr     ecx, 2                  
0x00401a52      rep     movsd dword es:[edi], dword ptr [esi] 
0x00401a54      mov     ecx, eax                
0x00401a56      and     ecx, 3                   
0x00401a59      rep     movsb byte es:[edi], byte ptr [esi] ;
0x00401a5b      lea     ecx, [upperFileName]     
0x00401a62      push    ecx                       ;  LPCSTR lpFileName
0x00401a63      call    win32_private_toupper     ;  void win32_private_toupper(LPCSTR lpFileName)
0x00401a68      mov     cl, byte [upperFileName]  
0x00401a6f      add     esp, 4                    
0x00401a72      xor     edi, edi                  
0x00401a74      xor     eax, eax               
0x00401a76      xor     edx, edx           
0x00401a78      mov     dword [var_2a4h], eax  
0x00401a7c      test    cl, cl                    
0x00401a7e      je      0x401aea                
0x00401a80      lea     ecx, [arg_a8h]                    ; Rizin thinks this is a new variable (arg_a8h), but in reality it's upperFileName

So I resorted to objdump, and manually annotated all stack-manipulating instruction that I could find with the current position. The last LEA should then be: stack-0x2b4+0xb4 = stack-0x200, which is the position of upperFileName.

; undefined4::MoreReadFile
; calling convention: thiscall-ms-cdecl
; arguments:
;  stack+0x4 = (const char *) lpFileName
;  stack+0x8 = (char **) outFilename
;  ecx = undefined4 (this is a thiscall, the this pointer is stored in ecx)
00401a20 <.text+0xa20>:
  401a20:       81 ec a4 02 00 00       sub    esp,0x2a4 ; esp = stack-0x2a4
  401a26:       53                      push   ebx       ; esp = stack-0x2a8
  401a27:       55                      push   ebp       ; esp = stack-0x2ac
  401a28:       56                      push   esi       ; esp = stack-0x2b0
  401a29:       8b d9                   mov    ebx,ecx   ; Reuse ebx...
  401a2b:       57                      push   edi       ; esp = stack-0x2b4
  401a2c:       8b bc 24 b8 02 00 00    mov    edi,DWORD PTR [esp+0x2b8] ; lpFileName
  401a33:       83 c9 ff                or     ecx,0xffffffff
  401a36:       33 c0                   xor    eax,eax
  401a38:       f2 ae                   repnz scas al,BYTE PTR es:[edi]
  401a3a:       f7 d1                   not    ecx
  401a3c:       2b f9                   sub    edi,ecx
  401a3e:       8d 94 24 b4 00 00 00    lea    edx,[esp+0xb4] ; esp+0x0b4 = stack-0x200, &upperFileName[0]
  401a45:       8b c1                   mov    eax,ecx
  401a47:       8b f7                   mov    esi,edi
  401a49:       8b fa                   mov    edi,edx
  401a4b:       89 5c 24 1c             mov    DWORD PTR [esp+0x1c],ebx ; esp+0x1c = stack-0x298; this
  401a4f:       c1 e9 02                shr    ecx,0x2
  401a52:       f3 a5                   rep movs DWORD PTR es:[edi],DWORD PTR ds:[esi]
  401a54:       8b c8                   mov    ecx,eax
  401a56:       83 e1 03                and    ecx,0x3
  401a59:       f3 a4                   rep movs BYTE PTR es:[edi],BYTE PTR ds:[esi]
  401a5b:       8d 8c 24 b4 00 00 00    lea    ecx,[esp+0xb4] ; esp = stack-0x2b4 = stack-0x200, &upperFileName[0]
  401a62:       51                      push   ecx            ; esp = stack-0x2b8
  401a63:       e8 d7 6d 00 00          call   0x40883f       ; win32_private_toupper
  401a68:       8a 8c 24 b8 00 00 00    mov    cl,BYTE PTR [esp+0xb8] ; stack-0x2b8+0xb8 = stack-0x200, upperFileName[0]
  401a6f:       83 c4 04                add    esp,0x4        ; esp = stack-0x2b4
  401a72:       33 ff                   xor    edi,edi
  401a74:       33 c0                   xor    eax,eax
  401a76:       33 d2                   xor    edx,edx
  401a78:       89 44 24 10             mov    DWORD PTR [esp+0x10],eax ; stack-0x2b4+0x10 = stack-0x2a4 = 0x0
  401a7c:       84 c9                   test   cl,cl
  401a7e:       74 6a                   je     0x401aea
  401a80:       8d 8c 24 b4 00 00 00    lea    ecx,[esp+0xb4] ; stack-0x2b4+0xb4 = stack-0x200 = &upperFileName[0]

@aktau
Copy link
Author

aktau commented Sep 5, 2024

Given the above, it seems like both ESP and EBP based references can be wrong. Should I pull out the latter case into a separate bug? Something tells me the problems are related.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants