PE Module not working correctly when scanning processes #1372

Open
niallnsec opened this issue Oct 8, 2020 · 11 comments

@niallnsec

Hello,

It seems that the PE module has a few issues when scanning running processes on Windows. When an executable is mapped into memory, its allocation is split into a number of sub allocations with different protections. It appears that the module only takes into account the first sub allocation, which is typically (possibly always) a 4K region containing the PE headers.

Because the headers are in this first region, a lot of the functionality works most of the time. However, because the Windows loader is a bit more flexible than the PE specification, the results will be inconsistent.

More commonly, problems manifest in the import table parsing because the imports data is usually stored in a section outside of the headers block. Since the module does not pull this data in, it fails because it determines the offsets to be out of bounds.

Since the imports, and potentially other items, can be spread all over the PE file, I believe it is necessary to pull the full allocation into the buffer before analysing.

Parsing of imports from running processes is something I am in need of for a project, so I had a go at implementing the functionality. I managed to get it working with a limited number of test cases but I haven't had time to fully test it yet. I also crossed a couple of code boundaries that seem to be in place in the source, so I think my approach is likely a bit too much of a hack. It may be useful though, since logically it does work. The fork is at https://github.com/niallnsec/yara It's only an evening's work, so I appreciate it is a bit rough.

I took the approach of fetching all committed regions that shared the same allocation base as the region containing the headers for the base image. That way I can be sure all of the data has been retrieved. This has the downside of requiring potentially large amounts of memory for very large binaries. I think the alternative would involve on demand retrieval of sub allocation data. I placed the data sequentially into a buffer to simplify access, but this may cause bounds checking issues if a reserved region exists in the allocation (although I am not sure that would ever happen).

I also added a routine to more quickly find the base address of the primary module to avoid iterating through all the process regions since ReadProcessMemory is a pretty expensive function.

Thanks

@plusvic
Member

plusvic commented Jan 15, 2021

Sorry for the delayed response. I didn't manage to spend time reviewing this issue until now. The work you have done is very interesting and I think it can be very valuable. Honestly, this memory scan feature has been a bit neglected for a long time, so it is nice to have someone looking more carefully into it.

I would like to start by understanding the issue in depth. My understanding is that the PE module currently assumes that all the data it needs is contained in a single memory block (the memory block where the PE header resides). Those memory blocks are the memory regions reported by VirtualQueryEx as contiguous memory regions with the same access privileges. However, there are situations in which the import data is not in the same memory region as the PE header, and this of course makes YARA fail. Is that correct?

If that's correct, it raises a question. Why exactly are those regions (the one containing the PE header and the one containing the import data) separated? Is there some unallocated region in between? Do they have different access privileges? Or are they contiguous and with the same privileges, but the loader decided to allocate them as two different blocks and therefore VirtualQueryEx treats them as different regions?

@plusvic
Member

plusvic commented Jan 15, 2021

I also like your approach of implementing a yr_process_fetch_primary_module_base function for finding the base address of the main module. Currently YARA enumerates the memory blocks and treats the first block in which it finds a PE header as the main module, which is not necessarily true. I always thought that was a fragile assumption. Could you explain the inner workings of that function to me? I'm particularly intrigued by this:

master...niallnsec:master#diff-814b41c5dc1a8103cab442fcb7e9647b02e4ba911d719adcd4fe78bedd8d640eR312

It looks like you are getting the base address from some undocumented field in the structure returned by NtQueryInformationProcess with ProcessWow64Information? Is that correct?

@niallnsec
Author

No worries, I have been using YARA for a few years now and am very appreciative of the work yourself and others have put into the project. My own project actually got put on hold which is why I have not revisited this since October.

You are correct in your understanding. To elaborate a bit, there are two assumptions the PE module is implicitly making with respect to memory scanning which do not hold true:

  1. The structure of a PE file in memory is the same as on disk
  2. All of the data needed for parsing the headers is contained in the first allocated region of memory

Both issues are caused by the actions of the Windows loader when an image is mapped into memory (the hallmark of which is the SEC_IMAGE memory section type). The loader parses the headers and then creates a base allocation for the image file based on the size of the headers plus the virtual sizes of each section, padded so each is aligned according to IMAGE_NT_HEADERS.OptionalHeader.SectionAlignment.

Within this base allocation it first allocates a region large enough to hold the PE headers, using the size stored in IMAGE_NT_HEADERS.OptionalHeader.SizeOfHeaders padded to be aligned according to the file's SectionAlignment. The headers are only used during the load process and may be scrubbed once the loader is done, although this is usually only seen in the case of malware. Next, the loader iterates over the section headers and allocates a sub region for each section sequentially. The size and protection values for each region are taken from the corresponding section header. The size of each region is also aligned according to the file's SectionAlignment value.
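
As a rough illustration of the layout described above, here is a sketch in Python rather than YARA's C. The Section class, the function names and the section values are all made up for the example; only the alignment rule comes from the description.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    virtual_address: int  # RVA taken from the section header
    virtual_size: int     # size in memory before alignment


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def mapped_layout(size_of_headers, section_alignment, sections):
    """Return (name, rva, aligned_size) for the header region and each section."""
    layout = [("headers", 0, align_up(size_of_headers, section_alignment))]
    for section in sections:
        aligned = align_up(section.virtual_size, section_alignment)
        layout.append((section.name, section.virtual_address, aligned))
    return layout


def image_size(layout):
    """End of the highest region, i.e. the size of the whole base allocation."""
    return max(rva + size for _, rva, size in layout)


# Hypothetical image: 0x400 bytes of headers, 4 KiB section alignment.
layout = mapped_layout(
    size_of_headers=0x400,
    section_alignment=0x1000,
    sections=[
        Section(".text", 0x1000, 0x1234),
        Section(".rdata", 0x3000, 0x0800),
        Section(".data", 0x4000, 0x0010),
    ],
)

for name, rva, size in layout:
    print(f"{name:8} rva={rva:#07x} size={size:#07x}")
print(f"total image size: {image_size(layout):#x}")
```

Note that the header region alone only covers the first 0x1000 bytes here, while the sections that would hold import data start after it.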

(In practice there are optimisations in place in Windows that may change the protection value of various sections for improved system performance. Regions with the same protection value within a single base allocation are treated as a single region when querying with VirtualQueryEx. The most common optimisations observed are the use of WRITE_COPY memory instead of READ_WRITE for one or more parts of a section, and the re-protection of certain READ_WRITE sections to READ_ONLY, such as the common .didata section.)
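
The merging behaviour can be modelled with a small sketch. This is not how VirtualQueryEx is implemented, just a simulation of the observable result, and the tuple layout and protection labels are assumptions for the example.

```python
def coalesce(sub_regions):
    """Merge adjacent sub regions that share a protection value.

    sub_regions is a list of (rva, size, protection) tuples sorted by rva,
    all belonging to one base allocation.
    """
    merged = []
    for rva, size, protection in sub_regions:
        if merged:
            last_rva, last_size, last_protection = merged[-1]
            contiguous = last_rva + last_size == rva
            if contiguous and last_protection == protection:
                # Same protection and no gap: reported as one region.
                merged[-1] = (last_rva, last_size + size, protection)
                continue
        merged.append((rva, size, protection))
    return merged


# Hypothetical image where .rdata and a re-protected data section
# both end up read-only and so appear as a single region.
sub_regions = [
    (0x0000, 0x1000, "READ_ONLY"),     # headers
    (0x1000, 0x2000, "EXECUTE_READ"),  # .text
    (0x3000, 0x1000, "READ_ONLY"),     # .rdata
    (0x4000, 0x1000, "READ_ONLY"),     # re-protected data section
]
print(coalesce(sub_regions))
```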

Once all the sections are in place, the loader will parse the data directories. The data directories structure is an array of objects which store pointers to data and the size of the associated data blobs. The data blobs can be located at any virtual address and will almost always be outside of the PE header region.

In the case of the Imports directory, there are multiple levels of nesting of data pointers, starting with the thunk objects describing the imported DLLs. Although the imports are typically contiguous, it is possible for each data pointer to reference any valid virtual address within the image. The same holds true for the other data directories, which means the region which contains the data could belong to any of the PE file's sections.
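
To make the failure mode concrete, here is a minimal Python sketch (not YARA's actual parser) that walks the array of IMAGE_IMPORT_DESCRIPTOR entries in a mapped image, where an RVA can be used directly as a buffer offset. The function names and the synthetic image are assumptions for illustration.

```python
import struct

# IMAGE_IMPORT_DESCRIPTOR: OriginalFirstThunk, TimeDateStamp,
# ForwarderChain, Name, FirstThunk (five little-endian DWORDs).
DESCRIPTOR = struct.Struct("<5I")


def read_cstring(image, rva):
    """Read a NUL-terminated ASCII string at the given RVA."""
    if not 0 <= rva < len(image):
        raise ValueError(f"string RVA {rva:#x} is outside the captured data")
    end = image.find(b"\x00", rva)
    if end == -1:
        raise ValueError(f"unterminated string at RVA {rva:#x}")
    return image[rva:end].decode("ascii", errors="replace")


def imported_dll_names(image, import_dir_rva):
    """Walk the descriptor array until the all-zero terminator."""
    names = []
    rva = import_dir_rva
    while True:
        if rva < 0 or rva + DESCRIPTOR.size > len(image):
            raise ValueError(f"descriptor RVA {rva:#x} is outside the captured data")
        orig_thunk, _stamp, _chain, name_rva, first_thunk = DESCRIPTOR.unpack_from(image, rva)
        if orig_thunk == 0 and name_rva == 0 and first_thunk == 0:
            return names
        names.append(read_cstring(image, name_rva))
        rva += DESCRIPTOR.size


# Synthetic 12 KiB "mapped image": descriptors at RVA 0x2000, names at 0x2200.
image = bytearray(0x3000)
DESCRIPTOR.pack_into(image, 0x2000, 0x2100, 0, 0, 0x2200, 0x2300)
DESCRIPTOR.pack_into(image, 0x2014, 0x2120, 0, 0, 0x2210, 0x2320)
image[0x2200:0x220D] = b"KERNEL32.dll\x00"
image[0x2210:0x221B] = b"USER32.dll\x00"
image = bytes(image)

print(imported_dll_names(image, 0x2000))

# With only the 4K header region available, the same call fails the bounds check.
try:
    imported_dll_names(image[:0x1000], 0x2000)
except ValueError as error:
    print("headers only:", error)
```

The second call mirrors what happens when only the header region is fetched: the descriptor RVA is valid for the image but falls outside the data that was read.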

This means that in order to reliably parse the data, YARA needs to be able to reference all of the regions that make up the PE file, much in the same way it does when the PE module is operating on files on disk (I believe).

The approach I took to making the data available was to allocate a block of memory which is big enough to hold the entire base allocation as reported by VirtualQueryEx, and then iterate forward copying each region until the value for AllocationBase reported by VirtualQueryEx changes. This has the drawback of potentially causing YARA to use a large region of memory, although the maximum size should be reasonably bounded.

The advantage is that all of the RVAs are now offsets into the memory block that has been copied and so can be easily referenced with a bit of pointer arithmetic.
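
A simplified model of this copy step is sketched below in Python. The Region class stands in for what VirtualQueryEx and ReadProcessMemory would return, and all names are hypothetical. Note one deliberate difference from the fork: each region is placed at its offset from the allocation base rather than strictly sequentially, so a reserved (non-committed) region leaves a zero-filled gap and the RVAs stay valid.

```python
from dataclasses import dataclass

MEM_COMMIT = 0x1000   # Windows state constants
MEM_RESERVE = 0x2000


@dataclass
class Region:
    base_address: int
    allocation_base: int
    size: int
    state: int
    data: bytes = b""  # stand-in for a ReadProcessMemory result


def copy_allocation(regions, header_address):
    """Copy every region sharing the header's AllocationBase into one buffer.

    regions must be sorted by base_address. Returns (allocation_base, buffer).
    """
    header = next(
        r for r in regions
        if r.base_address <= header_address < r.base_address + r.size
    )
    allocation_base = header.allocation_base
    members = [r for r in regions if r.allocation_base == allocation_base]
    total = members[-1].base_address + members[-1].size - allocation_base
    buffer = bytearray(total)
    for region in members:
        if region.state != MEM_COMMIT:
            continue  # nothing to read; leave the gap zero-filled
        offset = region.base_address - allocation_base
        chunk = region.data[:region.size].ljust(region.size, b"\x00")
        buffer[offset:offset + region.size] = chunk
    return allocation_base, bytes(buffer)


# Hypothetical process: one image at 0x400000, another allocation at 0x500000.
regions = [
    Region(0x400000, 0x400000, 0x1000, MEM_COMMIT, b"MZ"),
    Region(0x401000, 0x400000, 0x2000, MEM_COMMIT, b"\xCC" * 0x2000),
    Region(0x403000, 0x400000, 0x1000, MEM_RESERVE),
    Region(0x404000, 0x400000, 0x1000, MEM_COMMIT, b"\x11" * 0x1000),
    Region(0x500000, 0x500000, 0x1000, MEM_COMMIT, b"\x22" * 0x1000),
]
base, image = copy_allocation(regions, 0x400000)
print(hex(base), hex(len(image)))
```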

As well as the splitting of data into discrete allocations based on the sections, there are also a couple of other notable changes caused by the loader:

  1. The RVA addresses in the headers and associated data structures should now be considered raw offsets from the image base
  2. The PE overlay is never loaded into memory. This usually means that any code signing certificate data will be missing, making it impossible to parse the Security directory
  3. Some sections will not be loaded into memory, notably if their VirtualAddress or VirtualSize is zero or if they are marked with the Discard section characteristic.
  4. Relocations are applied where necessary
  5. Some of the information used when calculating the PE checksum is missing or altered, meaning you will get an invalid result when attempting to verify the value
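
The first point in the list can be illustrated by comparing the two translations. This is a sketch with hypothetical function names and section values, not the PE module's code: on disk an RVA has to be mapped through the section table, while in a mapped image it is already an offset from the image base.

```python
def rva_to_file_offset(rva, size_of_headers, sections):
    """Translate an RVA to an offset in the file on disk.

    sections is a list of (virtual_address, virtual_size,
    pointer_to_raw_data, size_of_raw_data) tuples.
    Returns None when the RVA is not backed by file data.
    """
    if 0 <= rva < size_of_headers:
        return rva  # headers are stored at the start of the file
    for virtual_address, virtual_size, raw_pointer, raw_size in sections:
        if virtual_address <= rva < virtual_address + virtual_size:
            delta = rva - virtual_address
            return raw_pointer + delta if delta < raw_size else None
    return None


def rva_to_memory_offset(rva, image_size):
    """In a mapped image the RVA is the offset, subject to a bounds check."""
    return rva if 0 <= rva < image_size else None


# Hypothetical section table for a small image.
sections = [
    (0x1000, 0x1234, 0x0400, 0x1400),  # .text
    (0x3000, 0x0800, 0x1800, 0x0800),  # .rdata
]
print(hex(rva_to_file_offset(0x3010, 0x400, sections)))
print(hex(rva_to_memory_offset(0x3010, 0x5000)))
```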

Regarding the yr_process_fetch_primary_module_base function:
The primary purpose of the function is to locate the ImageBaseAddress field in the process's PEB, which stores a pointer to the first allocated region of the PE file being executed. Although the PEB is undocumented, the ImageBaseAddress field has remained in the same place for over 20 years, so it is a fairly safe bet it won't change any time soon. The offset of this field from the start of the PEB is 0x08 on 32-bit Windows and 0x10 on 64-bit Windows.
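
As a small sketch of that lookup, assuming the first bytes of the PEB have already been read from the target process (the function name and the fake PEB buffers are hypothetical; the offsets are the ones given above):

```python
import struct

# Offset of ImageBaseAddress within the PEB, keyed by PEB bitness.
IMAGE_BASE_OFFSET = {32: 0x08, 64: 0x10}


def image_base_from_peb(peb, bits):
    """Extract the ImageBaseAddress pointer from raw PEB bytes."""
    offset = IMAGE_BASE_OFFSET[bits]
    pointer_format = "<I" if bits == 32 else "<Q"  # pointer width follows the PEB
    if len(peb) < offset + struct.calcsize(pointer_format):
        raise ValueError("PEB buffer is too short")
    return struct.unpack_from(pointer_format, peb, offset)[0]


# Fake PEB prefixes with the image base planted at the expected offset.
peb32 = bytes(0x08) + struct.pack("<I", 0x00400000)
peb64 = bytes(0x10) + struct.pack("<Q", 0x7FF600000000)

print(hex(image_base_from_peb(peb32, 32)))
print(hex(image_base_from_peb(peb64, 64)))
```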

Things are complicated by the fact that 64 bit windows supports running 32 bit applications using WOW64. When a process is running in WOW64 mode, it will have two distinct PEB structures, one for 64 bit and one for 32 bit. The ImageBaseAddress pointer we require is in the 32 bit PEB but the address returned in the PebBaseAddress field of PROCESS_BASIC_INFORMATION will be a pointer to the 64 bit PEB. In order to overcome this, we can use the ProcessWow64Information class in the call to NtQueryInformationProcess.

Although the official documentation states that the value returned by using ProcessWow64Information is a BOOLEAN, in reality it is a pointer to the process's 32 bit PEB, if one is present. So far, to the best of my knowledge, this has remained true in all versions of Windows which support WOW64.

There are 5 possible scanning situations depending on the architectures of the system, YARA and the target process for which the function must account:

  1. On a 32-bit system:

    • 32 bit YARA scanning 32 bit process - We can use ProcessBasicInformation to get the address of PEB.
  2. On a 64-bit system:

    • 32 bit YARA scanning 32 bit process - We can use ProcessBasicInformation to get the pointer to the 32 bit PEB. This pointer is guaranteed to be in memory lower than 4GB so we do not have to worry about truncation.
    • 32 bit YARA scanning 64 bit process - We cannot reliably get the PEB pointer because it may be in memory greater than 4GB resulting in a truncated pointer being returned. The only way to scan in this manner would be to use the techniques implemented in this project: https://github.com/rwfpl/rewolf-wow64ext
    • 64 bit YARA scanning 64 bit process - We can use ProcessBasicInformation to get the address of the PEB.
    • 64 bit YARA scanning 32 bit process - We can use ProcessWow64Information to get the pointer to the 32 bit PEB.

If we manage to retrieve a valid PEB address, we can read the process memory at the appropriate offset to get the ImageBaseAddress pointer, then it can be used to immediately jump to the correct location for scanning. Otherwise the function returns NULL forcing YARA to default back to the slower brute force approach.
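
The five cases above boil down to a small decision table. The sketch below uses a hypothetical function name and returns the information class as a string purely for illustration; None corresponds to the fall-back to the brute force scan.

```python
def peb_info_class(system_bits, yara_bits, target_bits):
    """Pick the NtQueryInformationProcess class for locating the target's PEB.

    Returns None when the PEB pointer cannot be retrieved reliably.
    """
    for bits in (system_bits, yara_bits, target_bits):
        if bits not in (32, 64):
            raise ValueError("bitness must be 32 or 64")
    if yara_bits > system_bits or target_bits > system_bits:
        raise ValueError("a 64 bit process cannot run on a 32 bit system")
    if yara_bits == 32 and target_bits == 64:
        return None  # the 64 bit PEB pointer may be truncated
    if yara_bits == 64 and target_bits == 32:
        return "ProcessWow64Information"  # yields the 32 bit PEB
    return "ProcessBasicInformation"  # same bitness on both sides


for case in [(32, 32, 32), (64, 32, 32), (64, 32, 64), (64, 64, 64), (64, 64, 32)]:
    print(case, "->", peb_info_class(*case))
```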

@plusvic
Member

plusvic commented Jan 18, 2021

Thank you for the very detailed explanation, everything is much clearer to me now. It looks like you know a lot about the internal workings of the Windows loader, so if you have the time and willingness to collaborate in this area, your help would be really appreciated.

I think I'm going to merge your commit for version 4.2 (4.1 is scheduled to be released soon and I prefer not to introduce large changes at this point). In the meantime I'll try to familiarize myself with your changes and will probably ask questions during the process.

@niallnsec
Author

I'd be more than happy to collaborate, analysing PE features of memory images is something I am very interested in. I have been working on a project surrounding realtime/forensic analysis of Windows memory and I think the memory scan feature of YARA is incredibly useful.

Regarding the code I was working on, I have been re-familiarising myself with it and there was one bit in particular which I thought seemed fragile and a bit hacky, the function yr_process_fetch_memory_region_data. It sort of hijacks the underlying data fetching logic of YARA and uses structures from untyped pointers to do its work. As a result it is dependent on the behaviour of the existing data loading functions, so any future changes in that part of the code base could break this function in unexpected ways.

Having said that, I struggled to find a better way of doing it without creating a mess in other code files. I did think about loading the data separately without using the YR_MEMORY_BLOCK structure at all but that seemed worse to me than the code I came up with initially.

@plusvic added this to the v4.2 milestone Mar 12, 2021

@Landernal

Hello,

Sorry for digging up this thread. I came across the same problem recently and I am very interested in a potential integration of the pe module for running Windows processes. As you mentioned it could be part of a future release, I wondered if you could provide any update on the matter?

Thanks.

@niallnsec
Author

I see there is an RC for version 4.2 of YARA. Is there anything I can do to help with getting this change included in the next release?

@plusvic
Member

plusvic commented Feb 23, 2022

We can start working on merging this into master. I wouldn't release it in version 4.2.0, as the first release candidate is already out and I'm only introducing bug fixes and minor changes, but it could be released in 4.3.

@niallnsec Could you please send a pull request with your latest changes?

@niallnsec
Author

@plusvic I have created a PR with the changes from my fork. Although it builds and runs on my Windows machine, it appears there are a few issues with the CI build which I haven't had a chance to address yet.

@plusvic
Member

plusvic commented Feb 25, 2022

It's a linking problem:

libyara64.lib(windows.obj) : error LNK2019: unresolved external symbol NtQueryInformationProcess referenced in function yr_process_fetch_primary_module_base [C:\projects\yara\windows\vs2015\yara\yara.vcxproj]
C:\projects\yara\windows\vs2015\Debug\yara64.exe : fatal error LNK1120: 1 unresolved externals [C:\projects\yara\windows\vs2015\yara\yara.vcxproj]

You added ntdll.lib to the VS2017 project, but not to VS2015, which is the one used in CI.

@niallnsec
Author

I've fixed the VS project and that is building now, but I cannot figure out how to get the Cygwin test projects to build. I tried quite a few different things but could not get it to link the test files with ntdll.
