PE Module not working correctly when scanning processes #1372
Sorry for the delayed response. I didn't manage to spend time reviewing this issue until now. The work you have done is very interesting and I think it can be very valuable. Honestly, this memory scan feature has been a bit neglected for a long time, so it's nice to have someone looking more carefully into it. I would like to start by understanding the issue in depth. My understanding is that currently the PE module assumes that all the data it needs is contained in a single memory block (the memory block where the PE header resides). Those memory blocks are memory regions reported by If that's correct, it raises a question: why exactly are those regions (the one containing the PE header, and the one containing the import data) separated? Do they have some unallocated region in between? Do they have different access privileges? Or are they both contiguous and with the same privileges, but the loader decided to allocate them as two different blocks and therefore
I also like your approach of implementing a master...niallnsec:master#diff-814b41c5dc1a8103cab442fcb7e9647b02e4ba911d719adcd4fe78bedd8d640eR312 It looks like you are getting the base address from some undocumented field in the structure returned by
No worries, I have been using YARA for a few years now and am very appreciative of the work you and others have put into the project. My own project actually got put on hold, which is why I have not revisited this since October. You are correct in your understanding. To elaborate a bit, there are two assumptions the PE module implicitly makes with respect to memory scanning which do not hold true:
Both issues are caused by the actions of the Windows loader when an image is mapped into memory (the hallmark of which is a section created with SEC_IMAGE, reported as MEM_IMAGE by VirtualQueryEx). The loader parses the headers and then creates a base allocation for the image file based on the size of the headers plus the virtual sizes of each section, each padded to a multiple of IMAGE_NT_HEADERS.OptionalHeader.SectionAlignment. Within this base allocation it first allocates a region large enough to hold the PE headers, using the size stored in IMAGE_NT_HEADERS.OptionalHeader.SizeOfHeaders padded to the file's SectionAlignment. The headers are only used during the load process and may be scrubbed once the loader is done, although this is usually only seen in the case of malware.

Next, the loader iterates over the section headers and allocates a sub-region for each section sequentially. The size and protection of each region are taken from the corresponding section header, and the size of each region is also aligned to the file's SectionAlignment. (In practice, Windows applies optimisations that may change the protection of various sections for improved system performance. Adjacent regions with the same protection within a single base allocation are reported as a single region when querying with VirtualQueryEx. The most common optimisations observed are the use of WRITE_COPY memory instead of READ_WRITE for one or more parts of a section, and the re-protection of certain READ_WRITE sections, such as the common .didata section, to READ_ONLY.)

Once all the sections are in place, the loader parses the data directories. The data directories structure is an array of entries, each storing the RVA and size of an associated data blob. The data blobs can be located at any RVA within the image and will almost always be outside of the PE header region.
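As a rough illustration of the layout described above, the following C sketch computes the sub-region boundaries from header values. The `section_info` struct and the helper names are hypothetical simplifications, not YARA or Windows APIs, and the sketch ignores the protection-merging optimisations; it only shows how SizeOfHeaders and each section's virtual size are padded to SectionAlignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one IMAGE_SECTION_HEADER. */
typedef struct {
  uint32_t virtual_address; /* section RVA */
  uint32_t virtual_size;    /* Misc.VirtualSize */
} section_info;

/* Round value up to the next multiple of alignment (a power of two). */
static uint32_t align_up(uint32_t value, uint32_t alignment)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (value + alignment - 1) & ~(alignment - 1);
}

/* Size of the first sub-region: SizeOfHeaders padded to SectionAlignment. */
static uint32_t header_region_size(uint32_t size_of_headers,
                                   uint32_t section_alignment)
{
  return align_up(size_of_headers, section_alignment);
}

/* End RVA of one section's sub-region: VirtualSize padded to SectionAlignment. */
static uint32_t section_region_end(const section_info *s,
                                   uint32_t section_alignment)
{
  return s->virtual_address + align_up(s->virtual_size, section_alignment);
}

/* Total span of the base allocation: the furthest sub-region end. */
static uint32_t image_span(const section_info *sections, size_t count,
                           uint32_t size_of_headers, uint32_t section_alignment)
{
  uint32_t end = header_region_size(size_of_headers, section_alignment);
  for (size_t i = 0; i < count; i++) {
    uint32_t section_end = section_region_end(&sections[i], section_alignment);
    if (section_end > end)
      end = section_end;
  }
  return end;
}
```

For example, with a SectionAlignment of 0x1000, a SizeOfHeaders of 0x400 gives a 0x1000-byte header region, and a section at RVA 0x1000 with a VirtualSize of 0x2345 occupies the sub-region up to RVA 0x4000. For a well-formed image the overall span should match OptionalHeader.SizeOfImage.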
In the case of the Imports directory, there are multiple levels of nesting of data pointers, starting with the import descriptors describing the imported DLLs, which in turn reference the DLL name strings and thunk arrays. Although the imports are typically contiguous, each data pointer can reference any valid virtual address within the image. The same holds true for the other data directories, which means the data could reside in any of the PE file's sections. In order to reliably parse the data, YARA therefore needs to be able to reference all of the regions that make up the PE file, much in the same way it does when the PE module is operating on files on disk (I believe).

The approach I took to making the data available was to allocate a block of memory big enough to hold the entire base allocation as reported by VirtualQueryEx, and then iterate forward, copying each region until the AllocationBase value reported by VirtualQueryEx changes. This has the drawback of potentially causing YARA to use a large amount of memory, although the maximum size should be reasonably bounded. The advantage is that all of the RVAs are now offsets into the copied memory block, so they can be easily referenced with a bit of pointer arithmetic. As well as the splitting of data into discrete allocations based on the sections, there are also a couple of other notable changes caused by the loader:
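A minimal sketch of the copy loop, assuming a fake region table in place of real VirtualQueryEx and ReadProcessMemory calls so that it can run anywhere. `region_info`, `query_region`, `allocation_end` and `copy_image` are hypothetical names, and the two-pass structure (first find where AllocationBase changes, then copy) is just one way to size the buffer, not necessarily how the fork does it. Each region is copied to offset `base_address - allocation_base`, which is what lets RVAs be used directly as buffer offsets; reserved regions are left zero-filled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the MEMORY_BASIC_INFORMATION fields used here;
   `contents` replaces a ReadProcessMemory call so the sketch runs anywhere. */
typedef struct {
  uintptr_t base_address;
  uintptr_t allocation_base;
  size_t region_size;
  int committed;           /* nonzero for MEM_COMMIT, zero for MEM_RESERVE */
  const uint8_t *contents; /* region_size bytes, or NULL if not committed */
} region_info;

/* Hypothetical replacement for VirtualQueryEx: find the region containing addr. */
static const region_info *query_region(const region_info *table, size_t count,
                                       uintptr_t addr)
{
  for (size_t i = 0; i < count; i++)
    if (addr >= table[i].base_address &&
        addr - table[i].base_address < table[i].region_size)
      return &table[i];
  return NULL;
}

/* Walk forward from allocation_base until the reported AllocationBase
   changes, returning the end address of the image's last region. */
static uintptr_t allocation_end(const region_info *table, size_t count,
                                uintptr_t allocation_base)
{
  uintptr_t addr = allocation_base;
  const region_info *r;
  while ((r = query_region(table, count, addr)) != NULL &&
         r->allocation_base == allocation_base)
    addr = r->base_address + r->region_size;
  return addr;
}

/* Copy the whole image into one zero-filled buffer. Each committed region
   is placed at (base_address - allocation_base), so an RVA is an offset. */
static uint8_t *copy_image(const region_info *table, size_t count,
                           uintptr_t allocation_base, size_t *image_size)
{
  size_t size = allocation_end(table, count, allocation_base) - allocation_base;
  uintptr_t addr = allocation_base;
  const region_info *r;
  uint8_t *image;

  if (size == 0)
    return NULL;
  image = calloc(size, 1);
  if (image == NULL)
    return NULL;

  while (addr < allocation_base + size &&
         (r = query_region(table, count, addr)) != NULL) {
    if (r->committed && r->contents != NULL)
      memcpy(image + (r->base_address - allocation_base), r->contents,
             r->region_size);
    addr = r->base_address + r->region_size;
  }
  *image_size = size;
  return image;
}
```

Placing regions by offset rather than back to back keeps RVAs valid even if a reserved gap appears inside the allocation, at the cost of allocating memory for that gap.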
Regarding the yr_process_fetch_primary_module_base function: things are complicated by the fact that 64-bit Windows supports running 32-bit applications using WOW64. When a process is running in WOW64 mode, it has two distinct PEB structures, one 64-bit and one 32-bit. The ImageBaseAddress pointer we require is in the 32-bit PEB, but the PebBaseAddress field of PROCESS_BASIC_INFORMATION points to the 64-bit PEB. To overcome this, we can use the ProcessWow64Information class in the call to NtQueryInformationProcess. Although the official documentation only describes the value returned for ProcessWow64Information as a flag that is nonzero when the process runs under WOW64, in reality it is a pointer to the process's 32-bit PEB, if one is present. To the best of my knowledge, this has remained true in all versions of Windows that support WOW64. There are 5 possible scanning situations, depending on the architectures of the system, YARA and the target process, which the function must account for:
If we manage to retrieve a valid PEB address, we can read the process memory at the appropriate offset to get the ImageBaseAddress pointer, which can then be used to jump immediately to the correct location for scanning. Otherwise the function returns NULL, forcing YARA to fall back to the slower brute-force approach.
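The PEB lookup could be sketched as follows. The ImageBaseAddress offsets used here (0x08 in the 32-bit PEB, 0x10 in the 64-bit PEB) come from the widely reproduced but officially undocumented PEB layout, and `choose_peb` / `peb_image_base` are hypothetical helpers operating on a byte buffer that would, in practice, be filled by ReadProcessMemory. The sketch does not attempt to cover all five architecture combinations, for example a 32-bit scanner targeting a 64-bit process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets of PEB.ImageBaseAddress in the (officially undocumented) layout. */
#define PEB32_IMAGE_BASE_OFFSET 0x08
#define PEB64_IMAGE_BASE_OFFSET 0x10

/* Hypothetical result: which PEB to read and how wide its pointers are. */
typedef struct {
  uint64_t address;
  int is_32bit;
} peb_location;

/* wow64_peb: value returned for ProcessWow64Information (0 if not WOW64).
   native_peb: PebBaseAddress from PROCESS_BASIC_INFORMATION. */
static peb_location choose_peb(uint64_t wow64_peb, uint64_t native_peb,
                               int native_is_64bit)
{
  peb_location loc;
  if (wow64_peb != 0) {
    loc.address = wow64_peb; /* WOW64 target: use the 32-bit PEB */
    loc.is_32bit = 1;
  } else {
    loc.address = native_peb;
    loc.is_32bit = !native_is_64bit;
  }
  return loc;
}

/* Decode ImageBaseAddress from a little-endian copy of the start of a PEB.
   Returns 0 (caller falls back to the brute-force scan) if len is too short. */
static uint64_t peb_image_base(const uint8_t *peb, size_t len, int is_32bit)
{
  size_t offset = is_32bit ? PEB32_IMAGE_BASE_OFFSET : PEB64_IMAGE_BASE_OFFSET;
  size_t width = is_32bit ? 4 : 8;
  uint64_t value = 0;

  if (peb == NULL || len < offset + width)
    return 0;
  for (size_t i = 0; i < width; i++)
    value |= (uint64_t) peb[offset + i] << (8 * i);
  return value;
}
```

A return value of 0 plays the role of the NULL result described above, signalling that the caller should fall back to iterating over all process regions.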
Thank you for the very detailed explanation; everything is clearer to me now. It looks like you know a lot about the inner workings of the Windows loader, so if you have the time and willingness to collaborate in this area, your help would be really appreciated. I think I'm going to merge your commit for version 4.2 (4.1 is scheduled to be released soon and I prefer not to introduce large changes at this point). In the meantime I'll try to familiarize myself with your changes and will probably ask questions during the process.
I'd be more than happy to collaborate; analysing PE features of memory images is something I am very interested in. I have been working on a project around realtime/forensic analysis of Windows memory, and I think the memory scan feature of YARA is incredibly useful. Regarding the code I was working on, I have been re-familiarising myself with it, and there was one bit in particular that seemed fragile and a bit hacky: the function yr_process_fetch_memory_region_data. It sort of hijacks the underlying data fetching logic of YARA and casts untyped pointers to structures to do its work. As a result, it depends on the behaviour of the existing data loading functions, so any future changes in that part of the code base could break this function in unexpected ways. Having said that, I struggled to find a better way of doing it without creating a mess in other code files. I did think about loading the data separately without using the YR_MEMORY_BLOCK structure at all, but that seemed worse to me than the code I came up with initially.
Hello, sorry for digging up this thread. I came across the same problem recently and I am very interested in a potential integration of the PE module for running Windows processes. As you mentioned it could be part of a future release, could you provide any update on the matter? Thanks.
I see there is an RC for version 4.2 of YARA; is there anything I can do to help get this change included in the next release?
We can start working on merging this into master. I wouldn't release it in version 4.2.0, as the first release candidate is already out and I'm only introducing bug fixes and minor changes, but it could be released in 4.3. @niallnsec Could you please send a pull request with your latest changes?
@plusvic I have created a PR with the changes from my fork. Although it builds and runs on my Windows machine, it appears there are a few issues with the CI build which I haven't had a chance to address yet.
It's a linking problem:
You added
I've fixed the VS project and that is building now, but I cannot figure out how to get the Cygwin test projects to build. I tried quite a few different things but could not get it to link the test files with ntdll.
Hello,
It seems that the PE module has a few issues when scanning running processes on Windows. When an executable is mapped into memory, its allocation is split into a number of sub-allocations with different protections. It appears that the module only takes into account the first sub-allocation, which is typically (possibly always) a 4K region containing the PE headers.
Because the headers are in this first region, a lot of the functionality works most of the time; however, because the Windows loader is a bit more flexible than the PE specification, the results will be inconsistent.
Most commonly, problems manifest in the import table parsing, because the imports data is usually stored in a section outside of the headers block. Since the module does not pull this data in, parsing fails because the offsets are determined to be out of bounds.
Since the imports, and potentially other items, can be spread all over the PE file, I believe it is necessary to pull the full allocation into the buffer before analysing it.
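To illustrate the failure, the following sketch resolves RVAs with a bounds check against a flat image buffer and walks the import descriptors. It assumes the 20-byte IMAGE_IMPORT_DESCRIPTOR layout (Name RVA at offset 12) and a little-endian buffer; `rva_to_ptr` and `count_imported_dlls` are hypothetical helpers, not the PE module's own functions. Given only the 4K header region, an import directory at RVA 0x2000 is rejected as out of bounds, while the same lookup succeeds on a copy of the full allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMPORT_DESCRIPTOR_SIZE 20 /* sizeof(IMAGE_IMPORT_DESCRIPTOR) */
#define IMPORT_NAME_OFFSET 12     /* offset of the Name RVA in a descriptor */

/* Return a pointer to `length` bytes at `rva` inside a flat image copy,
   or NULL if any part of the range falls outside the buffer. */
static const uint8_t *rva_to_ptr(const uint8_t *image, size_t image_size,
                                 uint32_t rva, size_t length)
{
  if (rva > image_size || length > image_size - rva)
    return NULL;
  return image + rva;
}

static uint32_t read_u32_le(const uint8_t *p)
{
  return (uint32_t) p[0] | (uint32_t) p[1] << 8 |
         (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;
}

/* Count import descriptors whose DLL name resolves inside the buffer.
   Returns -1 as soon as a descriptor or name lies out of bounds. */
static int count_imported_dlls(const uint8_t *image, size_t image_size,
                               uint32_t import_rva)
{
  static const uint8_t zero[IMPORT_DESCRIPTOR_SIZE] = {0};
  int count = 0;

  for (uint32_t rva = import_rva;; rva += IMPORT_DESCRIPTOR_SIZE) {
    const uint8_t *desc =
        rva_to_ptr(image, image_size, rva, IMPORT_DESCRIPTOR_SIZE);
    if (desc == NULL)
      return -1;
    if (memcmp(desc, zero, IMPORT_DESCRIPTOR_SIZE) == 0)
      return count; /* an all-zero descriptor terminates the array */

    uint32_t name_rva = read_u32_le(desc + IMPORT_NAME_OFFSET);
    const uint8_t *name = rva_to_ptr(image, image_size, name_rva, 1);
    if (name == NULL || memchr(name, '\0', image_size - name_rva) == NULL)
      return -1;
    count++;
  }
}
```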
Parsing of imports from running processes is something I need for a project, so I had a go at implementing the functionality. I managed to get it working with a limited number of test cases, but I haven't had time to test it fully yet. I also crossed a couple of code boundaries that seem to be in place in the source, so I think my approach is likely a bit too much of a hack. It may be useful though, since logically it does work. It's only an evening's work, so I appreciate it is a bit rough. The fork is at https://github.com/niallnsec/yara
I took the approach of fetching all committed regions that share the same allocation base as the region containing the headers for the base image. That way I can be sure all of the data has been retrieved. This has the downside of potentially requiring large amounts of memory for very large binaries. I think the alternative would involve on-demand retrieval of sub-allocation data. I placed the data sequentially into a buffer to simplify access, but this may cause bounds checking issues if a reserved region exists in the allocation (although I am not sure that would ever happen).
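The bounds concern with sequential packing can be made concrete: if reserved regions are skipped when copying, every RVA after the gap shifts by the gap's size, so RVAs can no longer be used directly as buffer offsets. The sketch below translates an RVA into a packed-buffer offset using a hypothetical `sub_region` table; whether reserved gaps actually occur inside image allocations is, as noted above, uncertain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one sub-region, relative to the allocation base. */
typedef struct {
  uint32_t start; /* RVA of the region */
  uint32_t size;
  int committed;  /* 0 for a reserved (uncopied) region */
} sub_region;

/* Translate an RVA into an offset in a buffer where only committed regions
   were copied back to back. Returns -1 if the RVA falls in a reserved region
   or past the end. Regions must be sorted by start and contiguous. */
static int64_t packed_offset(const sub_region *regions, size_t count,
                             uint32_t rva)
{
  int64_t packed = 0;
  for (size_t i = 0; i < count; i++) {
    const sub_region *r = &regions[i];
    if (rva >= r->start && rva - r->start < r->size)
      return r->committed ? packed + (rva - r->start) : -1;
    if (r->committed)
      packed += r->size; /* only committed regions occupy buffer space */
  }
  return -1;
}
```

With a committed region at 0x0, a reserved region at 0x1000 and a committed region at 0x2000 (each 0x1000 bytes), RVA 0x2010 lands at packed offset 0x1010 rather than 0x2010, and an RVA inside the reserved region has no backing data at all.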
I also added a routine to more quickly find the base address of the primary module to avoid iterating through all the process regions since ReadProcessMemory is a pretty expensive function.
Thanks